Do the wins stack? MTP on top of sliding-window attention
Two decode-speed wins from this series, MTP and the sliding-window recipe, attack different costs, so they should just stack. They didn’t even get the chance: turning both on aborted at load with an assert, and it turned out I’d already quietly shipped the fix in the last post’s patch file. Here’s…