I got a usable 256K-token context on my Strix box
A usable million-token context on my Strix Halo box turned out to be a trap: prefill takes 4.6 hours, and the tricks that make it fast make it useless. What I got instead is a genuinely usable 256K context: ten and a half minutes to ingest, 36 tokens/s, retrieval intact. Getting there took four wrong turns, a…