When Smaller Stops Being Faster: a Quant Ladder on the MI300X
Someone handed me an MI300X for a little while, so I walked two MoE coding models down the quant ladder — BF16 to Q4 — to answer what the Strix Halo miniPC posts couldn’t: when memory bandwidth stops being the bottleneck, how much quality does quantizing cost and is the speed worth it? Quality is flat to Q4 on both…
Read the full post →