Llama.cpp Supports MTP: Boosting Local AI Speed by 78% 🚀
The latest llama.cpp update adds support for Multi-Token Prediction (MTP), letting the Qwen3.6-27B model reach 45 tokens/second on mid-range hardware and strengthening the case for self-hosting AI.
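To put the two headline figures in context, here is a rough back-of-the-envelope sketch. It assumes the 78% gain and the 45 tokens/second figure describe the same benchmark run, which the source does not state explicitly; the function name `implied_baseline` is purely illustrative.

```python
def implied_baseline(boosted_tps: float, speedup_pct: float) -> float:
    """Return the pre-speedup throughput implied by a boosted rate and a percent gain."""
    # A gain of X% means boosted = baseline * (1 + X/100), so divide to invert it.
    return boosted_tps / (1.0 + speedup_pct / 100.0)


# Assumption: both numbers come from the same measurement of Qwen3.6-27B.
baseline = implied_baseline(45.0, 78.0)
print(f"Implied baseline: {baseline:.1f} tokens/second")
```

Under that assumption, the model would have been running at roughly 25 tokens/second before MTP support.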
Source: x.com