Introducing TokenSpeed: An Open-Source LLM Inference Engine with TensorRT-LLM-Level Performance
TokenSpeed is a new LLM inference engine, released under the MIT license, that matches TensorRT-LLM in performance while remaining as easy to use as vLLM.
Source: x.com