UniScale: Jointly Optimizing Model Routing and Test-Time Scaling
UniScale is an online framework that unifies model routing (choosing among models of different sizes) and test-time scaling (adjusting how much computation is spent during inference) into a single optimization space. It uses LinUCB, a contextual bandit algorithm, to learn inference policies online, achieving a better trade-off between quality and cost in dynamic scenarios.
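To make the idea concrete, here is a minimal sketch of disjoint LinUCB choosing over a joint action space, where each arm is a (model size, number of sampled responses) pair. This is not UniScale's actual implementation, and the summary above does not describe it at this level of detail. The action list, the cost table, the `toy_reward` function, the two-feature context, and the weight `lam` are all hypothetical and exist only for illustration.

```python
import math
import random

class LinUCBArm:
    """One arm of disjoint LinUCB: a ridge-regression reward model per arm."""

    def __init__(self, dim):
        # A starts as the identity matrix, so its inverse is the identity too.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _matvec(self, v):
        return [sum(a * vi for a, vi in zip(row, v)) for row in self.A_inv]

    def ucb(self, x, alpha):
        # Predicted reward theta^T x plus exploration bonus alpha * sqrt(x^T A^{-1} x).
        A_inv_x = self._matvec(x)
        theta = self._matvec(self.b)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = sum(xi * v for xi, v in zip(x, A_inv_x))
        return mean + alpha * math.sqrt(max(0.0, width))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^{-1} after A <- A + x x^T.
        A_inv_x = self._matvec(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, A_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]

def select_arm(arms, x, alpha=1.0):
    """Return the index of the arm with the highest upper confidence bound."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)

# Hypothetical joint action space: (model size, number of sampled responses).
ACTIONS = [("small", 1), ("small", 4), ("large", 1), ("large", 4)]
# Hypothetical relative cost of each action.
COST = {("small", 1): 0.1, ("small", 4): 0.4, ("large", 1): 0.5, ("large", 4): 2.0}

def toy_reward(action, difficulty, lam=0.3):
    """Made-up environment: reward = quality - lam * cost."""
    model, n_samples = action
    skill = 0.9 if model == "large" else 0.6
    quality = min(1.0, max(0.0, skill - 0.5 * difficulty + 0.05 * (n_samples - 1)))
    return quality - lam * COST[action]

def train(rounds=500, seed=0, alpha=1.0):
    """Run the bandit online; return the arms and how often each was chosen."""
    rng = random.Random(seed)
    arms = [LinUCBArm(dim=2) for _ in ACTIONS]
    counts = [0] * len(ACTIONS)
    for _ in range(rounds):
        difficulty = rng.random()      # stand-in for real query features
        x = [1.0, difficulty]          # context vector: bias term + difficulty
        i = select_arm(arms, x, alpha)
        arms[i].update(x, toy_reward(ACTIONS[i], difficulty))
        counts[i] += 1
    return arms, counts

if __name__ == "__main__":
    _, counts = train()
    for action, count in zip(ACTIONS, counts):
        print(action, count)
```

Treating every (model, scaling budget) pair as one arm is what places routing and test-time scaling in a single decision space in this sketch. A real system would presumably use richer query features and a measured quality signal in place of `toy_reward`.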
Why It Matters
It addresses the challenge of reducing AI infrastructure costs without sharply degrading response quality.