tools-ai Tech Jun 6, 2026 1 min read

NVIDIA launches Vera Rubin platform: trillion-parameter models at up to 400 tokens per second per user

NVIDIA's new Vera Rubin platform, combining NVL72 and Groq 3 LPX, enables running agentic workloads on massive MoE models without sacrificing latency.

Nvidia GPU Infrastructure MOE LLM

Sources x.com

NVIDIA continues to solidify its leadership in AI infrastructure with the introduction of the Vera Rubin platform, targeting trillion-parameter scale language models.

The Details

The system combines the Vera Rubin NVL72 architecture with NVIDIA Groq 3 LPX technology. Its core objective is to power agentic workloads (tasks carried out by autonomous agents) on massive Mixture of Experts (MoE) models. Vera Rubin can deliver up to 400 tokens per second per user without increasing latency, which addresses the usual trade-off between aggregate throughput and per-user responsiveness.
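To put the 400 tokens per second per user figure in perspective, the short Python sketch below converts it into the average gap between output tokens and the decode time for a multi-step agent run. Only the 400 tokens/s rate comes from the article. The function names, the 1,000-token response, and the ten-step agent chain are hypothetical values chosen for illustration, and the estimate covers token generation only, ignoring prompt processing, tool calls, and network overhead.

```python
def per_token_interval_ms(tokens_per_second: float) -> float:
    """Average gap between consecutive output tokens, in milliseconds."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1000.0 / tokens_per_second


def decode_time_s(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds spent generating `output_tokens` at a steady per-user rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


def agent_chain_decode_time_s(step_tokens: list[int], tokens_per_second: float) -> float:
    """Total decode time for an agent that runs its steps one after another."""
    return sum(decode_time_s(n, tokens_per_second) for n in step_tokens)


# Rate quoted in the article (an upper bound: "up to" 400 tokens/s per user).
RATE_TOKENS_PER_S = 400.0

if __name__ == "__main__":
    # Hypothetical workload: ten sequential agent steps of 500 output tokens each.
    steps = [500] * 10
    print(f"gap between tokens: {per_token_interval_ms(RATE_TOKENS_PER_S)} ms")
    print(f"1,000-token response: {decode_time_s(1000, RATE_TOKENS_PER_S)} s")
    print(f"10-step agent chain: {agent_chain_decode_time_s(steps, RATE_TOKENS_PER_S)} s")
```

Because agent steps usually run one after another, per-user generation speed compounds across the chain, which is why the per-user rate matters more for agentic workloads than aggregate throughput alone.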

Why It Matters

Real-time processing of massive models is key to deploying AI agents widely in practice. For tech enterprises in Vietnam building cloud AI infrastructure, Vera Rubin sets a new performance benchmark, making near-instant AI applications feasible despite the immense complexity of the underlying models.