Jensen Huang Keynote
The Inference Inflection, Vera Rubin, and the Agent Revolution
March 17, 2026 · SAP Center, San Jose · ~2h keynote
Vera Rubin Platform
Architecture
- 7 chips, 5 rack-scale computers: one unified AI supercomputer
- NVLink 72 @ 260 TB/s all-to-all bandwidth
- 3.6 exaflops of compute
- 100% liquid-cooled with 45°C hot water, which takes pressure off data center cooling
- Install time: 2 days → 2 hours (structured cables, no spaghetti)
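As a rough sanity check on the rack-level figures above, the sketch below divides them evenly across the GPUs in one rack. Reading "NVLink 72" as 72 GPUs per rack and splitting the totals evenly are assumptions, not figures from the keynote. The numeric precision behind the 3.6 exaflops figure is not stated, so the per-GPU number is only indicative.

```python
# Back-of-the-envelope arithmetic on the rack-level figures quoted above.
# Assumption: "NVLink 72" means 72 GPUs sharing the rack totals evenly.
GPUS_PER_RACK = 72
NVLINK_TOTAL_TB_S = 260.0        # all-to-all bandwidth, TB/s
COMPUTE_TOTAL_EXAFLOPS = 3.6     # rack-level compute

bandwidth_per_gpu_tb_s = NVLINK_TOTAL_TB_S / GPUS_PER_RACK
compute_per_gpu_petaflops = COMPUTE_TOTAL_EXAFLOPS * 1000 / GPUS_PER_RACK

# Install time: 2 days -> 2 hours (assuming 24-hour days).
install_speedup = (2 * 24) / 2

print(f"{bandwidth_per_gpu_tb_s:.2f} TB/s per GPU")
print(f"{compute_per_gpu_petaflops:.0f} petaflops per GPU")
print(f"{install_speedup:.0f}x faster install")
```

Under these assumptions each GPU sees roughly 3.6 TB/s of NVLink bandwidth and about 50 petaflops, and the install-time claim works out to a 24× reduction.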
The Five Racks
- NVLink 72 Rack: GPU compute with 6th-gen NVLink scale-up
- Vera CPU Rack: LPDDR5, extreme single-thread performance for agentic tool use
- STX Rack: BlueField-4 AI-native storage (KV cache, cuDF, cuVS)
- Groq LPX Rack: 8 LP30 chips, massive SRAM for ultra-fast decode
- Spectrum-X CPO: co-packaged optics scale-out (in production with TSMC)
Generational Performance Leap
Groq LPU Integration
How It Works
NVIDIA acquired the Groq team and licensed the technology. The key insight: disaggregated inference via Dynamo.
- Vera Rubin handles prefill: massive math, context processing, KV cache (288 GB HBM)
- Groq handles decode: deterministic dataflow, compiler-scheduled, massive SRAM (500 MB/chip) for ultra-low-latency token generation
- Connected via Ethernet with a special low-latency mode (2× latency reduction)
- Unified by Dynamo: the operating system for AI factories
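The prefill/decode split described above can be illustrated with a toy two-stage pipeline. This is a minimal sketch of the general disaggregated-inference pattern, not Dynamo's actual API: the class names `PrefillWorker` and `DecodeWorker`, the `serve` function, and the toy "model" are all hypothetical, and the KV cache is represented as a plain list.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    prompt_tokens: list[int]
    max_new_tokens: int
    kv_cache: list[int] = field(default_factory=list)
    output_tokens: list[int] = field(default_factory=list)


class PrefillWorker:
    """Hypothetical stand-in for the throughput-oriented pool (Vera Rubin in the text)."""

    def run(self, req: Request) -> Request:
        # Real prefill computes attention keys/values for every prompt token
        # in one batched pass; here the "cache" is just a copy of the prompt.
        req.kv_cache = list(req.prompt_tokens)
        return req


class DecodeWorker:
    """Hypothetical stand-in for the latency-oriented pool (Groq in the text)."""

    def run(self, req: Request) -> Request:
        for _ in range(req.max_new_tokens):
            # Toy "model": the next token is the cache length modulo a small vocab.
            next_token = len(req.kv_cache) % 50
            req.output_tokens.append(next_token)
            req.kv_cache.append(next_token)  # decode extends the cache one token at a time
        return req


def serve(req: Request, prefill: PrefillWorker, decode: DecodeWorker) -> Request:
    """Route one request through both phases in order."""
    req = prefill.run(req)   # phase 1: processes the whole prompt at once
    return decode.run(req)   # phase 2: sequential, one token per step


result = serve(Request(prompt_tokens=[1, 2, 3], max_new_tokens=4),
               PrefillWorker(), DecodeWorker())
print(result.output_tokens)
```

The point of the split is that the two phases stress hardware differently: prefill is a large batched computation, while decode is a tight sequential loop where per-token latency dominates. In a real deployment the KV cache would have to move between the two pools over the network, which is where the low-latency Ethernet mode mentioned above would matter.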
Jensen's Recommendation
For most data centers:
- 75% Vera Rubin: handles the vast majority of workloads (high throughput)
- 25% Groq: for premium tier, coding, high-value token generation
- If your workload is mostly batch/throughput → 100% Vera Rubin
- Groq extends performance beyond NVLink 72's bandwidth limits for 1,000+ tokens/sec
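To make the recommended mix concrete, the sketch below splits a rack budget according to a Groq share. The function name `split_fleet` and the choice of racks as the unit are assumptions for illustration; the summary does not say whether the 75/25 split refers to racks, power, or spend.

```python
def split_fleet(total_racks: int, groq_share: float = 0.25) -> dict[str, int]:
    """Split a rack budget between the two pools described above.

    groq_share defaults to the 25% suggested for most data centers;
    pass 0.0 for a mostly batch/throughput workload.
    """
    if total_racks < 0:
        raise ValueError("total_racks must be non-negative")
    if not 0.0 <= groq_share <= 1.0:
        raise ValueError("groq_share must be between 0 and 1")
    # round() uses banker's rounding on exact halves; fine for a rough plan.
    groq_racks = round(total_racks * groq_share)
    return {"vera_rubin": total_racks - groq_racks, "groq": groq_racks}


mixed = split_fleet(40)             # default 75/25 mix
batch_only = split_fleet(40, 0.0)   # 100% Vera Rubin
print(mixed, batch_only)
```

For a hypothetical 40-rack deployment, the default mix gives 30 Vera Rubin racks and 10 Groq racks.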
Token Economics
The Token Factory Pricing Spectrum
Jensen's key thesis: “Every CEO in the world will be studying their token factory throughput chart. This year's decisions show up precisely as next year's revenues.”
| Tier | Price/M Tokens | Characteristics | Use Case |
|---|---|---|---|
| FREE | $0 | High throughput, small models, high latency | Customer acquisition, basic queries |
| BASIC | $3 | Medium models, reasonable speed | Consumer chatbots, content generation |
| PRO | $6 – $45 | Larger models, higher speed, long context | Professional work, analysis, code assist |
| PREMIUM | $150 | Frontier models, max intelligence, fast decode | Research, critical path decisions, deep reasoning |
| ULTRA | $150+ | Ultra-fast tokens, Groq-accelerated decode | Real-time coding, long research runs |
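The per-million-token prices in the table translate into a bill as follows. This is a minimal sketch: the 20M-token monthly volume is a made-up example, the PRO tier is shown at both ends of its quoted range, and ULTRA is omitted because only a lower bound ($150+) is given.

```python
# Prices in USD per 1M tokens, taken from the table above.
PRICE_PER_M_TOKENS_USD = {
    "FREE": 0.0,
    "BASIC": 3.0,
    "PRO (low end)": 6.0,
    "PRO (high end)": 45.0,
    "PREMIUM": 150.0,
}


def token_cost_usd(tokens: int, price_per_m_tokens: float) -> float:
    """Cost of generating `tokens` tokens at a given price per 1M tokens."""
    return tokens / 1_000_000 * price_per_m_tokens


monthly_tokens = 20_000_000  # hypothetical workload, not from the keynote
for tier, price in PRICE_PER_M_TOKENS_USD.items():
    print(f"{tier:>15}: ${token_cost_usd(monthly_tokens, price):,.2f}")
```

At this volume the same workload costs $60 on BASIC and $3,000 on PREMIUM, a 50× spread, which is the kind of gap the throughput-versus-price chart is meant to capture.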
OpenClaw: The Linux of Agents
Why It Matters
Jensen devoted a significant portion of the keynote to OpenClaw, calling it:
- “As big as HTML”, which started the internet
- “As big as Linux”, which powered cloud computing
- “As big as Kubernetes”, which enabled mobile cloud
- Most popular open-source project in history, exceeding Linux within weeks
- “OpenClaw has open-sourced the operating system of agent computers”
Enterprise Stack (NemoClaw)
- Open Shell: security/privacy guardrails for corporate agents
- Privacy Router: prevents sensitive data exfiltration
- Policy Guard Rails: connects to existing SaaS policy engines
- Every SaaS company → GaaS company (agentic-as-a-service)
- Reference design downloadable and optimizable
Nemotron Coalition
Partnering to build Nemotron 4, NVIDIA's frontier open model.
Architecture Roadmap
Brand new architecture every single year. Both copper and optical scale-up going forward.
Scale-Up: Copper vs Optical
Vera CPU Standalone
Physical AI & Robotics
Robo-Taxi Partnerships
New partners announced (18M cars/year combined):
They join existing partners Mercedes, Toyota, and GM, plus Uber for multi-city deployment.
“The ChatGPT moment of self-driving cars has arrived.”
The Three Computers of Robotics
- Training computer: Isaac Lab for RL policy training at scale
- Simulation computer: Newton physics + Cosmos world models for synthetic data
- Robot computer: Jetson, runs on the robot itself
110 robots on the GTC show floor. Every major robotics company is working with NVIDIA.
More Highlights
DLSS 5 / Neural Rendering
- Fusion of 3D graphics + generative AI: controllable structured data meets probabilistic generation
- “One is completely predictive, the other probabilistic yet highly realistic”
- The pattern of structured data + generative AI will repeat in every industry
- “Structured data is the foundation of trustworthy AI”
- Computer graphics literally comes to life. Jensen showed a jaw-dropping demo
Notable Quotes
“If you have the wrong architecture, even if it's free, it's not cheap enough.”
— On why token cost per watt matters more than chip cost
“Dylan Patel accused me of sandbagging. He says it's actually 50×. And he's not wrong.”
— On Blackwell inference benchmarks (SemiAnalysis study)
“Every engineer's recruiting package will include a token budget. Tokens are the new compensation.”
— On the future of knowledge work
“Every single SaaS company will become a GaaS company, an agentic-as-a-service company.”
— On the enterprise IT transformation
“Computing demand has increased by 1 million times in the last two years.”
— On the inference inflection (10,000× per-task × 100× usage)
“We see through 2027 at least $1 trillion.”
— On the demand pipeline (up from $500B last year)