Jensen Huang Keynote
Vera Rubin ships. Groq gets absorbed. Jensen sees $1T.
March 17, 2026 · SAP Center, San Jose · ~2h keynote
Vera Rubin Platform
Architecture
- 7 chips, 5 rack-scale computers. Not a GPU cluster, a unified AI supercomputer.
- NVLink 72 at 260 TB/s all-to-all bandwidth.
- 3.6 exaflops of compute.
- 100% liquid cooled with 45°C hot water. Kills the air cooling bottleneck.
- Install: 2 days → 2 hours. Structured cables, zero spaghetti.
The Five Racks
- NVLink 72 Rack. GPU compute with 6th-gen NVLink scale-up.
- Vera CPU Rack. LPDDR5, extreme single-thread perf for agentic tool use.
- STX Rack. Bluefield 4 AI-native storage (KV cache, KUDF, KVS).
- Groq LPX Rack. 8 LP30 chips, massive SRAM for ultra-fast decode.
- Spectrum X CPO. Co-packaged optics scale-out (in production with TSMC).
Generational Performance Leap
Groq LPU Integration
How It Works
NVIDIA swallowed Groq's team and IP. The real move: disaggregated inference via Dynamo.
- Vera Rubin handles prefill. Heavy math, context ingestion, KV cache (288 GB HBM).
- Groq handles decode. Deterministic dataflow, compiler-scheduled, massive SRAM (500 MB/chip) for ultra-low-latency token generation.
- Connected via Ethernet with a special low-latency mode (roughly a 2× latency reduction).
- Unified by Dynamo: the operating system for AI factories.
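The prefill/decode hand-off can be sketched in a few lines. This is a minimal conceptual sketch, assuming nothing about Dynamo's real API: the worker functions, the KVCache shape, and the toy increment-the-last-token "model" are all invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class KVCache:
    """Per-request key/value state built during prefill (lives in GPU HBM)."""
    tokens: list[int] = field(default_factory=list)

def prefill_on_gpu(prompt: list[int]) -> KVCache:
    # Vera Rubin side: heavy, parallel context ingestion in one pass.
    return KVCache(tokens=list(prompt))

def decode_on_lpu(cache: KVCache, max_new: int) -> list[int]:
    # Groq side: strictly sequential, latency-bound generation. The toy
    # "model" just emits last-token + 1 so the flow is easy to trace.
    out = []
    for _ in range(max_new):
        nxt = (cache.tokens[-1] + 1) % 50_000
        cache.tokens.append(nxt)
        out.append(nxt)
    return out

def serve(prompt: list[int], max_new: int = 4) -> list[int]:
    cache = prefill_on_gpu(prompt)        # prefill on the GPU rack
    return decode_on_lpu(cache, max_new)  # decode on the LPU rack

print(serve([10, 11, 12]))  # [13, 14, 15, 16]
```

The point of the split is that the two phases have opposite hardware profiles: prefill is compute- and HBM-bound, decode is latency-bound and fits in SRAM.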
Jensen's Recommendation
Jensen's recommended mix for most data centers:
- 75% Vera Rubin. Bulk workloads, high throughput.
- 25% Groq. Premium tier, coding, reasoning, anything where latency = money.
- If your workload is mostly batch/throughput: 100% Vera Rubin.
- Groq extends performance beyond NVLink 72's bandwidth limits for 1000+ tokens/sec.
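The recommended split is simple arithmetic over rack counts; a toy sketch (the function and its default mirror the 75/25 rule of thumb above and are mine, not an NVIDIA tool):

```python
def plan_mix(total_racks: int, latency_sensitive_share: float = 0.25) -> dict:
    """Split a fleet per the keynote's rule of thumb: ~25% Groq racks for
    latency-sensitive decode, the rest Vera Rubin for bulk throughput."""
    groq = round(total_racks * latency_sensitive_share)
    return {"vera_rubin": total_racks - groq, "groq": groq}

print(plan_mix(100))       # {'vera_rubin': 75, 'groq': 25}
print(plan_mix(100, 0.0))  # pure batch/throughput shop: all Vera Rubin
```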
Inference Architecture Deep Dive
Source: SemiAnalysis, “The Inference Kingdom Expands” (Mar 2026)
The $20B Groq “Acquisition”
- Structured as IP license + team hire, not a legal acquisition. Walks right past antitrust.
- 4 months from handshake → integrated system on stage. Regulators never got a shot.
- Groq LPU 2 (Samsung SF4X) never shipped, SerDes couldn't hit 112G. Skipped to LPU 3 (LP30).
- LP30 has zero Nvidia IP, it's pure Groq design. Real co-design starts with LP40 (TSMC N3P, CoWoS-R, NVLink protocol).
Supply Chain Arbitrage 🔑
- LP30 is fabbed on Samsung SF4: no TSMC wafers, no HBM.
- Zero competition for scarce TSMC N3 allocation or HBM supply.
- Nvidia ramps LPU production without cannibalizing a single GPU wafer.
- Pure incremental revenue from stranded Samsung capacity nobody else can touch.
- LP30: monolithic die, no advanced packaging. Simpler fab, faster ramp.
💡 The single most underpriced detail in the entire GTC cycle.
Attention-FFN Disaggregation (AFD)
- Attention = stateful (KV cache) → runs on GPUs (needs HBM capacity).
- FFN = stateless → runs on LPUs (deterministic, SRAM-fast).
- Tokens ping-pong between GPUs and LPUs over Spectrum-X Ethernet.
- MoE models get sparser → fewer tokens per expert → GPU utilization tanks → AFD fixes this by freeing all HBM for KV cache.
- Alternate mode: LPUs run speculative decoding draft models (1.5–2× throughput per step).
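The 1.5–2× claim for speculative decoding follows from a standard expectation: if the draft model proposes k tokens and each is accepted independently with probability a (an idealization; real acceptance is correlated), the target model emits 1 + a + a² + … + a^k tokens per verification step. A sketch with assumed numbers:

```python
def expected_tokens_per_step(k: int, accept: float) -> float:
    """Expected tokens emitted per verification step of speculative decoding,
    under the idealized independent-acceptance model: sum of accept**i
    for i in 0..k."""
    return sum(accept ** i for i in range(k + 1))

# 4 draft tokens at 50% acceptance yield ~1.94 tokens/step, near the top
# of the 1.5-2x range quoted on stage.
print(round(expected_tokens_per_step(4, 0.5), 2))  # 1.94
```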
LP30 vs LP40 Roadmap
- LP30: Samsung SF4, 500MB SRAM, 1.2 PFLOPS FP8, Groq C2C protocol. Shipping with Vera Rubin.
- LP35: Minor refresh on SF4, adds NVFP4 format. New tapeout, same node.
- LP40: TSMC N3P + CoWoS-R. First true Nvidia co-design. NVLink replaces Groq C2C. Hybrid bonded DRAM from SK Hynix extends memory beyond SRAM.
⚠️ Bear case: LP40 is 2+ years out. Execution risk on Samsung node is real.
CPO Roadmap (Co-Packaged Optics)
- Rubin NVL72: All copper scale-up (Oberon rack).
- Rubin Ultra NVL144: All copper (Kyber rack). No CPO despite analyst rumors.
- Rubin Ultra NVL576: 8× Oberon racks, first CPO deployment between racks. Copper within. Low volume / testing.
- Feynman NVL1152: 8× Kyber racks. CPO between racks, copper within. Jensen says “all CPO,” but the blog disagrees; still TBD.
- Nvidia's philosophy: “Copper where we can, optics where we must.”
Token Economics
The Token Factory Pricing Spectrum
Jensen's framing: “Every CEO in the world will be studying their token factory throughput chart. This year's decisions show up precisely as next year's revenues.”
| Tier | Price/M Tokens | Characteristics | Use Case |
|---|---|---|---|
| FREE | $0 | High throughput, small models, high latency | Customer acquisition, basic queries |
| BASIC | $3 | Medium models, reasonable speed | Consumer chatbots, content generation |
| PRO | $6 – $45 | Larger models, higher speed, long context | Professional work, analysis, code assist |
| PREMIUM | $150 | Frontier models, max intelligence, fast decode | Research, critical path decisions, deep reasoning |
| ULTRA | $150+ | Ultra-fast tokens, Groq-accelerated decode | Real-time coding, long research runs |
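What the tiers mean for a buyer is plain arithmetic; a hypothetical example using the table's prices (the 500M-token/month coding-agent workload is assumed, and PRO is pinned to the top of its range):

```python
# $/million tokens, from the pricing table; PRO pinned to its $45 ceiling.
PRICE_PER_M_TOKENS = {"FREE": 0, "BASIC": 3, "PRO": 45, "PREMIUM": 150}

def monthly_bill(millions_of_tokens: float, tier: str) -> float:
    """Dollar cost of a monthly token volume at a given tier."""
    return millions_of_tokens * PRICE_PER_M_TOKENS[tier]

# The same 500M-token agent costs $1,500/month on BASIC but $75,000 on
# PREMIUM: tier selection, not raw volume, dominates the bill.
print(monthly_bill(500, "BASIC"))    # 1500
print(monthly_bill(500, "PREMIUM"))  # 75000
```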
OpenClaw. The Linux of Agents
Why It Matters
Jensen gave this more stage time than Vera Rubin. Direct quotes:
- “As big as HTML.”
- “As big as Linux.”
- “As big as Kubernetes.”
- Most popular open-source project in history, beat Linux's star count in weeks.
- “OpenClaw has open-sourced the operating system of agent computers.”
Enterprise Stack (Nemo Claw)
- Open Shell. Security and privacy guardrails for corporate agents.
- Privacy Router. Blocks sensitive data from leaking to model providers.
- Policy Guard Rails. Hooks into existing SaaS policy engines.
- Every SaaS company becomes a GaaS company (agentic-as-a-service).
- Reference design: download, customize, ship.
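The privacy-router idea is easy to picture as a scrubbing pass on outbound prompts. A minimal sketch, assuming nothing about Nemo Claw's actual design: the patterns, tags, and route() function are invented for illustration.

```python
import re

# Illustrative redaction rules; a real router would use far richer detectors.
SENSITIVE = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),          # US SSN shape
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),  # email address
]

def route(prompt: str) -> str:
    """Redact likely-sensitive spans before the prompt leaves the corporate
    boundary; a real router would then forward the result upstream."""
    for pattern, tag in SENSITIVE:
        prompt = pattern.sub(tag, prompt)
    return prompt

print(route("Escalate ticket for jane@corp.com, SSN 123-45-6789"))
# Escalate ticket for [EMAIL], SSN [SSN]
```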
Neimotron Coalition
Partners are building Neimotron 4, NVIDIA's frontier open model.
Architecture Roadmap
New architecture every year. Copper and optical in parallel going forward.
Scale-Up: Copper vs Optical
Vera CPU Standalone
Physical AI & Robotics
Robo-Taxi Partnerships
New partners representing a combined 18M cars/year of production join existing partners Mercedes, Toyota, and GM, plus Uber for multi-city deployment.
“The ChatGPT moment of self-driving cars has arrived.”
The Three Computers of Robotics
- Training computer. Isaac Lab for RL policy training at scale.
- Simulation computer. Newton physics + Cosmos world models for synthetic data.
- Robot computer. Jetson. Runs on the robot itself.
110 robots on the GTC show floor. Every major robotics company works with NVIDIA.
More Highlights
DLSS 5 / Neuro Rendering
- Fuses 3D graphics + generative AI. Deterministic structure meets probabilistic generation.
- “One is completely predictive, the other probabilistic yet highly realistic.”
- This pattern, structured data + generative AI, is the template for every industry.
- “Structured data is the foundation of trustworthy AI.”
- The demo was the best visual at GTC. Computer graphics that breathe.
Notable Quotes
“If you have the wrong architecture, even if it's free, it's not cheap enough.”
On why token cost per watt matters more than chip cost
“Dylan Patel accused me of sandbagging. He says it's actually 50×. And he's not wrong.”
On Blackwell inference benchmarks (SemiAnalysis study)
“Every engineer's recruiting package will include a token budget. Tokens are the new compensation.”
On the future of knowledge work
“Every single SaaS company will become a GaaS company. An agentic-as-a-service company.”
On the enterprise IT shift
“Computing demand has increased by 1 million times in the last two years.”
On the inference inflection (10,000x per-task times 100x usage)
“We see through 2027 at least $1 trillion.”
On the demand pipeline (up from $500B last year)