Jensen Huang Keynote
Vera Rubin ships. Groq gets absorbed. Jensen sees $1T.
March 17, 2026 · SAP Center, San Jose · ~2h keynote
Vera Rubin Platform
Architecture
- 7 chips, 5 rack-scale computers. Not a GPU cluster, a unified AI supercomputer.
- NVLink 72 at 260 TB/s all-to-all bandwidth.
- 3.6 exaflops of compute.
- 100% liquid cooled with 45°C hot water. Kills the air cooling bottleneck.
- Install: 2 days → 2 hours. Structured cables, zero spaghetti.
The Five Racks
- NVLink 72 Rack. GPU compute with 6th-gen NVLink scale-up.
- Vera CPU Rack. LPDDR5, extreme single-thread perf for agentic tool use.
- STX Rack. Bluefield 4 AI-native storage (KV cache, KUDF, KVS).
- Groq LPX Rack. 8 LP30 chips, massive SRAM for ultra-fast decode.
- Spectrum X CPO. Co-packaged optics scale-out (in production with TSMC).
Generational Performance Leap
Groq LPU Integration
How It Works
NVIDIA swallowed Groq's team and IP. The real move: disaggregated inference via Dynamo.
- Vera Rubin handles prefill. Heavy math, context ingestion, KV cache (288 GB HBM).
- Groq handles decode. Deterministic dataflow, compiler-scheduled, massive SRAM (500 MB/chip) for ultra-low-latency token generation.
- Connected via Ethernet with a special low-latency mode (roughly a 2× latency reduction).
- Unified by Dynamo: the operating system for AI factories.
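The prefill/decode hand-off can be sketched in a few lines. This is a minimal conceptual sketch, assuming nothing about Dynamo's real API: the worker functions, the KVCache shape, and the toy increment-the-last-token "model" are all invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class KVCache:
    """Per-request key/value state built during prefill (lives in GPU HBM)."""
    tokens: list[int] = field(default_factory=list)

def prefill_on_gpu(prompt: list[int]) -> KVCache:
    # Vera Rubin side: heavy, parallel context ingestion in one pass.
    return KVCache(tokens=list(prompt))

def decode_on_lpu(cache: KVCache, max_new: int) -> list[int]:
    # Groq side: strictly sequential, latency-bound generation. The toy
    # "model" just emits last-token + 1 so the flow is easy to trace.
    out = []
    for _ in range(max_new):
        nxt = (cache.tokens[-1] + 1) % 50_000
        cache.tokens.append(nxt)
        out.append(nxt)
    return out

def serve(prompt: list[int], max_new: int = 4) -> list[int]:
    cache = prefill_on_gpu(prompt)        # prefill on the GPU rack
    return decode_on_lpu(cache, max_new)  # decode on the LPU rack

print(serve([10, 11, 12]))  # [13, 14, 15, 16]
```

The point of the split is that the two phases have opposite hardware profiles: prefill is compute- and HBM-bound, decode is latency-bound and fits in SRAM.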
Jensen's Recommendation
Jensen's recommended mix for most data centers:
- 75% Vera Rubin. Bulk workloads, high throughput.
- 25% Groq. Premium tier, coding, reasoning, anything where latency = money.
- If your workload is mostly batch/throughput: 100% Vera Rubin.
- Groq extends performance beyond NVLink 72's bandwidth limits for 1000+ tokens/sec.
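The recommended split is simple arithmetic over rack counts; a toy sketch (the function and its default mirror the 75/25 rule of thumb above and are mine, not an NVIDIA tool):

```python
def plan_mix(total_racks: int, latency_sensitive_share: float = 0.25) -> dict:
    """Split a fleet per the keynote's rule of thumb: ~25% Groq racks for
    latency-sensitive decode, the rest Vera Rubin for bulk throughput."""
    groq = round(total_racks * latency_sensitive_share)
    return {"vera_rubin": total_racks - groq, "groq": groq}

print(plan_mix(100))       # {'vera_rubin': 75, 'groq': 25}
print(plan_mix(100, 0.0))  # pure batch/throughput shop: all Vera Rubin
```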
Inference Architecture Deep Dive
Source: SemiAnalysis, “The Inference Kingdom Expands” (Mar 2026)
The $20B Groq “Acquisition”
- Structured as IP license + team hire, not a legal acquisition. Walks right past antitrust.
- 4 months from handshake → integrated system on stage. Regulators never got a shot.
- Groq LPU 2 (Samsung SF4X) never shipped, SerDes couldn't hit 112G. Skipped to LPU 3 (LP30).
- LP30 has zero Nvidia IP, it's pure Groq design. Real co-design starts with LP40 (TSMC N3P, CoWoS-R, NVLink protocol).
Supply Chain Arbitrage 🔑
- LP30 is fabbed on Samsung SF4: no TSMC wafers, no HBM.
- Zero competition for scarce TSMC N3 allocation or HBM supply.
- Nvidia ramps LPU production without cannibalizing a single GPU wafer.
- Pure incremental revenue from stranded Samsung capacity nobody else can touch.
- LP30: monolithic die, no advanced packaging. Simpler fab, faster ramp.
💡 The single most underpriced detail in the entire GTC cycle.
Attention-FFN Disaggregation (AFD)
- Attention = stateful (KV cache) → runs on GPUs (needs HBM capacity).
- FFN = stateless → runs on LPUs (deterministic, SRAM-fast).
- Tokens ping-pong between GPUs and LPUs over Spectrum-X Ethernet.
- MoE models get sparser → fewer tokens per expert → GPU utilization tanks → AFD fixes this by freeing all HBM for KV cache.
- Alternate mode: LPUs run speculative decoding draft models (1.5–2× throughput per step).
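The 1.5–2× claim for speculative decoding follows from a standard expectation: if the draft model proposes k tokens and each is accepted independently with probability a (an idealization; real acceptance is correlated), the target model emits 1 + a + a² + … + a^k tokens per verification step. A sketch with assumed numbers:

```python
def expected_tokens_per_step(k: int, accept: float) -> float:
    """Expected tokens emitted per verification step of speculative decoding,
    under the idealized independent-acceptance model: sum of accept**i
    for i in 0..k."""
    return sum(accept ** i for i in range(k + 1))

# 4 draft tokens at 50% acceptance yield ~1.94 tokens/step, near the top
# of the 1.5-2x range quoted on stage.
print(round(expected_tokens_per_step(4, 0.5), 2))  # 1.94
```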
LP30 vs LP40 Roadmap
- LP30: Samsung SF4, 500MB SRAM, 1.2 PFLOPS FP8, Groq C2C protocol. Shipping with Vera Rubin.
- LP35: Minor refresh on SF4, adds NVFP4 format. New tapeout, same node.
- LP40: TSMC N3P + CoWoS-R. First true Nvidia co-design. NVLink replaces Groq C2C. Hybrid bonded DRAM from SK Hynix extends memory beyond SRAM.
⚠️ Bear case: LP40 is 2+ years out. Execution risk on Samsung node is real.
CPO Roadmap (Co-Packaged Optics)
- Rubin NVL72: All copper scale-up (Oberon rack).
- Rubin Ultra NVL144: All copper (Kyber rack). No CPO despite analyst rumors.
- Rubin Ultra NVL576: 8× Oberon racks, first CPO deployment between racks. Copper within. Low volume / testing.
- Feynman NVL1152: 8× Kyber racks. CPO between racks, copper within. Jensen says “all CPO,” but the blog disagrees; still TBD.
- Nvidia's philosophy: “Copper where we can, optics where we must.”
Token Economics
The Token Factory Pricing Spectrum
Jensen's framing: “Every CEO in the world will be studying their token factory throughput chart. This year's decisions show up precisely as next year's revenues.”
| Tier | Price/M Tokens | Characteristics | Use Case |
|---|---|---|---|
| FREE | $0 | High throughput, small models, high latency | Customer acquisition, basic queries |
| BASIC | $3 | Medium models, reasonable speed | Consumer chatbots, content generation |
| PRO | $6 – $45 | Larger models, higher speed, long context | Professional work, analysis, code assist |
| PREMIUM | $150 | Frontier models, max intelligence, fast decode | Research, critical path decisions, deep reasoning |
| ULTRA | $150+ | Ultra-fast tokens, Groq-accelerated decode | Real-time coding, long research runs |
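What the tiers mean for a buyer is plain arithmetic; a hypothetical example using the table's prices (the 500M-token/month coding-agent workload is assumed, and PRO is pinned to the top of its range):

```python
# $/million tokens, from the pricing table; PRO pinned to its $45 ceiling.
PRICE_PER_M_TOKENS = {"FREE": 0, "BASIC": 3, "PRO": 45, "PREMIUM": 150}

def monthly_bill(millions_of_tokens: float, tier: str) -> float:
    """Dollar cost of a monthly token volume at a given tier."""
    return millions_of_tokens * PRICE_PER_M_TOKENS[tier]

# The same 500M-token agent costs $1,500/month on BASIC but $75,000 on
# PREMIUM: tier selection, not raw volume, dominates the bill.
print(monthly_bill(500, "BASIC"))    # 1500
print(monthly_bill(500, "PREMIUM"))  # 75000
```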
OpenClaw. The Linux of Agents
Why It Matters
Jensen gave this more stage time than Vera Rubin. Direct quotes:
- “As big as HTML.”
- “As big as Linux.”
- “As big as Kubernetes.”
- Most popular open-source project in history, beat Linux's star count in weeks.
- “OpenClaw has open-sourced the operating system of agent computers.”
Enterprise Stack (Nemo Claw)
- Open Shell. Security and privacy guardrails for corporate agents.
- Privacy Router. Blocks sensitive data from leaking to model providers.
- Policy Guard Rails. Hooks into existing SaaS policy engines.
- Every SaaS company becomes a GaaS company (agentic-as-a-service).
- Reference design: download, customize, ship.
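The privacy-router idea is easy to picture as a scrubbing pass on outbound prompts. A minimal sketch, assuming nothing about Nemo Claw's actual design: the patterns, tags, and route() function are invented for illustration.

```python
import re

# Illustrative redaction rules; a real router would use far richer detectors.
SENSITIVE = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),          # US SSN shape
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),  # email address
]

def route(prompt: str) -> str:
    """Redact likely-sensitive spans before the prompt leaves the corporate
    boundary; a real router would then forward the result upstream."""
    for pattern, tag in SENSITIVE:
        prompt = pattern.sub(tag, prompt)
    return prompt

print(route("Escalate ticket for jane@corp.com, SSN 123-45-6789"))
# Escalate ticket for [EMAIL], SSN [SSN]
```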
Neimotron Coalition
Partners are building Neimotron 4, NVIDIA's frontier open model.
Architecture Roadmap
New architecture every year. Copper and optical in parallel going forward.
Scale-Up: Copper vs Optical
Vera CPU Standalone
Physical AI & Robotics
Robo-Taxi Partnerships
New partners representing a combined 18M cars/year of production join existing partners Mercedes, Toyota, and GM, plus Uber for multi-city deployment.
“The ChatGPT moment of self-driving cars has arrived.”
The Three Computers of Robotics
- Training computer. Isaac Lab for RL policy training at scale.
- Simulation computer. Newton physics + Cosmos world models for synthetic data.
- Robot computer. Jetson. Runs on the robot itself.
110 robots on the GTC show floor. Every major robotics company works with NVIDIA.
More Highlights
DLSS 5 / Neuro Rendering
- Fuses 3D graphics + generative AI. Deterministic structure meets probabilistic generation.
- “One is completely predictive, the other probabilistic yet highly realistic.”
- This pattern, structured data + generative AI, is the template for every industry.
- “Structured data is the foundation of trustworthy AI.”
- The demo was the best visual at GTC. Computer graphics that breathe.
Notable Quotes
“If you have the wrong architecture, even if it's free, it's not cheap enough.”
On why token cost per watt matters more than chip cost
“Dylan Patel accused me of sandbagging. He says it's actually 50×. And he's not wrong.”
On Blackwell inference benchmarks (SemiAnalysis study)
“Every engineer's recruiting package will include a token budget. Tokens are the new compensation.”
On the future of knowledge work
“Every single SaaS company will become a GaaS company. An agentic-as-a-service company.”
On the enterprise IT shift
“Computing demand has increased by 1 million times in the last two years.”
On the inference inflection (10,000x per-task times 100x usage)
“We see through 2027 at least $1 trillion.”
On the demand pipeline (up from $500B last year)