YK Research
◼ NVIDIA GTC 2026

Jensen Huang Keynote

The Inference Inflection, Vera Rubin, and the Agent Revolution

March 17, 2026 · SAP Center, San Jose · ~2h keynote

Demand Pipeline: $1T+ (through 2027)
Token Throughput: 350× (in 2 years, per GW)
Rev/GW vs Blackwell (Vera Rubin upgrade)
Groq Premium Tier: 35× (decode throughput)

TL;DR: Jensen's biggest GTC yet. He unveiled the Vera Rubin platform (the successor to Blackwell: 7 chips, 5 rack-scale computers, 3.6 exaflops), announced Groq LPU integration for 35× token decode throughput, declared $1 trillion+ in visible demand through 2027, and essentially crowned OpenClaw the “Linux of agents”: every company needs an OpenClaw strategy. Oh, and a Disney Olaf robot walked on stage and roasted Jensen's height.

Vera Rubin Platform

Architecture

  • 7 chips, 5 rack-scale computers: one unified AI supercomputer
  • NVLink 72 @ 260 TB/s all-to-all bandwidth
  • 3.6 exaflops of compute
  • 100% liquid cooled with 45°C hot water, which takes pressure off data center cooling
  • Install time: 2 days → 2 hours (structured cables, no spaghetti)

The Five Racks

  • NVLink 72 Rack: GPU compute with 6th-gen NVLink scale-up
  • Vera CPU Rack: LPDDR5, extreme single-thread performance for agentic tool use
  • STX Rack: BlueField-4 AI-native storage (KV cache, cuDF, cuVS)
  • Groq LPX Rack: 8 LP30 chips, massive SRAM for ultra-fast decode
  • Spectrum-X CPO: co-packaged optics scale-out (in production with TSMC)

Generational Performance Leap

Groq LPU Integration

How It Works

NVIDIA acquired the Groq team and licensed the technology. The key insight: disaggregated inference via Dynamo.

  • Vera Rubin handles prefill: massive math, context processing, KV cache (288 GB HBM)
  • Groq handles decode: deterministic dataflow, compiler-scheduled, massive SRAM (500 MB/chip) for ultra-low-latency token generation
  • Connected via Ethernet with a special low-latency mode (2× reduction)
  • Unified by Dynamo: the operating system for AI factories
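
The prefill/decode split described above can be illustrated with a toy sketch. Everything here is hypothetical: the class names (`PrefillWorker`, `DecodeWorker`, `Router`) are invented for illustration and are not the Dynamo API, and the "model" is a stand-in that only splits and echoes words. The point is the shape of the handoff: one pool builds the KV cache, a second pool generates tokens from it, and an orchestrator connects the two.

```python
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Stand-in for the attention state produced by prefill."""
    prompt_tokens: list[str]


@dataclass
class PrefillWorker:
    """Hypothetical throughput-oriented pool (the Vera Rubin role)."""
    name: str = "prefill-pool"

    def prefill(self, prompt: str) -> KVCache:
        # Real prefill runs the whole prompt through the model in one pass;
        # here we only split on whitespace to keep the sketch runnable.
        return KVCache(prompt_tokens=prompt.split())


@dataclass
class DecodeWorker:
    """Hypothetical latency-oriented pool (the Groq LPX role)."""
    name: str = "decode-pool"

    def decode(self, cache: KVCache, max_new_tokens: int) -> list[str]:
        # Real decode generates one token at a time from the KV cache;
        # this toy version echoes the prompt tokens in reverse order.
        source = list(reversed(cache.prompt_tokens))
        return source[:max_new_tokens]


@dataclass
class Router:
    """Hypothetical orchestrator (the Dynamo role): prefill, hand off, decode."""
    prefill_worker: PrefillWorker = field(default_factory=PrefillWorker)
    decode_worker: DecodeWorker = field(default_factory=DecodeWorker)
    trace: list[str] = field(default_factory=list)

    def generate(self, prompt: str, max_new_tokens: int) -> list[str]:
        cache = self.prefill_worker.prefill(prompt)
        self.trace.append(self.prefill_worker.name)
        # The KV cache handoff is the step where link latency matters.
        tokens = self.decode_worker.decode(cache, max_new_tokens)
        self.trace.append(self.decode_worker.name)
        return tokens


router = Router()
output = router.generate("explain disaggregated inference", max_new_tokens=2)
print(output, router.trace)
```

Keeping the two stages behind separate worker types is what lets each pool be sized and scheduled independently, which is the premise of the 75/25 recommendation in the next section.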

Jensen's Recommendation

For most data centers:

  • 75% Vera Rubin: handles the vast majority of workloads (high throughput)
  • 25% Groq: for premium tier, coding, high-value token generation
  • If your workload is mostly batch/throughput → 100% Vera Rubin
  • Groq extends performance beyond NVLink 72's bandwidth limits for 1000+ tokens/sec
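
As a rough illustration of that rule of thumb, here is a minimal sketch that splits a rack budget. The function name `plan_racks`, the `latency_sensitive` flag, and the round-down behavior are all assumptions made for this example; the keynote gave only the percentages.

```python
def plan_racks(total_racks: int, latency_sensitive: bool) -> dict[str, int]:
    """Split a rack budget using the keynote's 75/25 rule of thumb."""
    if total_racks < 0:
        raise ValueError("total_racks must be non-negative")
    if not latency_sensitive:
        # Mostly batch/throughput work: everything goes to Vera Rubin.
        return {"vera_rubin": total_racks, "groq": 0}
    # 25% to Groq, rounded down (the rounding choice is an assumption).
    groq = total_racks // 4
    return {"vera_rubin": total_racks - groq, "groq": groq}


mixed = plan_racks(40, latency_sensitive=True)
batch = plan_racks(40, latency_sensitive=False)
print(mixed, batch)
```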

Token Economics

The Token Factory Pricing Spectrum

Jensen's key thesis: “Every CEO in the world will be studying their token factory throughput chart. This year's decisions show up precisely as next year's revenues.”

Tier | Price/M Tokens | Characteristics | Use Case
FREE | $0 | High throughput, small models, high latency | Customer acquisition, basic queries
BASIC | $3 | Medium models, reasonable speed | Consumer chatbots, content generation
PRO | $6 – $45 | Larger models, higher speed, long context | Professional work, analysis, code assist
PREMIUM | $150 | Frontier models, max intelligence, fast decode | Research, critical path decisions, deep reasoning
ULTRA | $150+ | Ultra-fast tokens, Groq-accelerated decode | Real-time coding, long research runs
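
To make the pricing spectrum concrete, the sketch below turns the listed prices into a revenue range for a given token mix. The prices come from the tiers above; the token mix is made up for illustration, and treating ULTRA's "$150+" as a flat $150 floor is an assumption, so the result is a lower bound for that tier.

```python
# Price per million tokens as (low, high) in USD, taken from the tier list.
# ULTRA is listed as "$150+", so 150 is used as a floor for both bounds.
TIER_PRICES = {
    "FREE": (0.0, 0.0),
    "BASIC": (3.0, 3.0),
    "PRO": (6.0, 45.0),
    "PREMIUM": (150.0, 150.0),
    "ULTRA": (150.0, 150.0),
}


def revenue_range(million_tokens_by_tier: dict[str, float]) -> tuple[float, float]:
    """Return (low, high) revenue in USD for a mix of token volumes."""
    low = high = 0.0
    for tier, millions in million_tokens_by_tier.items():
        low_price, high_price = TIER_PRICES[tier]
        low += millions * low_price
        high += millions * high_price
    return low, high


# Hypothetical mix: millions of tokens served per tier.
mix = {"FREE": 1000, "BASIC": 200, "PRO": 100, "PREMIUM": 10, "ULTRA": 2}
low, high = revenue_range(mix)
print(low, high)
```

In this made-up mix the free tier carries most of the volume and none of the revenue, while the 12 million PREMIUM and ULTRA tokens bring in three times as much as the 200 million BASIC tokens.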

OpenClaw: The Linux of Agents

Why It Matters

Jensen devoted a significant portion of the keynote to OpenClaw, calling it:

  • “As big as HTML”: started the internet
  • “As big as Linux”: powered cloud computing
  • “As big as Kubernetes”: enabled mobile cloud
  • Most popular open-source project in history: exceeded Linux in weeks
  • “OpenClaw has open-sourced the operating system of agent computers”

Enterprise Stack (NemoClaw)

  • Open Shell: security/privacy guardrails for corporate agents
  • Privacy Router: prevents sensitive data exfiltration
  • Policy Guard Rails: connects to existing SaaS policy engines
  • Every SaaS company → GaaS company (agentic-as-a-service)
  • Reference design downloadable and optimizable

Nemotron Coalition

Partnering to build Nemotron 4, NVIDIA's frontier open model:

Cursor
Mistral
Perplexity
LangChain
Black Forest Labs
Reflection
Sarvam (India)
Mirror

Architecture Roadmap

Blackwell (Shipping): NVLink 72 · FP4 · Dynamo
Vera Rubin (Sampling Now): 3.6 EF · LP30 Groq · CPO
Rubin Ultra (Taping Out): NVLink 144 · LP35 · NVFP4
Feynman (Next Gen): LP40 · Rosa CPU · BF5 · CX10

Brand new architecture every single year. Both copper and optical scale-up going forward.

Scale-Up: Copper vs Optical

Vera CPU Standalone

Physical AI & Robotics

Robo-Taxi Partnerships

New partners announced (18M cars/year combined):

BYD
Hyundai
Nissan
Ji

Joining existing partners: Mercedes, Toyota, GM. Plus Uber for multi-city deployment.

“The ChatGPT moment of self-driving cars has arrived.”

The Three Computers of Robotics

  • Training computer: Isaac Lab for RL policy training at scale
  • Simulation computer: Newton physics + Cosmos world models for synthetic data
  • Robot computer: Jetson, runs on the robot itself

110 robots on the GTC show floor. Every major robotics company is working with NVIDIA.

Disney's Olaf Moment at GTC 2026

A walking, talking Olaf robot came on stage, trained using the Newton physics simulator, Isaac Lab, and Omniverse, and running on Jetson. It had a full comedy exchange with Jensen (“I thought you'd be taller”). The future of Disneyland is AI-powered characters roaming the park.

More Highlights

DLSS 5 / Neural Rendering

  • Fusion of 3D graphics + generative AI: controllable structured data meets probabilistic generation
  • “One is completely predictive, the other probabilistic yet highly realistic”
  • The pattern of structured data + generative AI will repeat in every industry
  • “Structured data is the foundation of trustworthy AI”
  • Computer graphics literally comes to life. Jensen showed a jaw-dropping demo

Notable Quotes

If you have the wrong architecture, even if it's free, it's not cheap enough.

— On why token cost per watt matters more than chip cost

Dylan Patel accused me of sandbagging. He says it's actually 50×. And he's not wrong.

— On Blackwell inference benchmarks (Semi Analysis study)

Every engineer's recruiting package will include a token budget. Tokens are the new compensation.

— On the future of knowledge work

Every single SaaS company will become a GaaS company, an agentic-as-a-service company.

— On the enterprise IT transformation

Computing demand has increased by 1 million times in the last two years.

— On the inference inflection (10,000× per-task × 100× usage)

We see through 2027 at least $1 trillion.

— On the demand pipeline (up from $500B last year)
