YK Research

AI + Semi Weekly

Compute Demand Is Accelerating Again

Cheap models lower software pricing power, but they increase total token demand.

28 April 2026 · YK Research

The One-Line Takeaway

The market is repricing the AI stack around a simple idea: models get cheaper, but compute usage grows faster. That is bearish for weak AI software margins and bullish for chips, memory, networking, power and cloud capacity.
SOXX 7d: +9.6% · SMH 7d: +9.1% · Best major semi: ARM +29.5% · NVDA 7d: +7.4%

This week clustered around one theme: the AI race is moving from model capability to deployed capacity. Google is willing to commit tens of billions to Anthropic plus gigawatts of TPU capacity. NVIDIA is using Blackwell and Blackwell Ultra to cut token cost. DeepSeek is forcing price competition at the model layer. Tesla is trying to pull more silicon in-house, but its timing still lags NVIDIA, which is shipping systems today.

This Week in Stock Prices

Price action confirms the market is buying the hardware bottleneck. Semis led broad tech: SOXX +9.6%, SMH +9.1%, with the biggest moves in CPU, memory and Arm ecosystem names.

Source: Yahoo Finance via yfinance, close-to-close, 17 Apr 2026 to 27 Apr 2026.
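The close-to-close moves above come from a yfinance pull over that window. A minimal sketch of the computation, with the network call commented out so the arithmetic runs standalone (the example prices are hypothetical, not from the table):

```python
# Sketch: close-to-close weekly percent change, assuming yfinance is installed.
# The download call is commented out so the helper below runs offline.
# import yfinance as yf
# closes = yf.download(["ARM", "NVDA", "SOXX"],
#                      start="2026-04-17", end="2026-04-28")["Close"]

def close_to_close_pct(start_close: float, end_close: float) -> float:
    """Percent change between two closing prices."""
    return (end_close / start_close - 1.0) * 100.0

# Hypothetical illustration: a close moving from $100.00 to $109.55 is +9.55%.
print(round(close_to_close_pct(100.00, 109.55), 2))  # 9.55
```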

Latest Closes Used

ARM     $215.88    +29.48%
INTC    $84.99     +24.07%
AMD     $334.63    +20.20%
MU      $524.56    +15.27%
SOXX    $455.41    +9.55%
TSM     $404.98    +9.31%
SMH     $506.26    +9.07%
NVDA    $216.61    +7.40%
AVGO    $418.20    +2.87%
GOOG    $348.52    +2.69%
ASML    $1432.44   -1.87%
TSLA    $378.67    -5.48%

1. The Compute Race Is Now Balance-Sheet Warfare

Google may invest up to $40B in Anthropic: $10B upfront at a reported $350B valuation, with another $30B tied to performance targets. The important part is not only the cash. It is the infrastructure lock-in. Google Cloud is expected to provide roughly 5 GW of capacity over five years, on top of Anthropic's earlier TPU access through Google and Broadcom.

  • GOOG: Anthropic becomes both a strategic customer and proof that TPUs can support frontier workloads at gigawatt scale.
  • AVGO: Custom silicon demand rises if hyperscalers keep building TPU-like alternatives to NVIDIA.
  • NVDA: The competitive threat is real, but the total AI factory pie is expanding faster than alternatives can absorb it.
Source: TechCrunch, Bloomberg-reported Google and Anthropic investment terms, 24 Apr 2026.

2. TPU 8: Google Splits Training and Inference

The most important Google hardware announcement is TPU 8. Unlike TPU v7 Ironwood, which is a more general training plus inference accelerator, TPU 8 splits into two chips: TPU 8t for frontier training and TPU 8i for low-latency inference. That is the right architectural fork for LLMs because training wants maximum scale-up bandwidth and goodput, while inference wants memory bandwidth, KV-cache locality and low synchronization latency.

SemiAnalysis has been tracking the same setup from several angles: TPU v8 upward revisions, Broadcom TPU v7/v8 share, Google selling TPU systems externally, Anthropic plus Google TPU ramps, TPU manufacturing at Celestica/Foxconn, and Rubin output/HBM4 risk. Most of that coverage sits behind the Institutional feed, so the clean read from the index is directional rather than spec-confirming: the TPU story is not a one-off Google announcement. It is now a recurring supply-chain and capex theme.

TPU 8t superpod: 9,600 chips
TPU 8t pod compute: 121 EFLOPS
TPU 8i HBM: 288 GB
TPU 8i SRAM: 384 MB

TPU v7 vs TPU 8 vs NVIDIA Rubin

Sources: Google TPU 8 announcement; Google Ironwood technical blog; The Register reporting on TPU 8 and Rubin specs; SemiAnalysis article index for TPU v8, Broadcom TPU, Anthropic TPU ramp and Rubin/HBM4 coverage.

What changed from v7

  • v7 Ironwood: one flagship TPU for the full AI lifecycle, with 192 GiB HBM3E per chip, 7.4 TB/s HBM bandwidth and 42.5 EFLOPS FP8 at superpod scale.
  • 8t: training-optimized. Google says nearly 3x compute per pod, 2 PB shared HBM, 2x interchip bandwidth and target goodput above 97%.
  • 8i: inference-optimized. 288 GB HBM, 384 MB on-chip SRAM and collective acceleration to reduce all-reduce/all-gather stalls in MoE serving.
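To see why 384 MB of on-chip SRAM matters for KV-cache locality, here is a back-of-envelope sizing. Every model dimension below is a hypothetical assumption for illustration, not a TPU 8i spec or any real model's configuration:

```python
# Back-of-envelope KV-cache sizing for a hypothetical decoder model.
# All dimensions below are illustrative assumptions, not real chip or model specs.
n_layers = 60        # transformer layers
n_kv_heads = 8       # KV heads (grouped-query attention)
head_dim = 128       # per-head dimension
bytes_per_value = 1  # FP8 KV cache

# Keys and values are both cached, hence the factor of 2.
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
print(kv_bytes_per_token)  # 122880 bytes, i.e. 120 KiB per token

# How many tokens of KV state fit in 384 MB of on-chip SRAM?
sram_bytes = 384 * 1024 * 1024
print(sram_bytes // kv_bytes_per_token)  # 3276 tokens
```

Under these assumptions only a few thousand tokens of KV state fit on-chip, so SRAM acts as a hot cache for the active decode window while HBM holds the rest, which is the locality argument above.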

What it means for LLMs

  • Training: 8t shortens frontier model iteration cycles. Higher goodput matters because a 1% stall rate at frontier scale is measured in days.
  • Inference: 8i is built for agent swarms, long context and MoE routing. More SRAM means more KV-cache locality, fewer trips to HBM and lower tail latency.
  • Economics: Google can sell Anthropic, Gemini and potentially external merchant TPU customers a vertically integrated TPU cloud instead of renting NVIDIA scarcity at market price.
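The goodput claim is easy to quantify. Treating goodput as the fraction of wall-clock time spent on useful training work, the calendar slip is ideal_days / goodput minus ideal_days; the 90-day run length below is a hypothetical illustration of frontier scale:

```python
# Extra calendar days lost to stalls at a given goodput level.
# The 90-day ideal run length is a hypothetical illustration.
def extra_days(ideal_days: float, goodput: float) -> float:
    """Wall-clock days beyond the ideal run length at a given goodput fraction."""
    return ideal_days / goodput - ideal_days

print(round(extra_days(90, 0.99), 1))  # 0.9 days lost at 99% goodput
print(round(extra_days(90, 0.97), 1))  # 2.8 days lost at 97% goodput
```

A single percentage point of stall rate really is measured in days at this scale, which is why Google's above-97% goodput target is a headline spec rather than a footnote.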

How it compares with NVIDIA

  • NVIDIA wins per chip: Rubin is reported at 35 PFLOPS FP4, 288 GB HBM4 and 22 TB/s bandwidth, ahead of TPU 8 on raw chip specs.
  • Google wins integrated scale: TPU 8t targets 9,600 chips in one superpod and up to million-chip logical clusters through Virgo, JAX and Pathways.
  • SemiAnalysis nuance: the watch item is not only TPU specs. It is supply availability. Rubin output reductions, HBM4 constraints or schedule slips would make Google's vertically controlled TPU capacity more valuable.
  • Investment read: NVIDIA remains the merchant standard. TPU 8 caps hyperscaler dependency and is a real margin lever for Google, but it does not remove NVIDIA demand outside Google's walled garden.

3. DeepSeek V4 Is Deflationary for Models, Inflationary for Tokens

DeepSeek V4 Pro and Flash matter because they make near-frontier capability available at very low price points. V4 Pro is a 1.6T parameter MoE model with 49B active parameters. V4 Flash is 284B parameters with 13B active. Both support 1M-token context. Reported pricing undercuts frontier model pricing materially.
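The pricing story follows from the MoE active-parameter ratio: per-token compute scales with active parameters, not total parameters. Using the reported figures as a rough serving-cost proxy:

```python
# Active-parameter fraction for the reported DeepSeek V4 configurations.
# Per-token FLOPs in an MoE scale with active params, so these ratios are a
# rough proxy for serving cost relative to an equally sized dense model.
def active_fraction(active_b: float, total_b: float) -> float:
    """Percent of total parameters active per token (inputs in billions)."""
    return active_b / total_b * 100.0

print(round(active_fraction(49, 1600), 1))  # V4 Pro: 3.1% of weights active per token
print(round(active_fraction(13, 284), 1))   # V4 Flash: 4.6% active per token
```

Both models touch under 5% of their weights per token under this proxy, which is how near-frontier capability gets served at very low price points.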

When intelligence gets cheaper, people do not spend less on intelligence. They run more of it.
Investment read
  • Bearish: AI software wrappers with no workflow moat. Pricing power compresses.
  • Bullish: Inference volume, memory bandwidth, networking and low-latency serving infrastructure.
  • Portfolio read: Own the toll roads. Be careful owning the apps that cannot defend gross margin.
Source: TechCrunch DeepSeek V4 preview, 24 Apr 2026.

4. Blackwell Ultra Turns Agents Into a GPU Demand Story

NVIDIA's strongest argument this week was economic, not benchmark theater. OpenAI Codex running GPT-5.5 on GB200 NVL72 gives investors a concrete enterprise use case for agentic AI. NVIDIA then pushed the same point with Blackwell Ultra: up to 50x higher throughput per megawatt and 35x lower token cost versus Hopper for agentic AI workloads.

Source: YK Research scoring from NVIDIA, TechCrunch and Electrek events.

The key loop: lower token cost makes long-running agents economically viable, which increases token volume, which requires more high-end GPUs. That is why open-model inference providers still move onto Blackwell. Open models reduce model rents. They do not eliminate the need for high-end chips.
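The loop reduces to simple revenue arithmetic using the two headline Blackwell Ultra ratios from the text. Whether demand actually grows 50x when price falls 35x is an assumption about elasticity, not a forecast:

```python
# Jevons-style check on the token-cost loop, using the headline Blackwell Ultra
# ratios cited in the text (35x lower token cost, up to 50x more throughput per MW).
# Treating the 50x as realized demand growth is a hypothetical assumption.
price_ratio = 1 / 35   # token price falls 35x
volume_ratio = 50      # token volume grows 50x (assumed full utilization)

spend_ratio = price_ratio * volume_ratio
print(round(spend_ratio, 2))  # 1.43: total inference spend still grows ~43%
```

If volume grows faster than price falls, total spend on inference infrastructure rises even as per-token economics collapse. That is the deflationary-for-models, inflationary-for-tokens claim in one line.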

Sources: NVIDIA blogs on Codex, Blackwell Ultra and open-model inference providers, Apr 2026.

5. Tesla AI5 Is Real, but It Is Not the 2026 Trade

Tesla taped out AI5. That is a real milestone because tape-out means the chip design is locked and sent for fabrication. But it is not volume production. Automotive-grade silicon still needs manufacturing, validation and ramp. The practical timing still points toward mid-2027 for meaningful AI5 vehicle volume.

AI5 milestone: Tape-out
Likely volume: 2027
TSLA 7d: -5.5%
Read: Future optionality

Terafab is more ambitious: Tesla, SpaceX and xAI want a massive Austin semiconductor project for robotaxis, Optimus, xAI and space AI. Intel joining would add credibility around manufacturing and packaging. But execution risk is enormous. This is a strategic option, not near-term earnings power.

Source: Electrek AI5 tape-out report, 15 Apr 2026; user-provided Terafab and Intel joining reports.

Portfolio Read

Highest conviction

  • NVDA: Still the shipped infrastructure standard. Blackwell economics directly monetize agentic AI, but Rubin/HBM4 execution is now the key risk to track.
  • AVGO: Custom silicon beneficiary if TPUs and hyperscaler ASICs keep scaling. SemiAnalysis' TPU coverage keeps pointing at Broadcom as the clean supply-chain read-through.
  • MU / HBM: Token volume and long-context agents pull memory bandwidth.

Good but price-sensitive

  • AMD: CPU and accelerator angle is improving, but the weekly move was already +20%.
  • ARM: Structural winner from CPU-rich agentic infrastructure, but +29.5% in a week demands discipline.
  • TSM: The neutral toll booth. Benefits from both NVIDIA and custom silicon.

Watchlist / caution

  • GOOG: Strategic AI infra improves. The SemiAnalysis angle makes TPU capacity look more like a sellable cloud asset than a Gemini-only cost center, but model competition still pressures margins.
  • TSLA: AI5 and Terafab are upside options, not current shipping revenue.
  • Weak SaaS wrappers: DeepSeek-style pricing is a direct gross margin threat.

What to Watch Next Week

  • Token pricing: If frontier providers cut prices after DeepSeek V4, software margin compression accelerates.
  • SemiAnalysis TPU thread: Watch for more detail on TPU v8 revisions, merchant TPU customers, Broadcom share, and whether Anthropic's compute plan tilts further toward Google versus AWS Trainium.
  • Cloud capex commentary: Listen for gigawatts, Blackwell allocation, TPU capacity and power constraints.
  • Memory pricing: HBM and high-end DRAM are the cleanest way to confirm real infrastructure scarcity.
  • Semis after the pump: A 9 to 10% weekly move in SOXX/SMH is not an entry signal by itself. Prefer pullbacks in names where the edge is structural.
Bottom line: cheap AI is not bad for semis. Cheap AI is how AI becomes a daily workload, and daily workloads need factories.