Google introduced its eighth-generation TPU family with two purpose-built chips: TPU 8t for large-scale training and TPU 8i for low-latency, memory-heavy agent inference. The designs pair new numeric formats (native FP4/FP8), specialized accelerators for sparse embeddings, and upgraded networking (Virgo, Boardfly) to sharply raise FLOPS, bandwidth, and pod scale while trimming CPU and memory bottlenecks. Co-developed with DeepMind and integrated into Google Cloud’s AI Hypercomputer, the TPUs emphasize energy-efficient performance, liquid cooling, and software-hardware co-optimization to improve cost-per-token and real-world responsiveness for foundation models and agentic workflows. Availability is planned for later this year, with early enterprise testing underway.
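To make the numeric-format claim concrete: JAX already exposes FP8 dtypes via ml_dtypes, and the sketch below shows the cast-compute-accumulate pattern that native FP4/FP8 hardware accelerates. This is a minimal illustration under stated assumptions, not Google's kernel; native FP4 paths are hardware-specific, and the dtype choice here (float8_e4m3fn) is an assumption.

```python
import jax
import jax.numpy as jnp

def fp8_matmul(a: jax.Array, b: jax.Array) -> jax.Array:
    # Quantize operands to FP8 (e4m3: more mantissa, narrower range),
    # then ask XLA to accumulate the product in float32.
    a8 = a.astype(jnp.float8_e4m3fn)
    b8 = b.astype(jnp.float8_e4m3fn)
    return jnp.dot(a8, b8, preferred_element_type=jnp.float32)

a = jax.random.normal(jax.random.key(0), (128, 256), dtype=jnp.bfloat16)
b = jax.random.normal(jax.random.key(1), (256, 64), dtype=jnp.bfloat16)
out = jax.jit(fp8_matmul)(a, b)  # float32 result from FP8 operands
```

Halving operand width is what lets a fixed matrix-unit datapath roughly double throughput while also easing memory-bandwidth pressure, which is the trade both new chips lean on.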
Google's TPU 8t and 8i reshape infrastructure economics and performance for large models and agentic inference, affecting choices for cloud compute, model design, and deployment. Tech professionals must understand the new numeric formats, networking, and pod-scale trade-offs to optimize cost, latency, and scalability.
Dossier last updated: 2026-05-12 07:17:43
Google unveiled eighth-generation TPUs — TPU 8t for large-scale training and TPU 8i for low-latency agent inference — arguing that its vertically integrated stack avoids the high margins of Nvidia-based rentals. TPU 8t boosts FP4 EFLOPS per pod 2.8x over the prior generation, doubles scale-up bandwidth, quadruples scale-out networking, and can scale beyond one million chips in a single job via new Virgo networking and TPU Direct Storage, which cuts CPU-mediated data hops. TPU 8i focuses on agent workloads with a redesigned Boardfly network topology to reduce latency, delivering roughly 9.8x the FP8 EFLOPS per pod of the prior generation along with much higher HBM capacity and pod size; it targets real-time sampling and memory-heavy inference. Google says the two-chip roadmap reflects a strategic split between training and agent inference and promises better cost-per-token economics for cloud customers.
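The cost-per-token promise is easiest to sanity-check with back-of-envelope arithmetic. Every figure in the sketch below (model size, pod throughput, utilization, hourly price) is an illustrative assumption, not a published number; only the structure of the calculation is the point.

```python
def cost_per_million_tokens(params_b: float, pod_eflops: float,
                            utilization: float, pod_cost_per_hour: float) -> float:
    # Decode FLOPs per token ~ 2 * parameter count (standard rule of thumb).
    flops_per_token = 2 * params_b * 1e9
    sustained_flops = pod_eflops * 1e18 * utilization   # FLOP/s actually achieved
    tokens_per_hour = sustained_flops / flops_per_token * 3600
    return pod_cost_per_hour / tokens_per_hour * 1e6

# Illustrative only: a 700B-parameter model on a hypothetical 9.8-EFLOPS pod
# at 40% utilization and a made-up $5,000/hour pod price.
print(f"${cost_per_million_tokens(700, 9.8, 0.4, 5000):.2f} per 1M tokens")
```

Higher per-pod FLOPS move this number through pod_eflops, but the bandwidth and topology upgrades matter just as much because they push utilization up; a pod that is twice as fast on paper but half as well fed gains nothing.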
Google unveiled technical details of its eighth-generation TPU family — TPU 8t and TPU 8i — designed for modern AI workloads such as agentic systems, world models, and massive MoEs. TPU 8t targets large-scale pretraining and embedding-heavy tasks with a 9,600-chip 3D-torus superpod, a new SparseCore accelerator for irregular embedding access, improved VPU/MXU overlap for higher utilization, and native FP4 to double MXU throughput while cutting memory bandwidth demands. The platform pairs pods with Arm-based Axion host CPUs to reduce host-side preprocessing bottlenecks and is part of Google Cloud’s AI Hypercomputer, supporting training, fine-tuning, and serving. These changes matter because they address real throughput, latency, and scaling limits for next-generation model training and multi-agent reasoning at cloud scale.
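SparseCore's target workload, irregular embedding access, is essentially gather-then-pool. The JAX sketch below shows that pattern as it would be written for any backend; the table size and sum-pooling are illustrative choices, and nothing here reflects how SparseCore itself is programmed.

```python
import jax
import jax.numpy as jnp

VOCAB, DIM = 100_000, 64
table = jax.random.normal(jax.random.key(0), (VOCAB, DIM))

def pooled_lookup(table, ids, segment_ids, num_segments):
    # Irregular gather: the requested rows are scattered across HBM, which
    # is why this step is memory-bound and a poor fit for the dense MXU.
    rows = jnp.take(table, ids, axis=0)
    # Sum-pool each example's variable-length ID list into one vector.
    return jax.ops.segment_sum(rows, segment_ids, num_segments=num_segments)

ids = jnp.array([3, 17, 42, 42, 9])   # feature IDs, ragged across examples
segs = jnp.array([0, 0, 1, 1, 1])     # which example each ID belongs to
emb = jax.jit(pooled_lookup, static_argnums=3)(table, ids, segs, 2)
print(emb.shape)  # (2, 64)
```

Offloading this gather/pool to a dedicated unit frees the MXU to stay busy with dense matmuls, which is the utilization win the VPU/MXU overlap claim points at.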
Google announced its eighth-generation Tensor Processing Units, introducing two purpose-built chips: TPU 8t for large-model training and TPU 8i for low-latency inference and agentic workloads. Developed with Google DeepMind, the designs emphasize co-optimization of silicon, networking, cooling, and software to meet the multi-step reasoning and continuous-learning demands of AI agents. Google positions these TPUs to improve power efficiency and absolute performance for foundation models like Gemini and for large-scale inference, with availability slated for later this year and early customers already testing deployments. This matters because specialized hardware continues to shape model scale, operational cost, and latency for cloud AI services and enterprise AI initiatives.
Google announced its eighth-generation Tensor Processing Units — TPU 8t for large-scale model training and TPU 8i for low-latency inference — aimed at powering agentic AI workloads. Co-designed with DeepMind, the chips emphasize custom numerics, liquid cooling, bespoke interconnects, and software co-optimization to boost performance and energy efficiency for training foundation models like Gemini and serving multi-step, continuous-agent workflows. Google positions the pair as components of its custom supercomputers to accelerate the development, scaling, and deployment of demanding ML workloads; availability is slated for later this year, with enterprise partners already evaluating the systems. The release matters for cloud AI competitiveness and for the cost and power profiles of large-scale AI deployments.