Google introduced its eighth-generation TPU family with two purpose-built chips: TPU 8t for large-scale training and TPU 8i for low-latency, memory-heavy agent inference. The designs pair new numeric formats (native FP4/FP8), specialized accelerators for sparse embeddings, and upgraded networking (Virgo, Boardfly) to sharply raise FLOPS, bandwidth, and pod scale while trimming CPU and memory bottlenecks. Co-developed with DeepMind and integrated into Google Cloud’s AI Hypercomputer, the TPUs emphasize energy-efficient performance, liquid cooling, and software-hardware co-optimization to improve cost-per-token and real-world responsiveness for foundation models and agentic workflows. Availability is planned for later this year, with early enterprise testing underway.
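To make the numeric-format claim concrete: JAX already exposes FP8 dtypes via ml_dtypes, and the sketch below shows the cast-compute-accumulate pattern that native FP4/FP8 hardware accelerates. This is a minimal illustration under stated assumptions, not Google's kernel; native FP4 paths are hardware-specific, and the dtype choice here (float8_e4m3fn) is an assumption.

```python
import jax
import jax.numpy as jnp

def fp8_matmul(a: jax.Array, b: jax.Array) -> jax.Array:
    # Quantize operands to FP8 (e4m3: more mantissa, narrower range),
    # then ask XLA to accumulate the product in float32.
    a8 = a.astype(jnp.float8_e4m3fn)
    b8 = b.astype(jnp.float8_e4m3fn)
    return jnp.dot(a8, b8, preferred_element_type=jnp.float32)

a = jax.random.normal(jax.random.key(0), (128, 256), dtype=jnp.bfloat16)
b = jax.random.normal(jax.random.key(1), (256, 64), dtype=jnp.bfloat16)
out = jax.jit(fp8_matmul)(a, b)  # float32 result from FP8 operands
```

Halving operand width is what lets a fixed matrix-unit datapath roughly double throughput while also easing memory-bandwidth pressure, which is the trade both new chips lean on.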
Google's TPU 8t and 8i reshape infrastructure economics and performance for large models and agentic inference, affecting choices for cloud compute, model design, and deployment. Tech professionals must understand the new numeric formats, networking, and pod-scale trade-offs to optimize cost, latency, and scalability.
Dossier last updated: 2026-05-12 07:17:43
Google unveiled eighth-generation TPUs — TPU 8t for large-scale training and TPU 8i for low-latency agent inference — arguing that its vertically integrated stack avoids the high margins of Nvidia-based rentals. TPU 8t boosts FP4 EFLOPS per pod 2.8x over the prior generation, doubles scale-up bandwidth, quadruples scale-out networking, and can scale beyond one million chips in a single job via new Virgo networking and TPU Direct Storage, which cuts CPU-mediated data hops. TPU 8i focuses on agent workloads with a redesigned Boardfly network topology to reduce latency, delivering roughly 9.8x the FP8 EFLOPS per pod of the prior generation along with much higher HBM capacity and pod size; it targets real-time sampling and memory-heavy inference. Google says the two-chip roadmap reflects a strategic split between training and agent inference and promises better cost-per-token economics for cloud customers.
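The cost-per-token promise is easiest to sanity-check with back-of-envelope arithmetic. Every figure in the sketch below (model size, pod throughput, utilization, hourly price) is an illustrative assumption, not a published number; only the structure of the calculation is the point.

```python
def cost_per_million_tokens(params_b: float, pod_eflops: float,
                            utilization: float, pod_cost_per_hour: float) -> float:
    # Decode FLOPs per token ~ 2 * parameter count (standard rule of thumb).
    flops_per_token = 2 * params_b * 1e9
    sustained_flops = pod_eflops * 1e18 * utilization   # FLOP/s actually achieved
    tokens_per_hour = sustained_flops / flops_per_token * 3600
    return pod_cost_per_hour / tokens_per_hour * 1e6

# Illustrative only: a 700B-parameter model on a hypothetical 9.8-EFLOPS pod
# at 40% utilization and a made-up $5,000/hour pod price.
print(f"${cost_per_million_tokens(700, 9.8, 0.4, 5000):.2f} per 1M tokens")
```

Higher per-pod FLOPS move this number through pod_eflops, but the bandwidth and topology upgrades matter just as much because they push utilization up; a pod that is twice as fast on paper but half as well fed gains nothing.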
Google unveiled technical details of its eighth-generation TPU family — TPU 8t and TPU 8i — designed for modern AI workloads such as agentic systems, world models, and massive MoEs. TPU 8t targets large-scale pretraining and embedding-heavy tasks with a 9,600-chip 3D-torus superpod, a new SparseCore accelerator for irregular embedding access, improved VPU/MXU overlap for higher utilization, and native FP4 to double MXU throughput while cutting memory bandwidth demands. The platform pairs pods with Arm-based Axion host CPUs to reduce host-side preprocessing bottlenecks and is part of Google Cloud’s AI Hypercomputer, supporting training, fine-tuning, and serving. These changes matter because they address real throughput, latency, and scaling limits for next-generation model training and multi-agent reasoning at cloud scale.
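SparseCore's target workload, irregular embedding access, is essentially gather-then-pool. The JAX sketch below shows that pattern as it would be written for any backend; the table size and sum-pooling are illustrative choices, and nothing here reflects how SparseCore itself is programmed.

```python
import jax
import jax.numpy as jnp

VOCAB, DIM = 100_000, 64
table = jax.random.normal(jax.random.key(0), (VOCAB, DIM))

def pooled_lookup(table, ids, segment_ids, num_segments):
    # Irregular gather: the requested rows are scattered across HBM, which
    # is why this step is memory-bound and a poor fit for the dense MXU.
    rows = jnp.take(table, ids, axis=0)
    # Sum-pool each example's variable-length ID list into one vector.
    return jax.ops.segment_sum(rows, segment_ids, num_segments=num_segments)

ids = jnp.array([3, 17, 42, 42, 9])   # feature IDs, ragged across examples
segs = jnp.array([0, 0, 1, 1, 1])     # which example each ID belongs to
emb = jax.jit(pooled_lookup, static_argnums=3)(table, ids, segs, 2)
print(emb.shape)  # (2, 64)
```

Offloading this gather/pool to a dedicated unit frees the MXU to stay busy with dense matmuls, which is the utilization win the VPU/MXU overlap claim points at.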
Google announced its eighth-generation Tensor Processing Units, introducing two purpose-built chips: TPU 8t for large-model training and TPU 8i for low-latency inference and agentic workloads. Developed with Google DeepMind, the designs emphasize co-optimization of silicon, networking, cooling, and software to meet the multi-step reasoning and continuous-learning demands of AI agents. Google positions these TPUs to improve power efficiency and absolute performance for foundation models like Gemini and for large-scale inference, with availability slated for later this year and early customers already testing deployments. This matters because specialized hardware continues to shape model scale, operational cost, and latency for cloud AI services and enterprise AI initiatives.
Google announced its eighth-generation Tensor Processing Units — TPU 8t for large-scale model training and TPU 8i for low-latency inference — aimed at powering agentic AI workloads. Co-designed with DeepMind, the chips emphasize custom numerics, liquid cooling, bespoke interconnects, and software co-optimization to boost performance and energy efficiency for training foundation models like Gemini and serving multi-step, continuous-agent workflows. Google positions the pair as components of its custom supercomputers to accelerate the development, scaling, and deployment of demanding ML workloads; availability is slated for later this year, with enterprise partners already evaluating the systems. The release matters for cloud AI competitiveness and for the cost and power profiles of large-scale AI deployments.