Agentic AI is pushing both model design and infrastructure toward longer, tool-driven workflows. Xiaomi has put MiMo-V2.5-Pro into public beta, touting stronger long-horizon coherence, "harness-aware" structured development, and self-correction across multi-thousand-step tool chains (demonstrated by end-to-end software builds and even analog circuit optimization) without raising API pricing. In parallel, Google is reshaping its TPU strategy for the agentic era by splitting TPU v8 into TPU 8t for throughput-heavy training and TPU 8i for low-latency inference and agent loops, pairing large-scale SuperPod capacity with latency-cutting on-chip memory and networking upgrades.
Xiaomi has launched MiMo-V2.5-Pro in public beta, a major upgrade focused on agentic capabilities, long-horizon coherence, and complex software and engineering tasks. Deployed across Xiaomi's API Platform and AI Studio with no change in API pricing, the model reportedly sustains multi-thousand-step tool workflows. Internal benchmarks highlight three flagship feats: building a complete SysY-to-RISC-V compiler in Rust (passing 233/233 tests) over 4.3 hours and 672 tool calls; producing an 8,192-line multi-track desktop video editor across 1,868 tool calls in 11.5 hours; and designing and optimizing an FVF-LDO analog regulator in TSMC 180nm via closed-loop ngspice iteration, meeting multiple target metrics in about an hour. Xiaomi emphasizes the model's "harness awareness", structured development style, and self-correcting behavior, suggesting broader implications for autonomous agent workflows and developer productivity.
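The closed-loop analog optimization described above can be pictured as a simulate-check-adjust loop. The sketch below is hypothetical, not Xiaomi's harness: the parameter names, metric targets, and the `simulate` stand-in (which a real agent would replace with an actual ngspice run over a netlist) are all illustrative assumptions.

```python
# Hypothetical sketch of a closed-loop circuit-optimization harness.
# A real agent would invoke ngspice on a netlist and parse its output;
# here the simulator is a toy stand-in so the control flow runs on its own.

def simulate(params):
    """Stand-in for an ngspice run: returns metrics for an LDO-like circuit."""
    # Toy relations, illustrative only: load regulation improves with loop
    # gain, quiescent current grows with bias.
    return {
        "load_reg_mV": 50.0 / params["loop_gain"],
        "iq_uA": 2.0 * params["bias"],
    }

def meets_spec(metrics, spec):
    """All metrics must be at or below their spec limits."""
    return all(metrics[k] <= limit for k, limit in spec.items())

def optimize(params, spec, max_iters=50):
    """Simulate, check metrics, nudge parameters, repeat until specs pass."""
    for i in range(max_iters):
        metrics = simulate(params)
        if meets_spec(metrics, spec):
            return i, params, metrics
        # Simple corrective rules standing in for the model's reasoning step.
        if metrics["load_reg_mV"] > spec["load_reg_mV"]:
            params["loop_gain"] *= 1.5   # more gain -> better regulation
        if metrics["iq_uA"] > spec["iq_uA"]:
            params["bias"] *= 0.8        # less bias -> lower quiescent current
    raise RuntimeError("spec not met within iteration budget")

iters, final_params, final_metrics = optimize(
    {"loop_gain": 10.0, "bias": 8.0},
    {"load_reg_mV": 1.0, "iq_uA": 5.0},
)
print(iters, final_metrics)
```

The point of the structure, rather than the toy numbers, is that each iteration is a full tool call whose result feeds the next correction, which is the loop shape the compiler and video-editor runs also follow at much larger scale.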
Google unveiled TPU v8 at Cloud Next '26 as two purpose-built chips: TPU 8t for large-scale training and TPU 8i for low-latency inference and agentic workloads. The split addresses the operational gap between throughput-optimized training and latency-sensitive agent loops (decomposition, dispatch, evaluation), where per-step latency compounds across long workflows. TPU 8t targets training scale: 9,600 chips per SuperPod, 121 exaFLOPS, 2 PB of shared high-bandwidth memory presented as a single pool, double the interchip bandwidth of the previous generation, and multi-site clusters exceeding a million TPUs. TPU 8i targets inference: 3x more on-chip SRAM, a Collectives Acceleration Engine that cuts collective latency 5x, and a Boardfly high-radix network that halves communication latency, yielding larger inference pods and roughly 80% better inference performance per dollar. Google frames the launch as vertically integrated infrastructure (chip design, systems, datacenter orchestration) optimized for agentic AI at cloud scale, a competitive lever against rivals such as NVIDIA.
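Why per-step latency compounds is simple arithmetic: a serial agent loop repeats decompose, dispatch, and evaluate on every iteration, so wall-clock time scales linearly with per-step latency. The sketch below uses made-up millisecond figures, not measured TPU numbers, purely to show the scaling.

```python
# Illustrative only: how per-step latency compounds in a serial agent loop.
# None of these millisecond figures are real hardware measurements.

def loop_wall_clock(steps, decompose_ms, dispatch_ms, evaluate_ms):
    """Total wall-clock seconds for a serial agent loop of `steps` iterations."""
    per_step_ms = decompose_ms + dispatch_ms + evaluate_ms
    return steps * per_step_ms / 1000.0

# A hypothetical 1,000-step workflow under two serving latency profiles.
slow = loop_wall_clock(1000, decompose_ms=120, dispatch_ms=40, evaluate_ms=140)
fast = loop_wall_clock(1000, decompose_ms=60, dispatch_ms=20, evaluate_ms=70)
print(slow, fast)  # → 300.0 150.0 : halving step latency halves the run
```

This linear scaling is why a chip tuned for batch throughput and a chip tuned for per-step latency end up as different designs once workflows reach thousands of steps.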