OpenAI redesigned its WebRTC infrastructure to slash voice-AI latency and scale to hundreds of millions of weekly users. By splitting media handling into stateless relays that accept client connections and stateful transceivers that own ICE/DTLS sessions, the company preserved standard WebRTC client behavior while changing internal packet routing. That separation improves first-hop routing, reduces jitter and packet loss, and lowers media setup and round-trip times—enabling smoother conversational turn-taking. The move ties termination to the right network layer so inference backends can stream transcription and responses while users speak, boosting responsiveness for ChatGPT Voice, the Realtime API, and interactive agents.
Zac Hall / 9to5Mac : OpenAI launches three new real-time voice models for reasoning, translation, and transcription, included in its Realtime API — OpenAI has just released three new realtime voice models that it says will “unlock a new class of voice apps for developers.” Each new voice intelligence model …
OpenAI says it rearchitected its WebRTC stack to deliver low-latency voice AI at global scale for ChatGPT voice and developers using its Realtime API, according to a May 4, 2026 engineering post by Yi Zhang and William McDonald. The company frames the challenge around three requirements: supporting more than 900 million weekly active users, enabling fast connection setup so users can speak immediately, and keeping media round-trip time low and stable with minimal jitter and packet loss for natural turn-taking and barge-in. OpenAI cites scaling constraints with one-port-per-session media termination, the need for stable ownership of stateful ICE and DTLS sessions, and global routing that preserves low first-hop latency. It describes a “split relay plus transceiver” architecture that keeps standard WebRTC client behavior while changing internal packet routing.
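The split the post describes can be sketched in a few lines. This is a hypothetical illustration, not OpenAI's implementation: the class names (`Relay`, `Transceiver`), the use of the ICE ufrag as the session key, and the consistent-hash routing are all assumptions made here to show how a stateless front door can still deliver every packet of a session to the one stateful worker that owns its ICE/DTLS state.

```python
import hashlib

class Transceiver:
    """Stateful worker: owns ICE/DTLS state for the sessions assigned to it."""
    def __init__(self, name):
        self.name = name
        self.sessions = {}            # session key -> per-session state

    def handle(self, ufrag, packet):
        state = self.sessions.setdefault(ufrag, {"packets": 0})
        state["packets"] += 1         # stand-in for real ICE/DTLS/SRTP work
        return self.name

class Relay:
    """Stateless front door: holds no per-session state, only routes."""
    def __init__(self, transceivers):
        self.transceivers = transceivers

    def route(self, ufrag, packet):
        # A consistent hash of the session identifier picks the same
        # transceiver for every packet of that session, giving the
        # stateful ICE/DTLS machinery a stable owner.
        digest = int(hashlib.sha256(ufrag.encode()).hexdigest(), 16)
        return self.transceivers[digest % len(self.transceivers)].handle(ufrag, packet)

relay = Relay([Transceiver("tx-a"), Transceiver("tx-b"), Transceiver("tx-c")])
owner1 = relay.route("user-123", b"rtp-packet-1")
owner2 = relay.route("user-123", b"rtp-packet-2")
assert owner1 == owner2   # same session always lands on the same transceiver
```

Because the relay keeps no state, any relay instance can accept any client connection (good first-hop routing), while session ownership stays pinned to one transceiver, which is the separation the post credits for the latency and stability gains.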
Beyond the new architecture, the post explains why WebRTC is essential for this workload (it handles NAT traversal, encryption, codec negotiation, and jitter buffering) and why terminating sessions at the right network layer matters: it connects media directly to inference backends so models can stream transcription and responses while users are still speaking. That shortens conversational latency for ChatGPT Voice and Realtime API users and eases interoperability for developers.