OpenAI redesigned its WebRTC infrastructure to slash voice-AI latency and scale to hundreds of millions of weekly users. By splitting media handling into stateless relays that accept client connections and stateful transceivers that own ICE/DTLS sessions, the company preserved standard WebRTC client behavior while changing internal packet routing. That separation improves first-hop routing, reduces jitter and packet loss, and lowers media setup and round-trip times—enabling smoother conversational turn-taking. The move ties termination to the right network layer so inference backends can stream transcription and responses while users speak, boosting responsiveness for ChatGPT Voice, the Realtime API, and interactive agents.
Zac Hall / 9to5Mac : OpenAI launches three new real-time voice models for reasoning, translation, and transcription, included in its Realtime API — OpenAI has just released three new realtime voice models that it says will “unlock a new class of voice apps for developers.” Each new voice intelligence model …
OpenAI says it rearchitected its WebRTC stack to deliver low-latency voice AI at global scale for ChatGPT voice and developers using its Realtime API, according to a May 4, 2026 engineering post by Yi Zhang and William McDonald. The company frames the challenge around three requirements: supporting more than 900 million weekly active users, enabling fast connection setup so users can speak immediately, and keeping media round-trip time low and stable with minimal jitter and packet loss for natural turn-taking and barge-in. OpenAI cites scaling constraints with one-port-per-session media termination, the need for stable ownership of stateful ICE and DTLS sessions, and global routing that preserves low first-hop latency. It describes a “split relay plus transceiver” architecture that keeps standard WebRTC client behavior while changing internal packet routing.
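The split the post describes can be sketched in a few lines. This is a hypothetical illustration, not OpenAI's implementation: the class names (`Relay`, `Transceiver`), the use of the ICE ufrag as the session key, and the consistent-hash routing are all assumptions made here to show how a stateless front door can still deliver every packet of a session to the one stateful worker that owns its ICE/DTLS state.

```python
import hashlib

class Transceiver:
    """Stateful worker: owns ICE/DTLS state for the sessions assigned to it."""
    def __init__(self, name):
        self.name = name
        self.sessions = {}            # session key -> per-session state

    def handle(self, ufrag, packet):
        state = self.sessions.setdefault(ufrag, {"packets": 0})
        state["packets"] += 1         # stand-in for real ICE/DTLS/SRTP work
        return self.name

class Relay:
    """Stateless front door: holds no per-session state, only routes."""
    def __init__(self, transceivers):
        self.transceivers = transceivers

    def route(self, ufrag, packet):
        # A consistent hash of the session identifier picks the same
        # transceiver for every packet of that session, giving the
        # stateful ICE/DTLS machinery a stable owner.
        digest = int(hashlib.sha256(ufrag.encode()).hexdigest(), 16)
        return self.transceivers[digest % len(self.transceivers)].handle(ufrag, packet)

relay = Relay([Transceiver("tx-a"), Transceiver("tx-b"), Transceiver("tx-c")])
owner1 = relay.route("user-123", b"rtp-packet-1")
owner2 = relay.route("user-123", b"rtp-packet-2")
assert owner1 == owner2   # same session always lands on the same transceiver
```

Because the relay keeps no state, any relay instance can accept any client connection (good first-hop routing), while session ownership stays pinned to one transceiver, which is the separation the post credits for the latency and stability gains.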
Beyond the new architecture, the post explains why WebRTC is essential for this workload (it handles NAT traversal, encryption, codec negotiation, and jitter buffering) and why terminating sessions at the right network layer matters: it connects media directly to inference backends so models can stream transcription and responses while users are still speaking. That shortens conversational latency for ChatGPT Voice and Realtime API users and eases interoperability for developers.