Overview
OpenAI’s real-time speech-to-speech API for building voice agents that hold low-latency spoken conversations with GPT models.
Details
The OpenAI Realtime API lets developers build low-latency voice agents on a single speech-to-speech model that natively handles audio in and audio out, eliminating the need to chain separate speech recognition (ASR), language (LLM), and text-to-speech (TTS) models. It supports function calling, interruption handling, and streaming audio over WebSocket or WebRTC, and offers multiple voice options. The Realtime API has become a building block for many voice agent products and is a supported backend on platforms such as Vapi and LiveKit.
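
As a rough illustration of the WebSocket flow described above, the sketch below builds two of the JSON client events a session might send: one that registers a function tool and one that streams a chunk of audio. It opens no connection and only constructs the messages. The event names (`session.update`, `input_audio_buffer.append`) follow the publicly documented Realtime API client events as understood at the time of writing; exact field shapes can differ between API versions, so treat them as assumptions and check the current reference. The `get_weather` tool and both helper functions are hypothetical names introduced for this example.

```python
import base64
import json


def session_update_event(tools):
    """Build a `session.update` event that registers function tools.

    Assumption: tools are passed under session.tools, with tool_choice "auto".
    """
    return {
        "type": "session.update",
        "session": {"tools": tools, "tool_choice": "auto"},
    }


def audio_append_event(pcm_bytes):
    """Wrap raw audio bytes in an `input_audio_buffer.append` event.

    Audio travels as base64 text because the WebSocket messages are JSON.
    """
    return {
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_bytes).decode("ascii"),
    }


# Hypothetical tool definition, for illustration only.
get_weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# 160 samples of 16-bit silence stand in for a captured microphone chunk.
silence_chunk = b"\x00\x00" * 160

events = [
    session_update_event([get_weather_tool]),
    audio_append_event(silence_chunk),
]

# Each event is serialized to one JSON text frame before being sent.
wire_messages = [json.dumps(event) for event in events]

for message in wire_messages:
    print(message[:80])
```

In a real client, each string in `wire_messages` would be sent over an authenticated WebSocket connection, and the server's events (audio deltas, function-call arguments, interruption signals) would be read back on the same socket.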
Tags
voice, infrastructure, api, developers