In Short:
– OpenAI introduced three new audio models for developers to enhance voice software capabilities.
– The models support real-time tasks, including complex requests, translation, and live speech-to-text.
OpenAI introduced three audio models for its developer platform, enhancing voice software capabilities. The new models are intended to make voice agents more conversational and able to complete tasks in real time during live interactions.
New audio models
The three models are GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper, all available for testing in the developer playground.
GPT-Realtime-2 handles complex requests and maintains context in long voice sessions.
GPT-Realtime-Translate translates from more than 70 languages into 13 and is suited to customer support and educational settings.
GPT-Realtime-Whisper provides live speech-to-text capabilities, generating captions and meeting notes during discussions.
Customers testing these models include online real estate marketplace Zillow, travel agency Priceline, and telecommunications firm Deutsche Telekom.
Pricing for GPT-Realtime-2 starts at $32 per million audio input tokens, while GPT-Realtime-Translate is $0.034 per minute and GPT-Realtime-Whisper is $0.017 per minute.
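To put those rates in context, here is a rough cost sketch in Python. It uses only the three prices quoted above; the function names and the sample workloads (a 60-minute session, 500,000 audio input tokens) are hypothetical, chosen for illustration. The article gives no output-token price for GPT-Realtime-2 and no tokens-per-minute figure, so the sketch covers audio input tokens only and does not convert minutes to tokens.

```python
# Prices in USD, as quoted in the article.
REALTIME_2_PER_MILLION_AUDIO_INPUT_TOKENS = 32.00
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Cost of a per-minute-billed model for a session of the given length."""
    return minutes * rate_per_minute


def audio_input_token_cost(
    tokens: int,
    rate_per_million: float = REALTIME_2_PER_MILLION_AUDIO_INPUT_TOKENS,
) -> float:
    """Cost of audio input tokens only; output pricing is not covered here."""
    return tokens / 1_000_000 * rate_per_million


if __name__ == "__main__":
    # Hypothetical workloads, for illustration only.
    print(f"60 min of translation:   ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"60 min of transcription: ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
    print(f"500,000 audio input tokens: ${audio_input_token_cost(500_000):.2f}")
```

At the quoted rates, an hour of live translation comes to about $2.04 and an hour of live transcription to about $1.02, while 500,000 audio input tokens on GPT-Realtime-2 cost $16.00.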
OpenAI described the release this way: "Introducing GPT-Realtime-2 in the API: our most intelligent voice model yet, bringing GPT-5-class reasoning to voice agents. Voice agents are now real-time collaborators that can listen, reason, and solve complex problems as conversations unfold."