OpenAI WebRTC Audio Session, now with document context

By AI Maestro June 13, 2026 1 min read

The OpenAI WebRTC audio session tool has been updated so that users can select the GPT-Realtime-2 model and supply a large block of document context before a voice conversation begins. Simon Willison built the original tool in December 2024 to test the initial API. The new GPT-Realtime-2 model, described as having GPT-5-class reasoning with a September 2024 knowledge cut-off, has not yet arrived in the official ChatGPT iPhone app, so a browser-based playground is currently a practical way to try it. Users can paste extensive text, such as a Markdown document comparing DuckDB and Datasette, before starting a call. The audio model can then discuss the specific technical details in that document while holding a natural spoken dialogue. The accompanying screenshot shows a browser session in which the model produces a transcript about database security drawn from the pasted document.
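As a rough illustration of how pasted text could become document context, the sketch below wraps a document in session instructions and serializes them as a `session.update` event. It is not the tool's actual code. The helper names, the instruction wording, and the `max_chars` limit are hypothetical, and the event fields follow my understanding of the Realtime API, so they should be checked against the current documentation. Model selection, which would happen when the session is created, is left out.

```python
import json


def build_instructions(document: str, max_chars: int = 200_000) -> str:
    """Wrap pasted document text in instructions for the voice model.

    max_chars is an arbitrary placeholder, not a documented API limit.
    """
    text = document.strip()
    if not text:
        raise ValueError("document context is empty")
    if len(text) > max_chars:
        raise ValueError(f"document is {len(text)} chars, limit is {max_chars}")
    # Hypothetical wording: ask the model to stay within the document.
    return (
        "You are a voice assistant. Answer using the document below. "
        "If the document does not cover a question, say so.\n\n"
        "<document>\n" + text + "\n</document>"
    )


def build_session_update(document: str) -> str:
    """Serialize a session.update event carrying the document as instructions.

    Assumption: the event shape mirrors the Realtime API's session.update
    client event; newer API versions may require additional session fields.
    """
    event = {
        "type": "session.update",
        "session": {"instructions": build_instructions(document)},
    }
    return json.dumps(event)


if __name__ == "__main__":
    payload = build_session_update("# DuckDB vs Datasette\nNotes on security.")
    print(payload)
```

In a browser client, a payload like this would typically be sent over the WebRTC data channel once it opens, before the user starts speaking.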

This matters because it shows how a reasoning-capable voice model can be pointed at specific, complex material instead of relying on its general knowledge. With the document embedded in the audio session, a user can ask spoken questions about a detailed report or codebase without switching between reading a screen and speaking, which suits workflows where hands-free interaction is needed. It also reflects the current state of adoption: the newest capabilities are reachable through developer tools before they appear in consumer applications. Grounding a conversation in a specific document can help keep answers accurate and relevant, which is important for tasks such as technical troubleshooting or legal analysis.

* Users can now select GPT-Realtime-2 directly in the WebRTC playground despite its absence from the consumer app.
* Providing document context lets the audio model discuss specific technical details from the pasted text, such as database security.
* This feature enables hands-free analysis of complex documents without needing to switch between reading and speaking modes.