OpenAI Realtime Translation API Demo
A spend-capped browser test desk for OpenAI's new realtime translation API, with mic input, WebRTC, pause-and-resume cues, and visible latency checks.
What this proves
Realtime voice demos only matter if the latency, interruption behavior, and deployment path survive a real mic test.
What This Proves
OpenAI's dedicated realtime translation model is the release signal, but the practical ShipWithTez question is different:
Can a small team use existing credits and a normal cloud account to test the workflow today?
This build answers that in two passes. First, I tried Azure OpenAI because the Microsoft credits were already available. Microsoft says gpt-realtime-2, gpt-realtime-translate, and gpt-realtime-whisper are rolling into Foundry, and the Azure deployment API recognizes the 2026-05-06 model names.
The catch: this Visual Studio subscription still returns SpecialFeatureOrQuotaIdRequired for those models. So the working fallback is gpt-realtime-1.5 until quota or feature access opens up.
The second pass uses direct OpenAI API access for the dedicated translation model, but with a strict build budget. The app reserves the full worst-case session cost before it mints a short-lived client secret, defaults to a $20 cap, over-reserves at $0.10 per minute, and auto-stops browser sessions after the configured session window. For a true provider-enforced ceiling, the OpenAI project budget should also be set to $20.
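The reservation math behind that cap fits in a few lines. This is a minimal sketch, assuming a ledger that tracks reserved spend in integer cents (class and method names are illustrative, not the app's actual code); the numbers match the post: a $20 cap and $0.10-per-minute over-reservation.

```typescript
// Sketch of the local budget ledger: reserve the worst-case session cost
// before minting a client secret, and refuse once the cap is exhausted.
const CAP_CENTS = 2000;        // $20 hard cap
const RATE_CENTS_PER_MIN = 10; // $0.10/min, deliberately over-reserved

class BudgetLedger {
  private reservedCents = 0;

  // Record the worst-case reservation; return false to refuse the mint.
  reserve(sessionMinutes: number): boolean {
    const worstCase = sessionMinutes * RATE_CENTS_PER_MIN;
    if (this.reservedCents + worstCase > CAP_CENTS) return false;
    this.reservedCents += worstCase;
    return true;
  }

  // Release the unused part of a reservation when a session ends early.
  settle(reservedMinutes: number, actualMinutes: number): void {
    const refund = (reservedMinutes - actualMinutes) * RATE_CENTS_PER_MIN;
    this.reservedCents = Math.max(0, this.reservedCents - refund);
  }

  remainingCents(): number {
    return CAP_CENTS - this.reservedCents;
  }
}
```

Keeping the ledger in cents avoids floating-point drift; the provider-side project budget remains the true ceiling, as noted above.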
What I Built
The desk has four layers:
- a server route that can mint Azure realtime secrets or OpenAI realtime translation secrets
- a local OpenAI budget ledger that refuses to mint once the $20 reservation cap is exhausted
- a browser WebRTC client that sends microphone audio directly to the realtime model
- a test surface for target language, pause behavior, latency, transcripts, and event logs
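The latency side of that test surface boils down to timestamping a handful of events and reporting deltas. A minimal sketch, where the event names are made up for illustration rather than taken from any API:

```typescript
// Sketch of the latency tracker behind the test surface: stamp key events
// once, then report the deltas the demo cares about.
type EventName = "mic.start" | "first.token" | "first.audio" | "pause" | "resume";

class LatencyLog {
  private stamps = new Map<EventName, number>();

  mark(name: EventName, atMs: number = Date.now()): void {
    // Keep only the first occurrence so deltas measure the initial response.
    if (!this.stamps.has(name)) this.stamps.set(name, atMs);
  }

  // Milliseconds between two marked events, or null if either is missing.
  delta(from: EventName, to: EventName): number | null {
    const a = this.stamps.get(from);
    const b = this.stamps.get(to);
    return a === undefined || b === undefined ? null : b - a;
  }
}
```

In a run, delta("mic.start", "first.audio") is the headline number; a null result flags an event that never fired, which is itself worth logging.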
The server route is gated by an env flag so the public page does not mint tokens by accident. On this Mac it can use Azure CLI auth for Azure, or OPENAI_API_KEY for direct OpenAI without exposing permanent keys to the browser.
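The gate described above can be expressed as a pure decision function the route calls before minting anything. This is a sketch under stated assumptions: the flag and variable names (REALTIME_DEMO_ENABLED, AZURE_CLI_AUTH) are placeholders I made up, except OPENAI_API_KEY, which the text names.

```typescript
// Sketch of the mint gate: the public page never gets a secret unless the
// demo flag is on and a server-side credential exists for the chosen provider.
type Provider = "azure" | "openai";
type MintDecision = { allowed: true } | { allowed: false; reason: string };

function canMint(
  env: Record<string, string | undefined>,
  provider: Provider,
): MintDecision {
  // Assumed flag name; the point is that the route is off by default.
  if (env.REALTIME_DEMO_ENABLED !== "1") {
    return { allowed: false, reason: "demo flag is off" };
  }
  if (provider === "openai" && !env.OPENAI_API_KEY) {
    return { allowed: false, reason: "no OPENAI_API_KEY on the server" };
  }
  if (provider === "azure" && !env.AZURE_CLI_AUTH) {
    // Stand-in for "Azure CLI auth is available"; the real check would be richer.
    return { allowed: false, reason: "no Azure credential" };
  }
  return { allowed: true };
}
```

Keeping the decision pure makes it trivially testable and keeps permanent keys on the server side of the route, with only short-lived client secrets crossing to the browser.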
The deployment test is part of the artifact: Azure accepts gpt-realtime-1.5, but the newest Azure-hosted realtime models currently require additional subscription access. Direct OpenAI is the fastest path to test gpt-realtime-translate today.
Why It Fits ShipWithTez
The post is not "look at the new API."
The useful story is:
I tried to turn the announcement into a working operator demo with the credits I already had. Here is what worked, what Azure exposes today, and where access still blocks the ideal version.
That is much stronger for SWT users than another stack walkthrough because it shows the real adoption path:
- use official cloud credits
- deploy the closest available realtime model
- switch to OpenAI direct when cloud quota is blocked
- cap the experiment before it can turn into surprise spend
- test latency with human speech
- verify interruption and resume behavior
- call out the quota/access block instead of hiding it
What I Would Add Next
- Set the OpenAI project budget to $20, then run one real mic test through gpt-realtime-translate.
- Swap the Azure deployment to gpt-realtime-translate when this subscription gets access.
- Add a saved latency run with first-token, first-audio, and pause-resume timings.
- Add a browser tab audio mode for translating webinars or support calls.
- Package the Azure CLI setup as a short operator runbook.
- Turn the best run into the LinkedIn post and Instagram carousel.