Gemini 1.5 Pro could do for audio what previous versions did for text
- Google has announced that the Gemini 1.5 Pro model is now available for public preview.
- The company added that the upgraded AI model supports audio processing.
- Google says this tech can be used for high-quality transcriptions, analysis of earnings calls, and more.
Google’s Gemini generative AI models are divided into Nano, Pro, and Ultra. The company announced Gemini 1.5 back in February, and it has now confirmed that Gemini 1.5 Pro is available for public preview and has gained a notable feature.
Google confirmed that Gemini 1.5 Pro now supports the processing of audio. The search giant says this support includes audio in video files and speech.
“This provides users with seamless cross-modal analysis, providing insights across text, images, videos, and audio. It also provides high-quality transcription and can be used to search audio and video content, such as using it to search, analyze, and answer questions across earnings calls or investor meetings,” Google explained.
A major upgrade for Google’s AI efforts
The company previously claimed that Gemini 1.5 Pro beat Gemini 1.0 Pro in 87% of benchmarks and was almost on par with Gemini 1.0 Ultra. It also previously stated that customers could process an hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words in a single stream.
It’s worth noting that Gemini 1.5 Pro is for Workspace users rather than consumers, though it’ll eventually be accessible to consumers via the Gemini assistant and other avenues. Either way, the support for audio processing opens the door to plenty of other features in the future.
Google already offers audio-related tricks on Pixel phones, such as transcription in the Recorder app (powered by older AI tech) and the Audio Magic Eraser tool. So we’re keen to see whether Gemini 1.5 Pro’s core audio capabilities will trickle down into a future on-device AI model, as this could enable more advanced audio features on smartphones down the line.