OpenAI enables users to speak to ChatGPT

Johannesburg, 14 May 2024

OpenAI rolls out new AI model with audio capabilities to enable users to speak to ChatGPT.

ChatGPT creator OpenAI has unveiled its latest artificial intelligence (AI) model, named GPT-4o.

Microsoft-backed OpenAI said yesterday that GPT-4o (o for omni) is a step towards much more natural human-computer interaction.

The new AI model allows users to interact through audio, images and text in real time, says the company, adding that it can respond to audio inputs about as quickly as a human responds in a conversation.

“Prior to GPT-4o, you could use voice mode to talk to ChatGPT with latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4) on average,” says the company in a blog post.

“With GPT-4o, we trained a single new model end-to-end across text, vision and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.”

OpenAI launched the text-based ChatGPT in 2022. The chatbot can interact in conversational dialogue form and provide responses that can appear human. It can also draft prose, poetry or computer code on command.

ChatGPT was built on top of OpenAI’s GPT-3 family of large language models, and fine-tuned with supervised and reinforcement learning techniques.

Since the ChatGPT launch, OpenAI has rolled out several generative AI tools, including ChatGPT Enterprise, Sora and GPT-4.

According to OpenAI, GPT-4o has undergone extensive external red teaming, in which over 70 external experts in domains such as social psychology, bias and fairness, and misinformation worked to identify risks that are introduced or amplified by the newly added modalities.

“We used these learnings to build out our safety interventions in order to improve the safety of interacting with GPT-4o. We will continue to mitigate new risks as they’re discovered.

“We recognise that GPT-4o’s audio modalities present a variety of novel risks. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training and safety necessary to release the other modalities.”

The company states that GPT-4o’s text and image capabilities are available in ChatGPT’s free tier, as well as to Plus users.

“We’ll roll out a new version of voice mode with GPT-4o in alpha within ChatGPT Plus in the coming weeks. Developers can also now access GPT-4o in the API as a text and vision model.”