With Google preparing for its annual Google I/O event, soon to be followed by Apple's WWDC, yesterday (13/5) was the right moment for OpenAI to release its new model, GPT-4o, which can be accessed directly from ChatGPT. Compared to the previous model, it brings many improvements, including a more natural voice mode.
GPT-4o is OpenAI's latest flagship model, a refined version of GPT-4 Turbo that is both faster and lighter. It will be accessible not only to subscribers but also to free ChatGPT users; the company's stated goal is to make artificial intelligence available to more people.
Sam Altman, CEO of OpenAI, said the original vision when founding OpenAI was to build AI and use it to create various benefits for the world. "Instead, it now looks like we will create AI and then other people will use it to create all kinds of amazing things, which can benefit us all."
GPT-4o Supports a Much More Advanced Voice Mode
Elaborating on GPT-4o via his personal blog, Sam noted that apart from being available to all users, the other significant improvements are new audio and video modes with a very simple interface, delivering an experience similar to the advanced voice assistant in the film Her (2013). This was demonstrated directly in the live GPT-4o introduction session.
Current voice assistants, including Google Assistant and Siri, require voice input to be given one turn at a time, with a delay of several seconds that still feels unnatural. In the demo, OpenAI set out to show that GPT-4o can carry a much more natural conversational style.
Now users no longer need to wait for the AI model to finish speaking: they can interrupt it, and their input will be heard and processed in real time, alongside combined photo and video input. In fact, the response comes with a delay of only around 320 milliseconds, comparable to human response times in conversation.
GPT-4o uses a new architecture that processes text, camera, and audio input end-to-end within a single neural network, so it can now distinguish more than one voice and both convey and detect emotional expression. This is useful, for example, when it acts as a real-time translator between two speakers, each in their own language.
Another example of GPT-4o in use: pointing the video camera at a sports match and asking ChatGPT to explain the rules of the game in real time.
Available to All Users, ChatGPT Plus Customers Get Access First
The text and image capabilities of GPT-4o can now be accessed via ChatGPT, both by free users and by Plus members, who get a message limit up to five times higher. The newest Voice Mode is set to arrive in alpha form for Plus users in the next few weeks.
In addition, for developers, the GPT-4o API is claimed to be twice as fast and 50% cheaper, with rate limits five times higher than the previous GPT-4 Turbo. ChatGPT with GPT-4o is also starting to arrive as a desktop application, coming first to macOS with a quick shortcut via Option + Space on the keyboard.
ChatGPT Desktop for macOS is available to Plus users starting today and will roll out to more users in the coming weeks, while the Windows version is set to launch later this year. Beyond the app itself, the ChatGPT interface has also been simplified across all platforms.
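Going back to the developer side: as a rough illustration only, this is what reaching the GPT-4o model through the OpenAI Python SDK can look like. The helper below just assembles the request parameters; actually sending them requires the `openai` package and an API key, so that part is shown as a comment. The system prompt and the `build_chat_request` helper name are our own illustrative choices, not anything from OpenAI's documentation.

```python
def build_chat_request(prompt: str, model: str = "gpt-4o") -> dict:
    """Assemble keyword arguments for a chat completion request.

    Returns a dict that could be unpacked into
    client.chat.completions.create(**...).
    """
    return {
        "model": model,  # the new flagship model discussed above
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }


# With a valid key configured, the request would be sent roughly like this:
#   from openai import OpenAI
#   client = OpenAI()  # reads OPENAI_API_KEY from the environment
#   response = client.chat.completions.create(**build_chat_request("Hello!"))
#   print(response.choices[0].message.content)
```

The point of the separation is that the request shape stays identical whether you target GPT-4 Turbo or GPT-4o; switching models to take advantage of the lower price and higher limits is a one-string change.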
The Indonesian version of this article can be read on Gizmologi.ID.