OpenAI Spring Update

#61

Well, that was fun! OpenAI gave an update on Monday with some live demos that were pretty special. Loads to unpack, so let's dive in…

OpenAI, the company behind ChatGPT, gave an update on Monday with lots of announcements and demonstrations of new capabilities. It seems that every week something jaw-dropping happens as teams from the main AI companies slave away to create incredible enhancements to a technology that is already amazing. And here is the thing: the AI that we have today is the absolute worst it will ever be going forward!

So, what’s new? Well, probably the most important announcement is that the latest model demonstrated, ChatGPT-4o, has been made available to every class of user, including free users! There’s absolutely no excuse not to download the ChatGPT app and select 4o to get going. Please commit to trying it if you haven’t already. For paid users, much of what was demonstrated will be gradually released over the coming weeks.

The second thing is that the ‘o’ in ChatGPT-4o stands for ‘omni’, which effectively means that it is now (or will be when fully released over the next couple of weeks) completely multimodal - it can accept inputs via text, speech, video, document, photo, you name it, this model can understand it. It also means that this model can talk back to you, see what you allow it to see via your camera, and react in real time and with what sounds like real emotion!

What follows are some clips from the demos on Monday that showcase some of the capabilities of ChatGPT-4o.

Here’s voice mode in action. It seems they have solved the annoying lag when talking to AI models, making the conversation feel humanlike and realtime:

Next up, we have a demo of emotion and voice variation:

One of the coolest things this model is capable of is actually seeing and reasoning about what it sees. Have a look:

Here’s another great use case of the vision capabilities:

Finally, here is a link to the full event. It lasts less than 30 minutes and showcases much more than I have found clips for here - this model does tutoring, realtime translation and so much more. Enjoy!

As soon as I get access to the new features, I’ll be sure to report on them here. Until then, enjoy the new model…