First GPT-4o, now Gemini...


The day after OpenAI took the world by storm with their live demos - showcased in #61 of this newsletter - Google kicked off their I/O conference, at which they rolled out their own massive enhancements. Things are moving so fast, it’s almost dizzying! So today, I’m going to unpack a little of what Gemini and GPT-4o are and what they will impact. Let’s dive in…

Gemini, like GPT-4o for OpenAI, is Google’s latest and best large language model (LLM). You can access GPT-4o for free here and use Gemini for free right here. This is how you can interact with them:

Unlike its predecessors, Gemini is now multimodal, meaning that it can interact by means of text, image or even video. Its output can also be in these forms. Right now GPT-4o can do all this plus voice to voice, and that is a huge differentiator. Where Gemini scores over GPT-4o, though, is that it can accept massively more data as input - up to 2 million tokens. To put this into context, this equates to about 2 hours of video or 1.4m words! By contrast, GPT-4o has a 128k-token context window. I guess for the majority of things that we do with AI, 128k will be more than enough, but the bigger window is certainly a good hook to get people to use Gemini.
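For anyone who likes to see the arithmetic, here is a rough back-of-the-envelope sketch in Python. The words-per-token ratio is my own assumption, simply backed out of the figures above (2 million tokens being roughly 1.4m words); real tokenizers vary by model and by language, so treat the results as ballpark only. The function name is made up for illustration.

```python
# Rough context-window arithmetic, not an official conversion.
# Assumption: words per token is implied by "2M tokens ~ 1.4M words".
WORDS_PER_TOKEN = 1_400_000 / 2_000_000  # 0.7

def approx_words(tokens: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)

gemini_tokens = 2_000_000  # Gemini's stated upper limit
gpt4o_tokens = 128_000     # GPT-4o's context window

print(approx_words(gemini_tokens))   # 1400000
print(approx_words(gpt4o_tokens))    # 89600
print(gemini_tokens / gpt4o_tokens)  # 15.625
```

So on these assumptions, 128k tokens is still somewhere around 90,000 words, about the length of a novel, while Gemini's window is roughly 15 times larger.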

With these new superpower models, what are the practical things that they can impact and how? Also, is there anything we should be concerned about?

Well, here are my high level thoughts on this. It goes without saying, but the more powerful these models get, the greater the impact they will have on society. Imagine the impact on customer service. You call with a query and somebody replies to you instantly, in an accent just like yours, using language just like you do, responding quickly, accurately and with empathy. How are you going to feel about that company? Yet you’ll be talking to a machine. Less than a year ago, I thought that this was likely to be 5 - 10 years away! What now seems even more likely is that we skip that human to machine step altogether and you simply tell your AI agent to deal with the issue for you. Check this out:

Amazing, don’t you think? And just to think: this is the worst AI will ever be going forward! Next up, what about education? There is nothing that you can’t now learn from the best tutor in the world. Gemini will apparently get its voice to voice capability in the autumn, whereas GPT-4o has it now. Ask a question, get an answer. You’ll always be talking to the smartest person in the room! Check this video out:

Finally, as the models get more powerful - and OpenAI hinted at a new model (GPT-5?) being released soon - their ability to upset the job market becomes more acute. This should greatly concern us, but the more I think about it, the more I feel that a combination of human and AI will prevail for most jobs in the short term, although certain jobs will disappear entirely. As AI learns to drive, for example, in a way that is 100 times better than a human, it’s hard to imagine that most of today’s driving jobs will survive the cull. Similarly, call centre jobs. I’m sure it’ll be quite a long list. As a society, we will need to ensure that we have a plan, in good time, for dealing with the [millions] of the displaced. That particular manifesto, though, is for another day!