It’s been quite a summer for OpenAI. First came the new GPT-4o model, pushing the envelope of artificial intelligence (AI) capabilities: voice conversations, browsing the web to answer your queries, writing code, analysing data charts, text-to-image generation, and troubleshooting from just a photo are all on the agenda. And that wasn’t all. You can converse with the assistant via the phone’s camera, giving the AI more context about the world around you. All this came even before the announcement at Apple’s WWDC, where we saw the first glimpses of ChatGPT finding a home within iOS, iPadOS, and macOS. OpenAI isn’t done for the summer.
The GPT-4o mini (it’s mini, not Mini) keeps size and cost in check, as is increasingly the approach with AI models, for broader relevance. Even then, OpenAI insists the cost-effective small model is better than even GPT-4 in many tasks, whilst being significantly less costly as well. Better than competitors too, on the MMLU and MMMU benchmarks. For instance, on the MMLU benchmark, which tests textual knowledge and reasoning, GPT-4o mini scores an 82% accuracy rate, while Gemini Flash (77.9%) and Claude Haiku (73.9%) are well behind.
“Today, GPT-4o mini supports text and vision in the API, with support for text, image, video and audio inputs and outputs coming in the future. The model has a context window of 128K tokens, supports up to 16K output tokens per request, and has knowledge up to October 2023. Thanks to the improved tokenizer shared with GPT-4o, handling non-English text is now even more cost effective,” the company says in a statement shared with HT.
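Those stated limits, a 128K-token context window and up to 16K output tokens per request, can be sketched as a simple budget check. This is an illustrative helper, not part of any official SDK; the token counts are assumed to come from a tokenizer such as the one GPT-4o mini shares with GPT-4o.

```python
# Illustrative sketch of GPT-4o mini's stated limits: a 128K-token context
# window (prompt + completion combined) and a 16K output-token cap per request.
# The helper name and structure here are hypothetical, for demonstration only.

CONTEXT_WINDOW = 128_000    # total tokens the model can attend to
MAX_OUTPUT_TOKENS = 16_000  # per-request output ceiling

def fits_budget(prompt_tokens: int, requested_output: int) -> bool:
    """Return True if a request stays within both stated limits."""
    if requested_output > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + requested_output <= CONTEXT_WINDOW

print(fits_budget(100_000, 16_000))  # True: 116K total fits in 128K
print(fits_budget(120_000, 16_000))  # False: 136K total exceeds the window
```

In practice, a developer would measure `prompt_tokens` with a tokenizer before sending the request, rather than guessing.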
The pricing OpenAI is indicating for developers is 15 cents per million input tokens and 60 cents per million output tokens – this is, according to official numbers, more than 60% less costly than GPT-3.5 Turbo. In due course, GPT-4o mini in the API will gain full multimodal support like GPT-4o, with text, image, video, and audio inputs and outputs.
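A quick back-of-the-envelope calculation shows what that pricing means at scale. This is a minimal sketch using only the rates stated above; the function name and example token volumes are assumptions for illustration.

```python
# Back-of-the-envelope cost estimate from GPT-4o mini's stated API pricing:
# 15 cents per 1M input tokens, 60 cents per 1M output tokens.

INPUT_USD_PER_MILLION = 0.15
OUTPUT_USD_PER_MILLION = 0.60

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated API spend in US dollars for a given token volume."""
    return (input_tokens / 1_000_000) * INPUT_USD_PER_MILLION \
         + (output_tokens / 1_000_000) * OUTPUT_USD_PER_MILLION

# A hypothetical workload: 2M input tokens and 500K output tokens
print(round(estimate_cost_usd(2_000_000, 500_000), 2))  # 0.6
```

In other words, a couple of million tokens processed costs well under a dollar, which is the economics pushing developers toward the small-model class.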
OpenAI had to tackle the small model class effectively, for largely two reasons. First was pricing, to stay competitive. As things are brewing, developers requiring a more cost-effective AI model to integrate within their apps or platforms already have the option of Google’s Gemini 1.5 Flash or Anthropic’s Claude 3 Haiku. Claude 3 Haiku, for example, is priced at 25 cents per million input tokens and $1.25 per million output tokens.
Secondly, the need for different-sized models, for flexibility with size, capabilities, performance, utility and efficiency. Google has Gemini in Ultra, Pro and Flash model sizes. Anthropic has the Claude 3 family in Opus, Sonnet and Haiku sizes, with Claude 3.5 Sonnet the latest addition. Gemini Nano’s on-device specifics may not be relevant for OpenAI to tackle, at least till the iOS, iPadOS and macOS integration comes through.
For now, GPT-4o mini is available as a text and vision model in the Assistants API, Chat Completions API, and Batch API for developers to integrate. OpenAI confirms to HT that the plan is to roll out fine-tuning for GPT-4o mini in the coming days. For everyone else, access to GPT-4o mini is now available for Free, Plus and Team users, while Enterprise subscribers get access next week.
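For developers, targeting the new model in the Chat Completions API is mostly a matter of changing the model identifier. The sketch below builds the request body as a plain dictionary, with no network call; with the official openai Python SDK, the same fields would be passed to `client.chat.completions.create(...)`. The prompt text and `max_tokens` value are illustrative assumptions.

```python
# Sketch of a Chat Completions request body targeting GPT-4o mini.
# Built as a plain dict (no API call is made here) to show the shape
# of the request; the messages and max_tokens are example values.
import json

payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarise GPT-4o mini's pricing in one line."},
    ],
    "max_tokens": 256,  # comfortably under the 16K per-request output cap
}

print(json.dumps(payload, indent=2))
```

Switching an existing GPT-3.5 Turbo integration over is, in the simplest case, just swapping the `model` string, which is part of why the pricing comparison matters so much.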
For OpenAI, and indeed the entire AI space, this has been a summer to remember. The subplots have played their part in the intrigue.
In early May, Google talked about an updated Gemini 1.5 Pro model, along with updates to Gemini Nano and Gemini Live, which will allow AI to see the world through a phone’s camera alongside you. OpenAI pre-empted everything Google announced by unleashing the new GPT-4o model, real-world visual context included. One example of that is OpenAI’s partnership with Be My Eyes, which has upgraded from GPT-4 to GPT-4o as the foundation for guidance for users with visual impairments. Then came the big announcement with Apple.
The fact that Google responded to GPT-4o’s ability to see the world through the phone camera with Gemini Live, within 24 hours, shows just how little margin there is to get any of this wrong. And how competitive the space is. None of that is likely to change anytime soon.
Source: OpenAI’s GPT-4o mini model is battling Gemini Flash and Claude Haiku