OpenAI's GPT-5, the long-awaited next-generation AI model, promises to revolutionise the field. Though the tech giant has stayed silent about its frontier model, reports claim that GPT-5 will unify the GPT and o-series models into a single powerful system that combines text, image, audio and potentially video capabilities.



GPT-5 to launch in summer: What to expect?

An OpenAI executive said the GPT-5 large language model (LLM) is expected to merge the GPT and o-series models, yielding a single model with both advanced reasoning and multimodal support. Romain Huet, Head of Developer Experience at OpenAI, mentioned that the San Francisco-based AI firm is planning to build a ‘net new great frontier model’ with GPT-5. He recently hosted a session at the Viva Technology conference in Paris, where he hinted at GPT-5 features. “So, the breakthrough of reasoning in the O-series and the breakthroughs in multi-modality in the GPT-series will be unified, and that will be GPT-5,” he said in a podcast.



GPT-5 to have a unified interface?

Currently, ChatGPT users pick different models for specific tasks, such as DALL·E for images or the fast GPT-4o for text and multimodal interaction. With GPT-5, users will instead have a unified interface, eliminating the need for a so-called model selector. Huet said that GPT-5 will be simpler, combining the best elements of existing models. GPT-5 is also expected to integrate with the Operator AI agent. For context, Operator is a Computer-Using Agent that can perform various tasks autonomously on the user's device and online.



According to sources close to OpenAI, GPT-5 may have larger context windows, letting users hold longer conversations and work through more complex tasks. The model is expected to respond in a more personalized way, tailoring its answers to individual preferences. It is also expected to have full multimodal capabilities, switching seamlessly between text, images, speech, and possibly even video without the need for separate tools or extensions.