
Will ChatGPT soon feel like a distant memory? GPT-4, which can create audio and video, is coming

Microsoft’s announcement that it will release GPT-4, a generative AI model with multimedia production capabilities, has electrified those who are already dazed by the searing pace of the AI race. How will Google respond?

March 15, 2023 / 08:13 AM IST
Microsoft's latest AI chatbot version can generate videos from basic text prompts.

Advanced language models like OpenAI's GPT series and Google's LaMDA have made it possible for humans and AI to work together in new and exciting ways. These models can generate text that appears human-written, and they can help with tasks like translating languages and answering questions. But how much they can help people learn about science and the unknowns out there remains to be seen.

Anyone who has used ChatGPT and been awed by its capabilities knows that GPT-3 itself was a big step forward. Compared to GPT-2, it was more advanced and could generate paragraphs that flowed well. Since ChatGPT's launch, however, talk of the "next big thing" had subsided for a while as netizens obsessed over teasing and testing its capabilities. Now that we know GPT-4 is coming, expecting something bigger is quite rational, given that ChatGPT has given people only a taste of AI's "rudimentary" capabilities.

Microsoft Wows Again 

Andreas Braun, the CTO of Microsoft Germany, said on March 9 that GPT-4 would be released within a week and that it would be multimodal. Multimodality is indeed the most exciting aspect of the news trickling out about GPT-4.

"Modality" refers to the input processed by a big language model. Text, speech, pictures, and videos can all be part of a multimodal. So, for example, GPT-4 can make videos using AI from simple text prompts. This feature would allow Large Language Models to make more types of content and could even alter how video is made in various content generating industries.

GPT-3 and GPT-3.5 could work only with text. According to the German news report, however, GPT-4 supports at least four modalities: text, images, sound, and video. ChatGPT, the popular conversational AI tool, is powered by GPT-3.5 but can respond only within the limitations of text. GPT-4's multimodal models could change this and make smarter, more varied content possible.

Use Cases For GPT-4

Clemens Siebler (Senior AI Specialist, Microsoft Germany) and Holger Kenn (Chief Technologist, Business Development, AI & Emerging Technologies, Microsoft Germany) shared insights on practical AI use and use cases that their teams are actively developing.

Siebler highlighted what is currently achievable: telephone calls can be recorded automatically, for instance, so call-centre agents no longer need to summarise and type them up manually. Siebler estimates that this could save a large Microsoft customer, which receives 30,000 calls daily, around 500 work hours.
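As a back-of-envelope sketch (using only the two figures quoted above), the estimate works out to roughly one minute saved per call:

```python
# Rough sanity check of the quoted figures: 500 work hours saved
# across 30,000 calls per day implies the per-call saving below.
calls_per_day = 30_000
hours_saved_per_day = 500

minutes_saved_per_call = hours_saved_per_day * 60 / calls_per_day
print(minutes_saved_per_call)  # 1.0 minute per call
```

One minute of summarising and typing per call is plausible for short support interactions, which is why the aggregate number looks large despite the modest per-call saving.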

GPT-4 could work across multiple languages: for example, receiving a question in English and answering it in French. Moreover, researchers feel the significance of the breakthrough lies in the model's capacity to extract knowledge across languages.
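The cross-lingual use case can be sketched as a chat-style API request. Everything below is an assumption for illustration: the model identifier, the helper function, and the payload shape mirror common chat-completion APIs and are not confirmed details of GPT-4's interface.

```python
# Hypothetical sketch: ask a question in English, request the answer in
# French. The "gpt-4" model name and message format are assumptions
# modelled on common chat-completion APIs, not a confirmed interface.

def build_cross_lingual_request(question_en: str, answer_language: str) -> dict:
    """Construct a chat-style request payload asking the model to
    answer a question in a different language."""
    return {
        "model": "gpt-4",  # assumed model identifier
        "messages": [
            {
                "role": "system",
                "content": f"Answer the user's question in {answer_language}.",
            },
            {"role": "user", "content": question_en},
        ],
    }

request = build_cross_lingual_request("What is the capital of France?", "French")
print(request["messages"][0]["content"])
```

The point of the sketch is that no translation step is needed on the caller's side: a single instruction tells the model which language to answer in, and the model's cross-lingual knowledge does the rest.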

In future, GPT-4 or its successors may even produce a textual screenplay and generate the sound and video acting out those lines. For now, though, expect bare-bones functionality.

Even as new features are added, the older issues of operational reliability and factual fidelity remain works in progress. Siebler insisted that users validate the output, since AI's correctness cannot be guaranteed. Training AI to respond correctly through feedback loops is clearly a high priority for all the top AI companies in the world.

The Competition

Earlier this month, Microsoft unveiled Kosmos-1, a multimodal language model that works with both text and images. GPT-4 reportedly goes further than Kosmos-1, adding sound and video modalities.

Kosmos-1 could be similar in functionality to Google's MUM, another multimodal AI. AI enthusiasts point out, for example, that MUM can answer in English questions whose answers would ordinarily be found only in another language, such as Japanese.

Google is trying to catch up with Microsoft. Unfortunately for Google, this further solidifies the perception that it has a long way to go to provide better AI for consumers.

Google already uses AI in products like Google Lens and Google Maps. The idea behind this piecemeal approach was to apply AI to small tasks, but it may not have been a sound business strategy given Microsoft's approach of putting AI front and centre in its search engine, Bing.

What has further stunned Google, perhaps, is how Microsoft has gained in prominence after its big bang AI launches. Microsoft is certainly basking in all the attention it is getting and in making Google look like it is falling behind and trying hard to catch up. But the last word has not been said. Who will have the last laugh?

Nivash Jeevanandam writes stories about the AI landscape in India and around the world, with a focus on the long-term impact on individuals and society. Views are personal and do not represent the stand of this publication.


Nivash Jeevanandam is a senior research writer at INDIAai (Govt. of India) - National AI Portal of India | NASSCOM. Views expressed are personal.
first published: Mar 15, 2023 08:07 am