Moneycontrol PRO

Alibaba's Qwen3-Omni aims to outperform Google’s Nano Banana and OpenAI's GPT-4o

Developed under Alibaba Cloud’s Qwen team, Qwen3-Omni is described as the company’s first native end-to-end multimodal platform. In benchmark tests, two of its variants reportedly outperformed GPT-4o and Gemini 2.5-Flash in tasks such as audio recognition and comprehension, as well as image and video understanding.

September 24, 2025 / 12:41 IST

Alibaba has introduced a new flagship artificial intelligence model, Qwen3-Omni, positioning it against OpenAI’s GPT-4o and Google’s Gemini 2.5-Flash (“Nano Banana”). The multimodal system, launched Tuesday, is designed to handle text, image, audio, and video inputs in one unified model and respond with both text and speech.

Developed by Alibaba Cloud’s Qwen team, Qwen3-Omni is described as the company’s first native end-to-end multimodal platform, according to a report by the South China Morning Post. In benchmark tests, two of its variants reportedly outperformed GPT-4o and Gemini 2.5-Flash in tasks such as audio recognition and comprehension, as well as image and video understanding.

According to the report, Lin Junyang, a researcher on the project, credited the improvements to large-scale datasets and foundational work in audio processing. “This year, our audio team has spent great efforts on building large-scale audio datasets for both pretraining and post-training,” Lin was quoted as saying.

The model supports inputs in 119 text languages and 19 spoken languages, including English, Chinese, Japanese, Arabic, Spanish, and Urdu. It can generate speech in 10 languages, among them English, Chinese, French, German, and Japanese. In a demonstration, Alibaba showed how devices equipped with cameras, microphones, and speakers could use Qwen3-Omni to perceive their surroundings and respond with natural-sounding speech, according to the report.

Three variants of the Qwen3-Omni series have been released on open-source platforms such as Hugging Face and GitHub. Alongside the flagship model, Alibaba also rolled out Qwen-Image-Edit-2509, an updated image-editing tool that improves consistency, and Qwen3-TTS-Flash, a proprietary text-to-speech model available exclusively through Alibaba Cloud. The latter can generate expressive, humanlike voices and adjust tone to match the input text.

The report also mentions that the announcements set the stage for Alibaba Cloud’s Apsara Conference, which kicks off Wednesday in Hangzhou. With these releases, Alibaba is signalling its intent to compete not just domestically but on the global AI stage.

MC Tech Desk
first published: Sep 24, 2025 12:40 pm
