What is Sora and how does it work? A guide to OpenAI’s latest text-to-video AI tool

Here is everything you need to know about OpenAI's text-to-video AI tool Sora

February 16, 2024 / 13:52 IST
Sora can create videos of up to 60 seconds featuring highly detailed scenes and complex camera motion.

In a bid to stay ahead of industry rivals, Microsoft-backed OpenAI has unveiled its latest innovation — a cutting-edge text-to-video model called Sora.

This move signals OpenAI's commitment to maintaining a competitive edge in the rapidly evolving field of artificial intelligence (AI) amid a landscape where text-to-video tools have become increasingly prominent.

What is Sora?

Sora, which means sky in Japanese, is a text-to-video diffusion model capable of creating minute-long videos that are hard to tell from the real thing.

"Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions," OpenAI said in a post on X (formerly Twitter).

The company claims that the new model can also generate realistic videos from still images or existing footage provided by the user.

"We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction," the blog post said.

How can you try it?

Most of us will have to wait before getting our hands on the new AI model. Though the company announced the text-to-video model on February 15, it is still in the red-teaming phase.

Red teaming is a practice in which a team of experts, known as the red team, simulates real-world use to identify vulnerabilities and weaknesses in the system.

"We are also granting access to a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals," the company said.

The company has, however, shared multiple demos in the blog post, and OpenAI's CEO has been sharing videos generated from prompts suggested by users on X.

How does it work?

Imagine starting with a noisy picture, like static on a TV, and slowly removing the fuzziness until you see a clear, moving video. That's basically what Sora does. It is a diffusion model that uses a "transformer architecture" to gradually remove the noise and create videos.
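
For readers who want to see the idea in code, here is a minimal Python sketch of "start with noise, then remove it step by step". It is a toy under stated assumptions, not OpenAI's method: the function name toy_denoise is hypothetical, the "video" is just a short list of numbers, and the stand-in denoiser already knows the clean target, whereas a real diffusion model uses a trained network to estimate the noise.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Start from pure noise and step toward `target`.

    Assumption: the "denoiser" here cheats by knowing the clean target.
    A real diffusion model would predict the noise with a trained network.
    """
    rng = random.Random(seed)
    frame = [rng.gauss(0.0, 1.0) for _ in target]  # pure static
    errors = []
    for step in range(steps):
        remaining = steps - step
        # Remove an equal share of the remaining noise at each step.
        frame = [x + (t - x) / remaining for x, t in zip(frame, target)]
        errors.append(sum(abs(t - x) for x, t in zip(frame, target)) / len(target))
    return frame, errors


target = [0.0, 0.5, 1.0, 0.5]
frame, errors = toy_denoise(target, steps=20)
print(f"error after first step: {errors[0]:.3f}, after last step: {errors[-1]:.3f}")
```

The error shrinks a little at every step, which is the "slowly removing the fuzziness" part of the analogy.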

It can generate entire videos at once rather than frame by frame. Text descriptions guide the video's content, and because the model sees many frames at a time, a person can stay consistent even if they move off-screen for a moment.

Think of GPT models, which generate text from small units called tokens. Sora does something similar with images and videos: it breaks them down into smaller pieces called patches.
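
As a rough illustration of what "breaking a video into patches" can mean, the Python sketch below cuts a tiny video into small blocks that span both space and time. Everything specific here is an assumption for illustration: the function name to_patches is hypothetical, the patch sizes are arbitrary, and the sketch works on raw pixel values, while the article does not say what sizes or representation Sora actually uses.

```python
def to_patches(video, pt, ph, pw):
    """Split a video (frames x height x width, as nested lists) into patches.

    Each patch covers `pt` frames, `ph` rows and `pw` columns and is returned
    as a flat list of values. Patch sizes are illustrative assumptions.
    """
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, frames, pt):
        for y0 in range(0, height, ph):
            for x0 in range(0, width, pw):
                patches.append([
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ])
    return patches


# Toy video: 4 frames of 4x4 pixels; each value encodes its own position.
video = [[[t * 16 + y * 4 + x for x in range(4)] for y in range(4)] for t in range(4)]
patches = to_patches(video, pt=2, ph=2, pw=2)
print(len(patches), "patches of", len(patches[0]), "values each")
```

Each patch then plays a role similar to a token in a text model: a small, uniform unit the model can process in sequence.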

"Sora builds on past research in DALL·E and GPT models. It uses the recaptioning technique from DALL·E 3, which involves generating highly descriptive captions for the visual training data. As a result, the model is able to follow the user’s text instructions in the generated video more faithfully," the company said in the blog post.

However, the company has not provided any details on what kind of data the model is trained on.

The model has ‘weaknesses’

The company in the blog post acknowledged that the current model has "weaknesses".

It said the model may face challenges in "accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect".

For example, a person might take a bite out of a cookie, but afterwards the cookie may not have a bite mark.

It added that the model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.

Arun Padmanabhan
first published: Feb 16, 2024 11:35 am