OpenAI Sora Stunned the AI World

OpenAI’s first AI video model, Sora, has stunned the world, and it could effectively kill the whole video content industry.

OpenAI’s first AI video model has stunned the world

OpenAI has just released its first AI video model, Sora, which generates stunning 60-second videos in a single shot. Netizens are totally amazed by this revolution in AI video. The model understands and simulates the physical world in motion, with the goal of training models that help people solve problems requiring real-world interaction.

Sora could effectively kill the whole video content industry

Sora is superb. It not only creates scenes that are both realistic and imaginative from a text prompt, but can also generate videos up to one minute long in a single shot. This is simply amazing.

While Runway Gen 2, Pika, and other AI video tools are still struggling to generate more than a few seconds of video, OpenAI has already scored an epic achievement.

Sora’s Capabilities

Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.

The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video that accurately persist characters and visual style.

Examples of Videos Generated by Sora

In the 60-second video on OpenAI’s website, the woman and the background characters all show astonishing consistency. Every character remains consistent even as the camera switches freely between shots.

Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.

How did OpenAI do it? According to OpenAI’s official website, “By giving the model foresight of many frames at a time, we’ve solved a challenging problem of making sure a subject stays the same even when it goes out of view temporarily.”
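To get a feel for what “foresight of many frames” might mean in practice, here is a minimal, hypothetical PyTorch sketch: a denoiser whose attention spans every frame of a clip at once, so information about a subject can persist even through frames where it is hidden. The class and variable names are illustrative assumptions, not OpenAI’s actual architecture or code.

```python
import torch
import torch.nn as nn

# Hypothetical sketch only: a denoiser that processes ALL frames of a clip
# in one pass, so attention can span the whole video. It illustrates the
# general idea of "foresight of many frames"; it is not OpenAI's code.

class JointClipDenoiser(nn.Module):
    def __init__(self, token_dim=256, n_heads=8, n_layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=token_dim, nhead=n_heads, batch_first=True
        )
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(token_dim, token_dim)  # per-token noise estimate

    def forward(self, clip_tokens):
        # clip_tokens: (batch, frames * tokens_per_frame, token_dim).
        # Every token attends to tokens from every frame, so a subject that
        # leaves view in one frame is still represented in later ones.
        return self.head(self.backbone(clip_tokens))

# Toy usage: a 16-frame clip with 64 tokens per frame.
frames, tokens_per_frame, dim = 16, 64, 256
noisy_clip = torch.randn(1, frames * tokens_per_frame, dim)
model = JointClipDenoiser(token_dim=dim)
noise_estimate = model(noisy_clip)  # the whole clip, denoised jointly
```

The key design choice in this sketch is that the token sequence covers the whole clip, so nothing is generated frame by frame and consistency does not depend on stitching.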

Sora Makes Breakthroughs in Various Technologies

With a deep understanding of natural language, Sora can accurately interpret the requirements expressed in user instructions and grasp how those elements behave in the real physical world. As a result, the characters Sora creates show rich emotional expression.

In complex generated scenes, you will notice that many characters and objects are created, each with its own very specific set of actions. You will also find that Sora reproduces the details of the objects as well as the background accurately.

Look at the pupils, eyelashes, and skin texture of the character in the video below; they are so real that you can hardly find any trace of artificial intelligence.

So guys, what is the difference between the virtual objects in this video and real-world objects?

Prompt: Extreme close up of a 24 year old woman’s eye blinking, standing in Marrakech during magic hour, cinematic film shot in 70mm, depth of field, vivid colors, cinematic

In addition, Sora can generate several different camera angles within the same video while maintaining consistent characters and visual style.

Prior to Sora, AI videos were generated from a single, fixed camera angle.

It is therefore almost unbelievable that Sora can produce realistic video with different shot angles within a single generation! This is something that Gen 2 and Pika cannot achieve at all…

Prompt: A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.

For example, look at the video below.

From the text prompt, Sora produces a winter scene on a bustling Tokyo street from a drone’s point of view. You can hardly tell this video was produced by an AI tool, because it looks so real.

The drone camera tracks a couple walking leisurely along the Tokyo street, with vehicles running along the riverside road on the left and customers shuttling between a row of small shops on the right.

Prompt: Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.

From here, you can see that Sora has advanced to a terrifying stage, completely breaking away from the era of virtual video with obvious AI traces. No other AI video tool can match Sora’s capability right now.

Is a virtual world model coming true?

One question in my mind now is whether a theoretical world model that mimics our physical world really exists already.

It is terrifying to see what Sora can already do in recreating the real world with just a machine learning algorithm. Sora has successfully learned many of the physical laws of our real world, although it is not 100% accurate yet.

Observe the dogs in the video below. Sora recreates the characters (the puppies and the snow) perfectly, with the correct actions: the puppies playing, and the snow splashing up and then dropping back down under real-world gravity.

Prompt: A litter of golden retriever puppies playing in the snow. Their heads pop out of the snow, covered in snow.

Here is another example, also created by Sora, using the following prompt:

Prompt: Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.

This is simply impressive!

From the prompt, Sora created a creature reminiscent of a Pixar character, one that seems to have the DNA of a Furby, a Gremlin, and Sulley from “Monsters, Inc.”

What is shocking is that Sora’s grasp of the physical properties of fur is so accurate that it is jaw-dropping!

Back when “Monsters, Inc.” was released, Pixar had to spend enormous effort creating the super-complex fur textures that move with the monsters, and its technical team easily worked on the problem for several months.

However, Sora achieved this effortlessly, and no one had taught it how!

“It has learned about 3D geometry and object consistency,” said Tim Brooks, a research scientist on the project.

“This is not something we set in advance – it learned naturally by observing a large amount of data.”

Can Sora kill the filmmaking industry?

Thanks to the diffusion approach used by DALL·E 3 and the Transformer architecture behind GPT-4, Sora can not only generate videos that meet users’ specific requirements, but also showcase its ability to understand and apply film-shooting methodology and workflow.
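To make the diffusion half of that pairing concrete, here is a heavily simplified, hypothetical sampling loop: start from pure noise for every frame and repeatedly subtract the noise a denoiser predicts, conditioned on the prompt. The linear update rule and the `toy_denoiser` stand-in are assumptions for illustration, not OpenAI’s published recipe.

```python
import torch

def sample_clip(denoiser, prompt_embedding, shape, steps=50):
    """Toy diffusion sampler: refine random noise into a latent clip.

    `denoiser` stands in for a trained model (e.g. a diffusion
    transformer); the linear update rule is a deliberate simplification.
    """
    clip = torch.randn(shape)  # start from pure noise for every frame
    for t in reversed(range(steps)):
        # Predict the noise present at step t, conditioned on the prompt,
        # then remove a fraction of it.
        predicted_noise = denoiser(clip, prompt_embedding, t)
        clip = clip - predicted_noise / steps
    return clip  # denoised latent clip, ready to decode into pixels

def toy_denoiser(clip, prompt_embedding, t):
    # Stand-in for a trained network so the loop runs end to end.
    return 0.1 * clip

latents = sample_clip(toy_denoiser, prompt_embedding=None,
                      shape=(1, 16, 64, 256))
```

Because the entire clip is refined together at every step, camera changes and multi-shot structure can emerge within one generation rather than being edited together afterwards.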

That filmmaking sense is reflected in Sora’s unique storytelling ability.

For example, in a video on the theme of “A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures”, project researcher Bill Peebles pointed out that Sora successfully advances the story through its camera angles and its pacing along the video timeline.

Prompt: A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures.

“Actually, there were multiple camera changes in the video. These shots were not stitched together in post-production, but were generated by the model in one go,” he explained. “We didn’t specifically instruct it to do this; it can do it automatically.”

Fortunately, Sora is not perfect yet!

The current model is not perfect, though. It may struggle to simulate the physics of complex scenes, and it may fail to grasp cause and effect in specific situations. For example, after someone takes a bite of a cookie, the cookie may still appear intact.

Prompt: Basketball through hoop then explodes.

In addition, the model may make mistakes with spatial details, such as distinguishing left from right, and may render events that unfold over time inaccurately, such as following a specific camera trajectory.

Prompt: Five gray wolf pups frolicking and chasing each other around a remote gravel road, surrounded by grass. The pups run and leap, chasing each other, and nipping at each other, playing.

Prompt: Archeologists discover a generic plastic chair in the desert, excavating and dusting it with great care.

Fortunately, it is not perfect yet. Isn’t that a relief?

Otherwise, could the boundary between the virtual and the real still be distinguished at all?

Look at the following video. Can you tell whether it is real?

Prompt: The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from it’s tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.

But the undeniable fact is that a terrifying reality is already in front of us: a model that can understand and simulate our physical world means that AGI is not far away.

In a few words: Sora is a revolution!

*Videos in this article are courtesy of the OpenAI website.*