5 Sora Weaknesses
OpenAI has launched a text-to-video model called Sora. It generates videos up to 60 seconds long, and the quality of the results is truly amazing.

Despite all of Sora's advantages, the model still has shortcomings, as reported on the official OpenAI website:

The current model has weaknesses. It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.
The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, such as following a specific camera trajectory.

The following are examples of these shortcomings in Sora's output videos, each paired with the text prompt used to generate the video.

Sora sometimes creates physically implausible motion.

Prompt: Step-printing scene of a person running, cinematic film shot in 35mm

Animals or people can spontaneously appear, especially in scenes containing many entities.

Prompt: Five gray wolf pups frolicking and chasing each other around a remote gravel road, surrounded by grass. The pups run and leap, chasing each other, and nipping at each other, playing.

Inaccurate physical modeling and unnatural object “morphing.”

Prompt: Basketball through hoop then explodes

Sora fails to model the chair as a rigid object, leading to inaccurate physical interactions.

Prompt: Archeologists discover a generic plastic chair in the desert, excavating and dusting it with great care

Simulating complex interactions between objects and multiple characters is often challenging for the model, sometimes resulting in humorous generations.

Prompt: A grandmother with neatly combed grey hair stands behind a colorful birthday cake with numerous candles at a wood dining room table, expression is one of pure joy and happiness, with a happy glow in her eye. She leans forward and blows out the candles with a gentle puff, the cake has pink frosting and sprinkles and the candles cease to flicker, the grandmother wears a light blue blouse adorned with floral patterns, several happy friends and family sitting at the table can be seen celebrating, out of focus. The scene is beautifully captured, cinematic, showing a 3/4 view of the grandmother and the dining room. Warm color tones and soft lighting enhance the mood.

Even though Sora still has these shortcomings, it seems likely that they will be overcome in the near future as AI models continue to develop.

