Sora from OpenAI: which profession will it destroy?

We looked into what stage of development OpenAI's Sora, a neural network that generates videos from text descriptions, has reached. We asked whether it could destroy at least one profession, as was once expected of ChatGPT. Senior and mid-level experts answered our questions.

***

A reminder: you can put your own question to the experts, and we will collect answers to it if it proves interesting. Questions that have already been asked can be found in the list of issues in this section.

If you want to join the ranks of experts and send an answer from your company or yourself, then write to experts@tproger.ru, we will tell you how to do this.

***

Savely Baturin

Head of Machine Learning, MEN IN DEV LLC

A lot has already been written online about the nature of Sora, how it works, and its possible consequences for various industries, such as cinema, video blogging, game development, and even robotics.

For example, as someone with first-hand experience of localizing a robot in space, I was very impressed by the quality of the worlds Sora recreates. After all, the video we see in the end is only a reflection of a hidden representation lying deep inside the neural network. This idea could be very useful for building AI systems in which an agent, say that same robot or a quadcopter, interacts with the observed world while performing its assigned tasks. Apparently, having spent hundreds of virtual years watching videos, Sora has learned how 3D works and picked up some of the physical laws of our world. Of course, these are only reflections on the topic, and OpenAI's preliminary demonstration of results does not give us the full picture.

It is also worth considering that, according to indirect data, generating a one-minute video takes about an hour! Such timing does nothing to help test hypotheses quickly, iterate on prompts, or arrive efficiently at the required result. For that reason, Sora is unlikely to harm directors, video bloggers, designers, and other people who create visual content. On the contrary, footage made by Sora could fit very well into an arthouse, historical, or adventure film. Marketers will be able to generate video for advertising campaigns as effectively as they now generate static images with Midjourney. The world does not stand still: new tools appear and professions are transformed, but humans remain on the pedestal. This is probably the main theme of many current conferences and meetups on AI. The appearance of Sora is only for the better. Do not be afraid of progress; adapt to it!

Nikita Dubinkin

Head of PREX.RU, an online service for working with media

It is still quite difficult to evaluate OpenAI's Sora: we cannot test it ourselves and can only rely on demo videos showing the best final results the company chose to publish. From them it is not at all clear how consistently such results are obtained, how many resources they require, how much such generation will cost, and so on. We also know nothing about the “purity of the experiment” or the reliability of the results, and there are already precedents of results being “embellished”: for example, in the demo of the Gemini AI's capabilities, all the hints and leading questions in the prompts were left behind the scenes.

As for which professions Sora will harm, the first that come to mind are those involved in producing stock video, just as happened with stock photography. These are people who have invested in professional equipment, produce in-demand mid-level content for stock libraries, and make money from it; their videos will no longer be bought. We can predict that as neural networks develop, photo and video stock libraries will eventually lose their relevance, and only something niche and professional will remain, as happened, for example, with analog photography.

Serious changes may affect the entertainment industry as a whole, which could change beyond recognition. We recently followed the strike by Hollywood actors and screenwriters who demanded that their work be protected from AI. The development of tools like Sora could also hit the art and technical departments hard, because an AI video generator is capable of “cancelling” camera operators, lighting designers, editors, location managers, set decorators, costume designers, make-up artists, and more. In short, a film convoy of a dozen vehicles with equipment and crew may in the future turn into one person at a computer, and the full consequences of this for a multi-billion-dollar market and for people are now difficult even to imagine, let alone evaluate.

Sora works against the interests of today's video industry participants, but it will make life much easier for the commercial consumers of their services, who will have to spend fewer resources on contractors. For example, the cost of producing advertising creatives may fall, which will help small businesses compete. At the same time, a video generation tool will expand the expressive capabilities of creators: independent film adaptations of their own literary works, for which they will not have to seek million-dollar budgets, are the first thing that comes to mind.

But all this is still far away, because even Sora's impressive results are less impressive when examined in detail. We still see extra fingers, legs that switch places, adjacent bodies that stick together, and so on. After all, generative neural networks do not understand anything about physics or anatomy; they simply assemble results from what they saw during training. That is, they were trained on millions of videos with cats and can generate a “collective image” of a cat. But what a cat is, how it is “structured”, and how it moves, neural networks do not know; this is not built into them.

Of course, compared with previously presented AI video generation tools, the content is an order of magnitude higher in quality. But Sora, for all its power, is not yet good enough for professional commercial use. It is one thing for creative works, where generation glitches are more a feature than a bug. It is quite another when we need clean, original video with specific imagery and content for a specific task: here Sora's current level of development is not yet enough. So we will have to wait another couple of years for a professional tool for commercial use to appear. And it probably will not cost $20 a month, because generating video content requires a lot of resources.

How will the neural network evolve? Here we can draw an analogy with Midjourney. Request handling will become more accurate, video quality will improve, and bugs will be eliminated, but how long this will take is difficult to say. At the same time, the tool will be customized and tuned for typical tasks, because not everyone needs the full breadth of creative possibilities. OpenAI's intention to give beta access to Sora only to selected creators and to find out with them, in practice, what people need from it means that the work of adapting the neural network to the requests and needs of specific professionals and industries willing to pay for its use has only just begun.

Dmitry Zolototrubov

Specialist in the field of video/film production and neural networks

Following the boom of the Midjourney and ChatGPT neural networks, OpenAI has released a demo of its new video generation service, Sora. The network is currently undergoing testing and is closed to users. However, you can already look at the results of its work, which are stunning.

The emergence of Midjourney and ChatGPT is affecting the labor market for illustrators and editors, but so far... positively. The use of neural networks in production will not deprive specialists of their jobs (especially creative ones), but it will significantly change pipelines.

In our video production practice, we use two neural networks for working with video: Midjourney and WonderDynamics. Midjourney generates graphics, which are then cut into layers and animated. WonderDynamics lets you superimpose a 3D character on top of a real actor, “overwriting” the latter. It would seem that here it is, a replacement for people! No. It is just that specialists are now forced to expand their competencies. With Midjourney, the illustrator no longer only draws on a tablet but also finishes the drawings after the network and steers its work in the right direction, using specialized terminology that, say, an office secretary would not know. The same goes for WonderDynamics: we use the services of the same 3D specialists as before. Riggers are responsible for the “bones”, animators finalize the scenes in Blender, and compositors rotoscope and composite after the network (and sometimes by hand).

The networks do not work perfectly (however rosy the vendors' promotional materials may be), so specialists have not less work but more, because the number of requests for content has grown as content has become more accessible to businesses. Testing, adaptation, and correction make up the lion's share of work with neural networks.

As for the new Sora product, adopting it will be painful and will require tons of corrections. The cost of use will also be high, because its demands on computing power are titanic. A certain limit in computing power has been reached, while the quantum computer still remains a “spherical horse in a vacuum”, a purely theoretical construct.

No network can replace the work of a talented master like Miyazaki. But it will help speed up a studio's pipelines significantly. And sometimes it is easier to pick up a camera, direct the actors, and shoot the material than to organize a technologically more complex (and still expensive) generation. In the long term, everything will, of course, end with Ubik (the novel by Philip K. Dick, which influenced the creation of the films The Matrix, Inception, and The Thirteenth Floor; author's note). But that is another story.

Sergey Snegirev

Head of Game and Application Development Department at Dobro Games, author of the podcast “80 Levels of Game Design”.

Sora from OpenAI is like the Midjourney of a year ago for images. It is a huge advance in video generation from text descriptions. Before Sora, neural networks for video generation were at a fairly low level. At best we could talk about interesting stylization that demanded huge amounts of time and manual drawing (a high-quality example of such generation is Linkin Park's video for Lost from February 2023), but there was no talk of any significant progress. Sora, on the other hand, is capable of generating detailed and realistic videos with complex scenes, multiple characters, different types of lighting, and so on. And all this in high definition. This is a coming revolution for video production.

The technology has no less potential than today's DALL·E and Midjourney have in image generation. It is too early to talk about any specific threats, because generating such videos is quite resource-intensive and Sora can currently produce videos only up to a minute long. But it is potentially an ideal tool, primarily for the film industry and advertising. This means that camera operators, directors, and quite possibly even screenwriters will have to learn new skills.

As a game designer and game developer, I see great opportunities here for the gaming industry. Sora is already capable of simulating game worlds, which means that, in theory, in the near future we could have a system for generating full-fledged game spaces from text prompts. If something like this becomes a reality, then within five years the professions of level designers, modelers, and developers in general will be transformed.
