how differently they work with video

The September conference is still accepting applications.


Nostalgia (and codec/preset choice)

Let's start from afar: remember how people bought pirated CDs (not even DVDs) with films of dubious quality, where the video seemed to have been crammed into 650 megabytes? Anyone who watched Neo's face turn into a pixelated mess in 2000 understood the "file size versus video quality" trade-off. And it was already clear back then how much a good codec matters.

Decades have passed, videos are watched online, and discs are forgotten. Yet neither that trade-off nor the question of choosing a codec has gone anywhere. Moreover, experts are concerned not only with "which codec to use": there is also the question of "with which preset".

After all, the "default" settings are meant to be universal, so that a football match and a music video can both be encoded reasonably well. But what if your project is a lecture hall, where neither the pace of football nor the colorfulness of music videos is expected? By choosing a suitable preset, you can noticeably reduce file size without losing quality.

This is exactly what happened in one of the stories told at VideoTech.
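To make the idea of "a preset suited to the content" a bit more concrete, here is a minimal sketch of how one might call ffmpeg for mostly static lecture footage. The specific preset, tune and CRF values are illustrative assumptions, not settings from the talk.

```python
import subprocess

def encode_lecture(src: str, dst: str) -> None:
    """Encode mostly static lecture footage with x264.

    The preset/tune/CRF values are illustrative guesses for low-motion
    content, not recommendations from the VideoTech report.
    """
    subprocess.run(
        [
            "ffmpeg", "-i", src,
            "-c:v", "libx264",
            "-preset", "slower",    # spend more CPU time for better compression
            "-tune", "stillimage",  # x264 tuning aimed at largely static pictures
            "-crf", "26",           # constant quality; higher value = smaller file
            "-c:a", "aac", "-b:a", "96k",
            dst,
        ],
        check=True,
    )

encode_lecture("lecture.mp4", "lecture_small.mp4")
```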

Online cinemas (and what happened to presets online)

It would seem: what "big differences" can there be, if people have been wrestling with the same trade-off for decades?

However, the move online has added a new dimension. Previously, you could encode a film once and burn a million identical discs. Now one user is sitting in front of a big TV, another wants to watch a series in the forest, and a third is riding public transport. The first wants 4K, the second is ready to tolerate low resolution, and the third's connection speed jumps every minute, so you have to adapt on the fly.

So it is no longer enough to simply "pick a codec/preset": you now need a whole range of quality options, and the player has to switch between them automatically on the fly.

There are many nuances along the way. Imagine this situation: your online cinema seems to be working well, but viewers in the Far East are not very happy. You investigate and discover that, because of slow Internet connections, many of them are stuck: the lowest quality option looks really sad, while the next level up has a much higher bitrate that their connections often cannot sustain. This begs the question: if you add a "one and a half" option between the first and the second, could it suit many of these users and improve their lives?
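To make the "one and a half" idea more tangible, here is a toy sketch of an adaptive bitrate ladder with an extra intermediate rung; the names, resolutions and bitrates are made up for illustration and have nothing to do with KION's real ladder.

```python
from dataclasses import dataclass

@dataclass
class Rung:
    name: str
    height: int   # vertical resolution, px
    bitrate: int  # kbit/s

# A toy ABR ladder with a hypothetical intermediate rung between the two lowest levels.
ladder = [
    Rung("240p",  240,  300),
    Rung("360p-", 360,  450),  # the "one and a half" rung
    Rung("360p",  360,  700),
    Rung("480p",  480, 1200),
    Rung("720p",  720, 2500),
]

def pick_rung(measured_kbps: float, safety: float = 0.8) -> Rung:
    """Pick the highest rung whose bitrate fits into the measured bandwidth."""
    budget = measured_kbps * safety
    suitable = [r for r in ladder if r.bitrate <= budget]
    return suitable[-1] if suitable else ladder[0]

# A viewer on a 600 kbit/s link gets the intermediate rung instead of sad 240p.
print(pick_rung(600).name)  # "360p-"
```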

Can you picture the situation? Dmitry Piskunov from KION doesn't need to imagine it, he told us about exactly this:

And this is something that film publishers on discs never had to think about. The perspectives are already beginning to diverge.

Livestreaming (and the synchronization challenge)

The previous examples were about pre-recorded content like movies. But the development of video technology has also brought us live broadcasts. Now anyone can open Twitch and show the world how enthusiastically they "tap a hamster". But do they understand how much technology is involved in getting that tapping delivered to viewers smoothly?

Live streaming brings a lot of new challenges. And here's an example of just one of them. Imagine that you transmit not only a video stream, but also some important metadata related to what's happening in it. Well, for example, you broadcast a sports match, and you have a specialized solution that separately sends the current score.

Pre-recorded video has additional data too, the simplest example being subtitles for films, but those can be properly timed in advance. With a football match, you transmit both the video and the other information "right now" by default, and they can reach the user at different speeds. It may turn out that the viewer hasn't even seen the goal yet, but the score has already changed. Who likes spoilers like that?
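As a rough sketch of one possible approach (not necessarily the one from the report), the score updates can carry the same presentation timestamps as the video, and the player can hold them back until playback reaches that point:

```python
import heapq
import itertools

class TimedMetadataQueue:
    """Hold metadata events until playback catches up with their timestamps.

    A toy sketch: in a real player the timestamps would come from the
    stream itself (e.g. PTS), not be invented by the application.
    """

    def __init__(self) -> None:
        self._events = []             # min-heap of (pts, seq, payload)
        self._seq = itertools.count()

    def push(self, presentation_time: float, payload: dict) -> None:
        heapq.heappush(self._events, (presentation_time, next(self._seq), payload))

    def pop_due(self, playback_position: float) -> list[dict]:
        """Return all events whose timestamp is not ahead of the player."""
        due = []
        while self._events and self._events[0][0] <= playback_position:
            due.append(heapq.heappop(self._events)[2])
        return due

q = TimedMetadataQueue()
q.push(95.0, {"score": "1:0"})  # goal scored at 95 s of stream time
print(q.pop_due(80.0))          # []  (the viewer hasn't seen the goal yet)
print(q.pop_due(96.0))          # [{'score': '1:0'}]  (now it's safe to show)
```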

It turns out that you need to be able to synchronize metadata with video footage. And there was a report about this too:

Private Livestreaming (and Open Source)

When it comes to video technologies, it's easy to think that only giant companies do anything interesting there. Like, if you have millions of users, you can do something noticeable at that scale, say an open source project that others will then use. But if you're a loner, you just use other people's projects and that's it.

But the world is much more diverse. A live video stream is not necessarily a “loud presentation watched around the world.” It can be, for example, a “home camera that streams to a single person — its owner.”

And open source in the world of video technology can be more than just "big corporate" projects. A lone wolf with an "if something is missing, build it yourself" attitude can find a point of application like this:

Video communication (and on-the-fly video stream processing)

Let's say we're making a video call. It would seem: what could be different here, if we've already talked about online broadcasts and a call is simply several of them at once? How does the work differ if here, too, we need to "provide the highest possible quality under current conditions"?

Well, for starters, people here don't mean the same thing by "high quality" as they do with movies or streamers. They care less about the resolution of the picture and more about the delay. If the guy from Salekhard suddenly becomes "pixelated" on the screen, it's bearable: he isn't being filmed by a camera as good as Keanu Reeves' anyway. But if the sound keeps cutting out, or people talk over each other because of the delay, having a dialogue becomes simply inconvenient, and that is much more unpleasant.

And the user here not only consumes video, but also generates it. And since he generates it, technology can somehow intervene in this process to improve the result.

For example, many people want to "blur" the background during video calls so that scattered socks are less noticeable. And we also had a report about the task of replacing the background on the fly:

This is a problem you don't encounter in online cinemas: Keanu Reeves always has his socks in order.
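As a rough illustration of the background idea, here is a minimal OpenCV sketch that blurs everything outside a person mask. The get_person_mask stub is a hypothetical placeholder: in a real product this mask would come from an ML segmentation model, which is where most of the actual work is.

```python
import cv2
import numpy as np

def get_person_mask(frame: np.ndarray) -> np.ndarray:
    """Placeholder: a real pipeline would run an ML segmentation model here
    and return a float mask in [0, 1] that is 1 where the person is."""
    h, w = frame.shape[:2]
    mask = np.zeros((h, w), dtype=np.float32)
    cv2.ellipse(mask, (w // 2, h // 2), (w // 4, h // 2), 0, 0, 360, 1.0, -1)
    return mask

def blur_background(frame: np.ndarray) -> np.ndarray:
    """Blend the original frame (person) with a blurred copy (background)."""
    mask = get_person_mask(frame)[..., None]       # HxWx1 for broadcasting
    blurred = cv2.GaussianBlur(frame, (41, 41), 0)
    out = frame * mask + blurred * (1.0 - mask)
    return out.astype(np.uint8)

cap = cv2.VideoCapture(0)  # webcam
ok, frame = cap.read()
if ok:
    cv2.imwrite("blurred.png", blur_background(frame))
cap.release()
```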

Various applications of ML (e.g. video generation)

Right now many people are talking about generative AI. Everyone already gets image generation (think Stable Diffusion), and now video generation with tools like Sora is being discussed: a person simply writes a text prompt and gets back almost a finished movie.

At VideoTech, you could see “Txt2Vid” in the title of one of the talks back in 2021 — before any Stable Diffusion. So to speak, they covered the topic before it became fashionable!

However, what is even more interesting is that the report was not about "video generation from a prompt" but about a slightly different idea. Right now, video calls require a lot of traffic and work poorly on a bad connection. What if we digitize the participants and transmit only their words as text, and their digital models on the receiving side then voice it with their own voice and facial expressions?

In 2021 this sounded futuristic. And in 2024, although such video calls have not conquered the world yet, something similar has actually come into use: ML-based video translation into another language, which makes the speaker pronounce the text in a language they don't even know, in their own voice.

Computer vision

It would seem that this is also related to machine learning, but in a completely different direction: not to generate artificial video, but on the contrary, to “understand” the real world. What needs to be done so that, for example, a robot can count goods in a warehouse?

At last year's VideoTech, speaker Georgy Nikandrov, who works on self-driving vehicles at Yandex, talked about how they use lidars. The questions discussed there were very far from "what video presets are needed in the Far East": instead of words like "bitrate", the report was full of terms like "reflection coefficient". It turns out that thanks to it, lidar is able to recognize road markings.
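To give a feel for how reflectivity helps, here is a toy sketch that filters a lidar point cloud down to likely lane-marking points. The point format and thresholds are assumptions made for illustration, not anything taken from the report.

```python
import numpy as np

# A toy point cloud: each row is (x, y, z, reflectivity), reflectivity in [0, 1].
# Road paint is retroreflective, so its returns are much "brighter" than asphalt.
points = np.array([
    [2.0, 0.1, -1.6, 0.08],  # asphalt
    [2.1, 1.5, -1.6, 0.72],  # painted lane marking
    [2.2, 1.6, -1.6, 0.65],  # painted lane marking
    [3.0, 0.3, -1.6, 0.10],  # asphalt
])

def lane_marking_candidates(pts: np.ndarray,
                            ground_z: float = -1.6,
                            z_tol: float = 0.1,
                            min_reflectivity: float = 0.5) -> np.ndarray:
    """Keep ground-level points whose reflectivity is high enough to be paint.

    The thresholds are illustrative; a real pipeline would calibrate them per
    sensor and add clustering and geometry checks on top.
    """
    on_ground = np.abs(pts[:, 2] - ground_z) < z_tol
    reflective = pts[:, 3] >= min_reflectivity
    return pts[on_ground & reflective]

print(lane_marking_candidates(points)[:, :2])  # x, y of likely marking points
```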

We are gradually opening up video recordings of the reports and haven't gotten to that one yet, but in time it will appear on the VideoTech YouTube channel, like the rest.

Working with something else? Tell us!

Of course, this is far from everything. For example, 3D and VR are separate large worlds that we have little contact with. And here the very same "fix the printer" effect arises: yes, we have been working with video technologies for many years, but they are so diverse that some of these stories are very far from us, and we understand little about them.

However, if we don't know something, we are interested in finding out, including with the help of the Habr community. And we don't want to reduce our conference to a single circle of topics; we are interested in expanding it.

So, if you work with video technologies but didn't recognize yourself in the examples above: first, tell us in the comments, and second, you can submit an application to give a report.

We are preparing VideoTech 2024 at full speed; the conference will be held in September. You can participate online (of course, such a conference has to have an online broadcast), or you can attend in person in St. Petersburg (after all, live communication has not yet been completely replaced by any video technology).

The site already has descriptions of the first reports, and you can already see how different they are: from "teaching the player to switch between CDN nodes" to "multimodal AI systems". But perhaps it is precisely your talk that is missing for now.
