ITMO Research_ podcast: how to synchronize AR content with a live show at stadium scale

This is the first part of the text transcript of the second interview for our program (Apple Podcasts, Yandex.Music). Our guest is Andrey Karsakov (kapc3d), Ph.D., senior researcher at the National Center for Cognitive Development and associate professor at the Department of Digital Transformations.

Since 2012, Andrey has been working in the Visualization and Computer Graphics research group. He is involved in large-scale applied projects at the national and international level. In this part of the conversation, we talk about his experience providing AR support for mass events.


Photo ThisisEngineering RAEng (Unsplash.com)


Project context and objectives

Timecode (for the audio version) – 00:41


dmitrykabanov: I would like to start with the European Games project. It had many components, several teams took part in the preparation, and delivering augmented reality to a multi-thousand audience right at the stadium during the event is a rather serious task. As far as your participation goes, was it software first and foremost?

kapc3d: Yes, we did the software part and provided support during the show. We had to track, monitor and launch everything in real time, and also work with the television crew. If we consider this project as a whole, we can talk about the opening and closing ceremonies of the European Games in Minsk, as well as the opening ceremony of the WorldSkills championship in Kazan. It was the same scheme of work, but different events, with a gap of two months between them. We prepared the project together with the guys from the company Sechenov.com.

We met them by chance at the Science Fest that took place in the fall of 2018. Our undergraduates were showing their VR course project. The guys came up and asked what we were doing in our laboratory. It went something like this:

– So you work with VR, but can you handle augmented reality?

– Well, sort of, yes.

– There is this task, with these initial constraints. Can you do it?

We scratched our heads a little – nothing seemed unrealistic about it:

– Let's study everything in advance first, and then we'll find a solution.

Dmitry: Do they only deal with media support?

Andrey: They do the full stack. In terms of management and organization, they fully handle directing, staging, set selection, logistics and other technical support. But they wanted to do something special for the European Games. Special effects like mixed reality have been done for television for a long time, but they are not the most budget-friendly in terms of technical implementation. So the guys were looking for alternatives.

Dmitry: Let's discuss the problem in more detail. What did it look like?

Andrey: There is an event that lasts an hour and a half. We need to make sure that the audience watching the live broadcast and the people sitting in the stadium see the augmented reality effects fully synchronized with the live show, both in time and in position on the site.

There were a number of technical limitations. Time synchronization over the Internet was out of the question: there were concerns about excessive network load with full stands, and heads of state were expected to attend the event, which meant that mobile networks could be jammed.

Andrey Karsakov, photo from ITMO University materials
This project had two key components: the personal experience people get through their mobile devices, and what goes out to the television broadcast and to the information screens at the stadium itself.

If a person happens to be watching an augmented reality episode through their mobile device and glances at the big screen at the same time, they should see the same picture.

We needed two essentially different systems to be completely synchronized in time. The peculiarity of such shows is that they are complex events involving a large number of technical services, and all operations are performed according to time codes. A time code is a specific point in time at which something starts: lights, sound, people coming on stage, the stage petals opening, and so on. We had to adapt to this system so that everything would start at the right moment. Another feature was that the augmented reality scenes and episodes were tied into the script of the show.
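To make the idea concrete, here is a minimal sketch of how a show timeline driven by time codes can be represented – the cue names and timings are invented for illustration, not taken from the actual production:

```python
from dataclasses import dataclass

@dataclass
class Cue:
    timecode: float   # seconds from the start of the show
    action: str       # what should be triggered at this moment

# Hypothetical cue list; a real show has hundreds of entries.
CUES = [
    Cue(0.0, "house_lights_down"),
    Cue(41.0, "ar_episode_01"),
    Cue(300.0, "pyro_sequence"),
]

def due_cues(t_prev: float, t_now: float) -> list[Cue]:
    """Cues whose timecode falls inside the interval (t_prev, t_now]."""
    return [c for c in CUES if t_prev < c.timecode <= t_now]
```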

Dmitry: So did you decide against delivering synchronization over the network because of the high risk of force majeure, or did you estimate the load characteristics up front and realize that the load on the whole system would be quite high?

Andrey: Building a synchronization service for an audience of that size is actually not very difficult. The requests will not all arrive at the same moment anyway. Yes, the load is high, but it is not an emergency. The question is whether it is worth spending resources and time on it when the network may suddenly go down – and we were not sure that would not happen. In the end everything worked, intermittently because of the load, but it worked, and we synchronized against the time codes in a different way. That was one of the global challenges.


UX implementation challenges

Timecode (for the audio version) – 10:42


Andrey: We also had to take into account that a stadium is not a classic concert venue, and synchronize the systems spatially for mobile devices. Some time ago there was a viral augmented reality story at Eminem concerts, and then there was a similar case with Loboda.

Photo Robert Bye (Unsplash.com)
But in those cases the experience is always right in front of you: the whole crowd faces the stage, and synchronization is fairly simple. With a stadium, you need to understand which side of the bowl you are on and your relative position, so that the virtual environment lines up with where you actually are in the real space. That was quite a challenge. We tried to solve it in various ways, and ended up with something close to what was done for Loboda, though not in every respect.

We let the user decide where they are. We made a layout of the stadium where people picked their sector, row and seat – all in four taps. Next, we had to determine the direction to the stage. To do that, we showed a silhouette of what the stage should look like from the user's angle. They lined it up, tapped, and that was it – the scene snapped into place. We tried to simplify this process as much as possible. After all, 90% of the viewers who wanted to watch the show were not people with augmented reality experience.
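Roughly speaking, the seat choice gives the viewer's position in stadium coordinates, from which the expected bearing and distance to the stage can be derived before the user fine-aligns the silhouette by hand. A minimal sketch of that idea, with a made-up stadium model rather than the real venue data:

```python
import math

STAGE = (0.0, 0.0)  # hypothetical: stage center at the origin, metres

def seat_position(sector: int, row: int, seat: int) -> tuple[float, float]:
    """Toy seat layout: sectors spaced around the bowl, rows climbing outwards."""
    angle = math.radians(sector * 30.0 + seat * 0.5)
    radius = 60.0 + row * 0.8
    return (radius * math.cos(angle), radius * math.sin(angle))

def view_of_stage(sector: int, row: int, seat: int) -> tuple[float, float]:
    """Bearing (degrees) and distance (metres) from the chosen seat to the stage."""
    x, y = seat_position(sector, row, seat)
    bearing = math.degrees(math.atan2(STAGE[1] - y, STAGE[0] - x))
    distance = math.hypot(STAGE[0] - x, STAGE[1] - y)
    return bearing, distance

# The app renders a stage silhouette for this bearing/distance and lets the
# user nudge it into place before anchoring the virtual scene.
print(view_of_stage(sector=3, row=12, seat=25))
```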

Dmitry: Was there a separate application for this project?

Andrey: Yes, an application for iOS and Android that we promoted ahead of the Games. There was a separate promotional campaign for it, with detailed instructions published in advance on how to download it and so on.

Dmitry: One has to keep in mind that a person has nowhere to physically try out and learn such an application beforehand. So the task of “training” the audience was complicated.

Andrey: Yes, yes. We took a lot of knocks with the UX, because the user wants the whole experience in three taps: download, install, launch, and it works. Many people are too lazy to go through complex tutorials, read instructions and so on. And however hard we tried to explain everything to the user in the tutorial – a window will open here, camera access is needed here or it won't work, and so on – it doesn't help much. No matter how many explanations you write, how much detail you chew through, whatever GIFs you insert, people don't read it.

In Minsk we collected a large pool of feedback on this part and changed a lot for the application in Kazan. We baked in not only the soundtracks and time codes that correspond to a specific augmented reality episode, but all the soundtracks and time codes in full. So the application could hear what was happening at the moment of launch and – if the person had not joined at the right moment – tell them: “Comrade, I'm sorry, your AR episode will start in 15 minutes.”
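As a rough sketch of that logic (episode names and timings are made up for illustration): once sound recognition reports where we are in the show, the app can compare that position with the schedule of AR episodes and either start playback at the right offset or report how long to wait:

```python
# Hypothetical schedule of AR episodes: (name, start, end) in show seconds.
AR_EPISODES = [
    ("episode_1", 600.0, 900.0),
    ("episode_2", 2400.0, 2700.0),
]

def on_show_time_recognized(show_time: float) -> str:
    """Called when sound recognition places us at `show_time` within the show."""
    for name, start, end in AR_EPISODES:
        if start <= show_time <= end:
            # Inside an episode: seek the AR timeline to the matching offset.
            return f"play {name} from offset {show_time - start:.1f} s"
        if show_time < start:
            wait_min = (start - show_time) / 60.0
            return f"Sorry, your AR episode will start in about {wait_min:.0f} minutes."
    return "The AR episodes for this show are over."
```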


A little bit about the architecture and approach to synchronization

Timecode (for the audio version) – 16:37


Dmitry: So in the end you decided to do synchronization by sound?

Andrey: Yes, it happened by chance. While sorting through the options we came across the company Cifrasoft from Izhevsk. They make an SDK that is not particularly fancy, but it works rock-solid and lets you synchronize timing by sound. The system was positioned for working with TV, when the application can display something or serve interactive content triggered by the sound of, say, a commercial.

Dmitry: But it's one thing when you are sitting in your living room, and quite another in a stadium with thousands of people. How did you handle the quality of the sound capture and its subsequent recognition?

Andrey: There were many fears and doubts, but in most cases everything was recognized well. Their clever algorithms build signatures from the soundtrack, and the result weighs less than the original audio file. When the microphone listens to the ambient sound, it tries to find those features and recognize the track by them. In good conditions the synchronization accuracy is 0.1–0.2 seconds, which was more than enough. In poor conditions the discrepancy reached 0.5 seconds.
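The SDK itself is a black box here, but the general principle behind this kind of synchronization can be sketched: precompute a compact per-frame feature sequence (a signature) for the reference soundtrack, compute the same features for the microphone buffer, and find the offset where they agree best. The following is only an illustration of the principle with a deliberately crude feature, not Cifrasoft's algorithm:

```python
import numpy as np

FRAME = 2048  # samples per analysis frame

def signature(audio: np.ndarray) -> np.ndarray:
    """Crude per-frame feature: index of the strongest spectral bin."""
    n_frames = len(audio) // FRAME
    frames = audio[: n_frames * FRAME].reshape(n_frames, FRAME)
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    return spectra.argmax(axis=1)

def best_offset(track_sig: np.ndarray, mic_sig: np.ndarray) -> int:
    """Frame offset in the reference track where the microphone snippet matches best."""
    scores = [
        np.count_nonzero(track_sig[i : i + len(mic_sig)] == mic_sig)
        for i in range(len(track_sig) - len(mic_sig) + 1)
    ]
    return int(np.argmax(scores))

# show_time_seconds = best_offset(track_sig, mic_sig) * FRAME / sample_rate
```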

A lot depends on the device. We worked with a large fleet of devices. For iPhones it is only about 10 models, and they worked fine in terms of quality and other characteristics. But with Android the zoo of devices is something else. Sound synchronization did not work everywhere: there were cases when on certain devices particular tracks simply could not be picked up because of hardware quirks – somewhere the low frequencies drop out, somewhere the highs start to wheeze. But if the device had a normalizer on the microphone, synchronization always worked.

Dmitry: Please tell us about the architecture – what was used in the project?

Andrey: We built the application on Unity – the easiest option in terms of cross-platform support and graphics – and used AR Foundation. We said right away that we did not want to overcomplicate the system, so we limited ourselves to the fleet of devices that support ARKit and ARCore in order to have time to test everything. We wrote a plugin for the Cifrasoft SDK; it is on our GitHub. And we made a content management system so that the scripted scenes run on a timeline.

We tinkered a bit with the particle systems, because a user can join at any point of a particular episode and needs to see everything from the moment they synchronized. So we worked on a system that plays the scripted scenes precisely in time, so that the three-dimensional experience can be scrubbed back and forth like a movie. With classic animations this works out of the box, but with particle systems we had to tinker. At some point they start spawning, and if you seek to a point after the spawn time, the particles that should already exist have not been born yet. But this problem, in fact, is easily solved.
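A minimal, engine-agnostic sketch of the fix (an illustration of the idea, not the actual Unity code): when the viewer joins at time t, keyframed animations can simply be evaluated at t, while emitters are reset and stepped forward to t so that the particles that should already exist are there:

```python
DT = 1.0 / 60.0  # fixed simulation step, seconds

class Emitter:
    """Toy particle emitter: spawns one particle per step while active."""
    def __init__(self, start_time: float, lifetime: float = 2.0):
        self.start_time = start_time
        self.lifetime = lifetime
        self.particles: list[float] = []  # ages of live particles

    def step(self, now: float) -> None:
        self.particles = [a + DT for a in self.particles if a + DT < self.lifetime]
        if now >= self.start_time:
            self.particles.append(0.0)

def seek(emitters: list[Emitter], t: float) -> None:
    """Fast-forward the effect to time t, as if it had been running from the start."""
    for e in emitters:
        e.particles.clear()
    now = 0.0
    while now < t:
        for e in emitters:
            e.step(now)
        now += DT

fire = Emitter(start_time=5.0)
seek([fire], 7.5)            # viewer joined 7.5 s into the episode
print(len(fire.particles))   # particles spawned between 5.0 s and 7.5 s are present
```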

For the mobile part the architecture is quite simple. For the broadcast everything is more complicated. We had hardware constraints. The customer set a condition: “Here is the hardware park we have; roughly speaking, everything needs to run on it.” So we focused right away on working with relatively low-cost video capture cards. But budget does not mean they are bad.

There were restrictions on the hardware, on the video capture cards and on the working conditions – how we were supposed to deliver the picture. The capture cards were Blackmagic Design, and they worked in the internal keying scheme: a video frame comes in from the camera, the card's own processing chip receives another frame that has to be overlaid on top of the incoming one, and the card mixes them. Beyond that we touch nothing and do not affect the frame from the camera; the card spits the result out through its video output to the control desk. This is a good method for overlaying captions and similar things, but it is not well suited for mixed reality effects, because it imposes a lot of restrictions on the render pipeline.

Dmitry: In terms of real-time computation, object anchoring, or something else?

Andrey: In terms of quality and achieving the desired effects. The problem is that we do not know what we are overlaying the image onto – we simply supply color and transparency information on top of the original stream. Effects like refraction, correct transparency or additional shadows cannot be achieved with such a scheme; for that you need to render everything together. For example, there is no way to create the effect of heat haze above a fire or hot asphalt, and the same goes for conveying transparency with a refractive index. We made the content with these restrictions in mind from the start and tried to use effects that suited them.
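The restriction is easy to see in the compositing math: the renderer only hands the card a fill color and a key (alpha) per pixel, and the card performs a simple “over” blend, so any effect that needs to read or bend the camera pixels themselves – refraction, heat haze, true refractive transparency – simply cannot be expressed. A small illustration of that blend (just the principle, not the Blackmagic processing itself):

```python
import numpy as np

def internal_key(camera: np.ndarray, fill: np.ndarray, key: np.ndarray) -> np.ndarray:
    """'Over' blend as done on the capture card: result = fill*key + camera*(1-key).

    camera, fill: HxWx3 float arrays in [0, 1]; key: HxW alpha in [0, 1].
    The renderer never sees `camera`, so it can only place colored, partially
    transparent pixels on top -- it cannot distort or refract what is behind them.
    """
    a = key[..., None]
    return fill * a + camera * (1.0 - a)
```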


Dmitry: Did you make your own content for the first project, the European Games?

Andrey: No, the main stage of content development was handled by the guys from Sechenov.com. Their graphic artists drew the basic content with animations and so on, and we integrated everything into the engine, added extra effects and adapted it all so that it worked correctly.

If we talk about the pipeline, for television we built everything on Unreal Engine 4. It just so happened that at that very moment Epic started pushing its mixed reality tools. It turned out that things are not so simple: all the tools are raw even now, and we had to finish a lot by hand. In Minsk we worked on a custom build of the engine, that is, we rewrote some things inside it so that we could, for example, draw shadows on top of real objects. The engine version that was current at the time had no features that allowed this with the standard tools, so our guys made their own custom build to provide everything that was vital.


Other nuances and adaptation to WorldSkills in Kazan

Timecode (for the audio version) – 31:37


Dmitry: But all this in a fairly short time?

Andrey: The tight deadlines were for the Kazan project; for Minsk they were normal – about six months for development, but bearing in mind that six people were involved. At the same time we were making the mobile part and developing tools for the TV production. It was not just outputting a picture: there was, for example, a tracking system with optics, and we had to build our own toolkit for it.

Dmitry: How did the adaptation from one project to the other go? In a month and a half you had to build on your existing work and move the project, with new content, to a new venue?

Andrey: Yes, it was a month and a half. We had planned a two-week vacation for the whole team after the Minsk project, but right after the closing ceremony the guys from Sechenov.com came up and said: “Well, now let's do Kazan.” We still managed to rest a little, but switched to this project fairly quickly. We finished up a few things on the technical side. Most of the time went into content, because for WorldSkills we made it entirely ourselves, only coordinating with the director's team – there was just a script from their side. But that made it easier: no extra iterations were needed. When you make the content yourself, you immediately see how it works in the engine and can edit and coordinate things quickly.

On the mobile side we took into account all the subtleties we had run into in Minsk. We made a new application design, reworked the architecture a bit and added tutorials, trying to keep them as short and clear as possible. We reduced the number of user steps from launching the application to viewing the content. A month and a half was enough to make a decent project. We went out to the venue a week and a half in advance. It was easier to work there, because all control over the project was in the organizers' hands and there was no need to coordinate with other committees. Working in Kazan was simpler and easier, and it was quite normal that there was less time.

Dmitry: But you decided to keep the synchronization approach as it was, by sound?

Andrey: Yes, we kept the sound-based approach. It worked well – as they say, if it works, don't touch it. We just took the nuances of soundtrack quality into account. For the intro we made a training episode so that people could try things out before the show started. Surprisingly, when a track is playing in the stadium and there is a storm of live applause, the system still synchronizes on that track quite well; but if recorded applause is mixed into the track at that moment, the track stops being picked up. We took these nuances into account, and the sound synchronization worked quite well.

P.S. In the second part of the episode, we talk about scientific data visualization, process modeling in other projects, game development and the master's program “Game Development Technology”. We will publish the continuation in the next article. You can listen to us and show your support here:


P.P.S. Meanwhile, on the English version of Habr: a closer look at ITMO University.

