A conversation with Dmitry Odintsov from TrueConf about intelligent noise reduction in video conferencing, deepfakes, and holographic conferences

Do you use artificial intelligence or a neural network in your products? Is this neural network an open source solution? Could you tell us more about it?

We try not to use open-source components in our solutions, and when we do, we change almost everything. Neural networks make no sense without datasets, and our datasets are exactly what we are proud of. For example, we have been training our noise-reduction AI for more than two years.

Could you tell us in more detail about how the datasets themselves were created, what kind of model you are using?

Unfortunately, almost all of this is a trade secret. However, I can tell you how we integrated intelligent noise cancellation into our apps. We wanted to remove all kinds of noise from video conferences, such as industrial sounds: drills, machine tools, and so on. After serious research, we realized that we had been training our neural network the wrong way, so we changed the direction of training: we started teaching it not noise, but speech. We recorded people and built a dataset of human voices, including colloquial vocabulary, jargon, and hard-to-pronounce words; even external users were involved. As a result, we ended up with a neural network that removes everything it does not consider to be a voice, cutting extraneous noise directly out of the stream in real time without consuming many resources.
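TrueConf's actual model and dataset are a trade secret, so as a purely illustrative sketch of the "remove everything that is not voice" idea, here is a naive energy-based noise gate in Python. The real system uses a trained neural network rather than an energy threshold; the function name, frame size, and threshold below are all assumptions for the example.

```python
import numpy as np

def suppress_non_speech(signal, rate=16000, frame_ms=20, threshold_db=-35.0):
    """Crude noise gate: zero out frames whose energy falls below a
    speech-likeness threshold. A real system would classify frames with
    a trained speech model, as described in the interview, instead of
    using plain energy."""
    frame_len = int(rate * frame_ms / 1000)
    out = np.zeros_like(signal, dtype=float)
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len].astype(float)
        rms = np.sqrt(np.mean(frame ** 2)) + 1e-12  # avoid log(0)
        level_db = 20 * np.log10(rms)
        if level_db > threshold_db:  # frame looks like speech: keep it
            out[start:start + frame_len] = frame
    return out
```

A gate like this passes a normal speaking level through untouched while muting low-level background hiss; the neural approach described above generalizes this to loud non-speech sounds as well.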

That is exactly what happened at the conference where intelligent noise reduction was demonstrated: a tractor was started next to the workplace, and once the function was turned on, only the voice remained. Was that staged?

No, it was a live broadcast, and you can test this noise reduction at any time: just download the free version and try it yourself. The broadcast took place as part of the national “Priority” award.

I've talked to people using your solution, and they've complained about poor sound even through expensive, high-quality headphones. Hence the question: was that video perhaps showing some experimental version of your solution?

No, it was the standard version, the same one everyone gets; there was nothing special or experimental in that video. Sound really is the most important thing in video conferencing, and unfortunately good headphones make no difference if the output side has a nice monitor and a good sound system but the incoming signal is poor. It is worth asking what was at the input: on the other side there could be a bad Internet connection, a device with a bad microphone, or some other problem. Unfortunately, people simply don't notice such things. For example, they give a manager a very good camera, a great display, and excellent audio equipment, and no one considers that this great display will be showing video from a terrible camera built into an old laptop, whose built-in microphone is the most basic kind, transmitting over mobile Internet in a poor-reception area. The result is predictable: terrible audio and video sent over terrible channels, and everyone blames the software. Under such conditions no manufacturer can do anything. However, with a more or less decent microphone, a more or less decent camera, and a stable communication channel, the quality of a video conference will definitely improve. You also need to remember about restrictions: they can be set globally by the video conferencing system administrator to save network traffic, and they can also be regulated in the client applications on all OSes.

As for the channel: if the firewall cuts everything aggressively, that will degrade the quality of the video conference. But with a more or less stable channel, even a mobile one, at least the sound will be fine. Voice does not need a very wide channel: a stable 20 kbit/s is enough. If your channel degrades during a session and serious jitter appears, you may not notice it in the video, but you will always hear it in the audio.
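As a rough sanity check of the "stable 20 kbit/s is enough for voice" figure, the sketch below estimates the codec payload per packet and the resulting on-the-wire bitrate for a typical 20 ms voice frame. The 40-byte IP/UDP/RTP header overhead and one-packet-per-frame transport are common textbook assumptions, not TrueConf specifics.

```python
def voice_packet_budget(bitrate_bps=20_000, frame_ms=20, overhead_bytes=40):
    """Estimate codec bytes per packet and the wire bitrate for voice
    sent as one packet per frame (typical RTP-style transport).
    overhead_bytes approximates IPv4 + UDP + RTP headers."""
    payload_bytes = bitrate_bps / 8 * frame_ms / 1000   # codec bytes per frame
    packets_per_second = 1000 / frame_ms
    wire_bps = (payload_bytes + overhead_bytes) * 8 * packets_per_second
    return payload_bytes, wire_bps

payload, wire = voice_packet_budget()
# 20 kbit/s in 20 ms frames -> 50 bytes of audio per packet,
# and ~36 kbit/s on the wire once headers are counted.
```

Even with header overhead, voice fits in a few tens of kbit/s, which is why a stable narrow channel is sufficient for audio while jitter, not raw bandwidth, is what the ear notices.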

If problems occur within the same network, could it be due to a misconfigured server?

This has more to do with the infrastructure in which the video conferencing server is deployed. If that infrastructure is bad (poorly configured, with packet loss, errors, network loops, and so on), it will have problems not only with video conferencing but with other services as well. One caveat, though: video communication puts a high load on the network, and it very often happens that the customer is sure their network is excellent, but as soon as a serious load begins, problems surface.

At the beginning of 2024, I wrote a news piece about Holoconnects presenting Holobox, a solution at the intersection of holographic projection and artificial intelligence, at CES in the US. Your press service was the first to contact me and say that your company also has such a development. Hence my question: is this a coincidence, or is there already a market for such solutions? Tell us more about this.

TrueConf and Eyefeelit solution

Holoconnects solution

A coincidence. In Israel this thing has already been in use for a year. Many people badly want some kind of clever three-dimensional communication technology, like in Star Wars. We have been experimenting with stereo video communication for many years and showed it on stereo displays while they were still available. Now they are gone, and the technology has not advanced because there is simply nothing to display it on. The same applies to glasses-free multi-angle 3D monitors; that technology is essentially dead too. We can shoot video with a stereo camera, but we cannot display it on anything. There have been several attempts at displays using fans, water, steam, and so on, but none of them work in real time: in all those cases the video must be specially prepared, so it cannot be shown live.

However, people's craving for science-fiction-style 3D video conferencing is still strong. So our Israeli partners built a seemingly similar solution, mainly for exhibitions.

But this is simply a solution based on transparent displays; it is, of course, not 3D like in Star Wars. With a transparent background, the right backlight, and the right camera angle, the picture gives the illusion that a person is sitting inside the box. Using this directly for video communication is problematic, though, because the person has to be filmed against a white background with special lighting to achieve the full effect of presence. Still, the fact that we, without any collusion with Holoconnects, released similar solutions is a first sign of smart meeting rooms: video conferencing solutions that let geo-distributed companies bring people from different places closer together, strengthening the illusion of sharing one space and making communication more lively.

There is also the idea of holographic chairs, where instead of the chair back you would see your interlocutor, but it relies on fan-based displays, so I doubt it can work in real time, and the quality would be very low. The idea itself is not new: someone once built a chair with a holographic fan attached, but the design is very noisy. So the idea has not yet led to practical solutions.

Have there been any requests from Russia for these holographic boxes? Do you have solutions at a quarter of the full box size?

Yes, requests came in from Russia as soon as the news broke. At the same time, you need to understand that people (not only in Russia) want much more than the technology itself allows. Everyone wants: “Here I am sitting at my desk; let me be seen beautifully in that box.” Unfortunately, it doesn't work that way. However, we are already preparing a version for the Russian market and will present it soon. Compact versions of HoloLive also already exist.

Is it possible to achieve this effect using several cameras placed in a circle?

It all comes down to an old problem. When we were doing 3D video communication, glasses were needed to see the stereoscopy. As soon as you put on glasses, you look less presentable, and the people on the other side feel uncomfortable talking to you; in an office, making a person wear something is a problem. It's the same story with the holographic box: you start complicating the filming location, and it becomes uncomfortable. One person will be clearly visible to everyone, but how will everyone else be seen? Either everyone needs such a setup, or, if it is a meeting room, the proper display effect will not work there either. The solution itself should be symmetrical and easy to use, without crowding a pile of equipment around. Google and Logitech now have booth-based projects along similar lines, but those implement a point-to-point scheme, and we will see how long they last.

Now, thanks to AI, deepfakes have made great progress: people's faces and voices are being faked, especially in video calls. Have companies asked you to add deepfake recognition to your products?

This is the classic sword-and-shield problem: technologies for creating deepfakes and for recognizing them are locked in a constant race. As far as I understand, the creation side is still ahead.

Recognizing deepfakes is somewhat outside our area. As soon as the technology matures and an API for such systems appears, we will gladly integrate it all. It is a cool thing and would be interesting to do, but given the high demand for the technology, it would also require additional investment. Information security companies clearly specialize more in deepfake recognition, and we are not in the security business. We would be happy to buy a ready-made recognition solution, so we are waiting for commercial products we can integrate with.

Cyberattacks using deepfakes are more of a problem for public services than for internal corporate ones. The main difficulty is not transmitting the deepfake itself; that is not hard. The question is how to get inside in the first place if the company uses an internal corporate messenger: to send video you need a login, a password, authorization. That is where information security comes into play, and again, that is definitely not about us. There is also multi-factor authentication, which is harder to bypass. So in the corporate segment the problem of deepfakes is less widespread than the problem of initial penetration. There is a caveat, though, when the corporate segment relies on public services: there the problem is more acute, since you can connect to Zoom or Google Meet without authorization. And of course it is easy to send a link from a fake address and impersonate the company's director with a deepfake. But even that relies on phishing, social engineering, and the absence of multi-factor authentication; the role of the deepfake itself is small, although it provides extra profit for cybercriminals.

Let's move on to another technology that has created a stir lately: Apple Vision. Are you currently developing or testing anything for it, for example video conferencing in some form?

Of course, some work is being done. Just a few days ago we adapted our client for Apple Vision Pro; it can be installed on the headset from the App Store, and all the familiar functions are available. Video conferencing on Arrakis from Dune or on the Moon certainly looks fantastic, but the cost of the device means it will not be used for communication everywhere. We are waiting for technologies that can properly capture a person's face while they are wearing a 3D headset. After all, why do we love video conferencing? We see the face, the eyes, the emotions; there is a certain effect of presence. All these headsets help you immerse yourself, but to your interlocutors you are an unnatural avatar, which is sad.

Yes, there are technologies for building a face model; the sensors capture it well, but still not at the proper level. For example, Nvidia has a clever technology that re-renders your eyes as if you were looking at the camera even when you are not. But with any turn or wrong tilt of the head, the eyes end up overlapping the forehead or the nose. As for virtual worlds and transferring your avatar into them, that is great and works decently for some online conferences, but honestly it has only an indirect relation to video communication.

The problem with Apple Vision, Oculus, and other such headsets is that there is no feedback; they all work in one direction. There are augmented reality technologies with feedback, but that is not video conferencing either; it is supervision of specific tasks, just cheaper than sending a specialist somewhere far away. In that case the headset works like a very good video surveillance camera.

From our conversation I took away two things: although video conferencing solutions are now at an excellent level, 3D holograms are not yet available to us for communication, and a head-mounted device is not the best option for video conferencing. Well, let's wait until holographic fans reach a decent level of display and scanning cameras appear that can transmit images at a good level, or until some entirely new technology for capturing and transmitting holographic images is created.
