“What does it cost us to build the Capsule?” – how we created VK Capsule Neo

Many people instinctively assume that smart speakers are simple to develop: “Really, what is in there? A speaker, a microphone, LEDs and Wi-Fi.” We at VK wanted to create an innovative product, and that took us on an interesting journey from the idea to the launch of our baby for everyone, VK Capsule Neo. What looks like a fairly typical smart speaker is packed with technological solutions that other companies will try to replicate in their own products.

My name is Boris Kaganovich, and I am the director of development and production of smart devices at VK. In this article, I will talk about how the idea for the speaker came about and how we went from a product idea to its implementation.

From idea to project

In 2020, the first smart speaker with Marusya, the Capsule, was released. The pandemic was already gaining momentum, and the media were competing to predict a semiconductor crisis. Experience from previous launches and painstaking risk management helped the team reach the start of sales without fatal mistakes or upheavals.

The risk management lessons also came in handy while we worked on the next product, Capsule Mini. It launched in 2021, at the peak of “chipageddon” and COVID restrictions. Capsule Mini has several versions built on different hardware platforms. That is how we prepared for possible interruptions in chip supply, and, as it turned out, not in vain.

The first two products showed Marusya's playful and lively character. You could stroke the Capsule on the head and it purred, and in the Mini, Marusya got expressive eyes, which we called eyemoji. In thinking about how the product line should evolve, the team focused on what evoked emotions in our users, from delight to tenderness. This is how the image of an extremely affordable, compact smart speaker with a cozy design and eyes that double as a clock was born.

We in the VK R&D team knew that smart speakers in China had reached the lowest-priced segment of the market and were actively gaining a foothold there. It took some time to believe that the team was ready to create the most affordable speaker with a clock. We spent some time on market research and on online meetings with chip makers and with companies that had already traveled this path in China. The team treated the information it received critically. For example, we planned 12-15 months from the start of the project to the launch of mass production, even though the experience of Chinese companies pointed to 24-36 months of development.

To make the product balanced and in demand among the target audience, the speaker's product team tested quite a few hypotheses. Some of the product requirements changed during development. Here are just a few examples:

Location of controls, microphones and ambient light sensor. Several options were considered, taking into account ergonomics, design and the optimal placement of the microphones and the sensor. The solution that now looks obvious did not come right away.

Possibility to hang the speaker on the wall. People's desk space may already be taken, or they may already have other smart speakers. However:

  1. You would have to drive a nail into the wall or put in a screw, and many buyers will not do this.

  2. A wall-mounted speaker needs a power outlet nearby, which means a long cable, and that will not suit everyone.

  3. Marusya hears better when the speaker is on a table or other horizontal surface.

In the end, VK Capsule Neo did not get a wall mount.

Industrial customization. Among the options were non-standard colors and images of characters applied to the speaker body right at the factory. Instead, the most elegant and lightweight solution emerged: speakers in standard colors come with sets of different stickers featuring mascots from the social network VKontakte.


Indeed, there were a lot of them. We decided to build the speaker on an RTOS (real-time operating system), the kind of operating system found in compact gadgets like fitness bracelets, headphones and smart watches. But we had not worked with one before, so we had to learn to develop for this OS on the go.

Platform selection

When we were choosing the chipset for Neo, there were no balanced solutions made specifically for smart speakers on the market. The choice had to rely not only on compliance with technical requirements but, in some cases, on intuition. ARM or RISC-V, the minimum required amount of RAM and flash memory, the type and number of interfaces for connecting speakers, microphones, a light sensor and a display: with borders closed, logistics slow and project deadlines tight, the team made many of these choices without prior testing.


Oddly enough, the technological advantage of our new platform, and of the RTOS itself, lies in its minimalism. The developers faced the challenge of fitting smart speaker functionality into sixteen times less memory than our other devices have, and of making resource-intensive speech processing algorithms run efficiently on low-performance processor cores.

The market for RTOS platforms turned out to be narrow: most of the chipsets available to us were intended for the Chinese domestic market, and the documentation was in Chinese, very sparse, and written for the specific requirements of local customers. We had to find a balance between the completeness of the documentation, the openness of the platform, the availability of an SDK and the manufacturer's willingness to communicate, even through an intermediary. At some point, we worked with several manufacturers at once, allowing for the possibility that the backup would replace the primary one. And so it happened.

The documentation is in Chinese, and the SDK is securely protected from developers. Well, OK. We translated the documents and found a way to work with the SDK, which, for reasons unknown to us, moved several times from one version of the OS to another, and then back. That is how the team gained its RTOS expertise.


We had several design options to choose from. We could have taken the cheapest one that was easiest to manufacture, but, of course, we chose the best and most beautiful 🙂 The complexity of assembly and the cost of the case were optimized during the project. Choosing the material and the painting technology made us sweat in particular, because the LEDs had to shine clearly and brightly through a body of any color. For each speaker color, dozens of experiments and tests had to be carried out.

Sound quality

All Capsules sound excellent, and the team's task is to improve the sound with each new product. The sound of any speaker system is strongly affected by the enclosure: its design, its volume, the composition and thickness of the material, and how tightly the parts fit together. It was interesting to solve several problems at once: for example, to simplify the design (and make it cheaper) by getting rid of joints that could increase vibration or cause rattling.

The design of the acoustic chamber, the box in which the driver is mounted, also demanded a lot of attention. The choice was between an open design with a bass reflex port, a chamber with a passive radiator, and the so-called closed design, which is the simplest to manufacture but in theory promises a more modest frequency response. It was the closed design that won the tests, showing a frequency response comparable to that of Capsule Mini and a lower nonlinear distortion factor. The result was so surprising that the tests had to be repeated several times to make an informed choice.
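For readers unfamiliar with the distortion metric: assuming the nonlinear distortion figure mentioned above is the usual total harmonic distortion (THD), it is the combined RMS amplitude of the harmonics divided by the amplitude of the fundamental. A minimal Python sketch follows; the function name and the amplitudes are hypothetical and are not measurements from our tests.

```python
import math


def thd_percent(fundamental, harmonics):
    """Total harmonic distortion in percent.

    fundamental: RMS amplitude of the test tone.
    harmonics:   RMS amplitudes of the 2nd, 3rd, ... harmonics.
    """
    if fundamental <= 0:
        raise ValueError("fundamental amplitude must be positive")
    harmonic_rms = math.sqrt(sum(h * h for h in harmonics))
    return 100.0 * harmonic_rms / fundamental


# Hypothetical amplitudes in arbitrary units, for illustration only.
example = thd_percent(1.0, [0.03, 0.04])
print(f"THD: {example:.2f} %")
```

The lower this number at a given volume, the less the enclosure and driver color the signal with overtones that were not in the original recording.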

As a result, the sound of the VK Capsule Neo turned out to be balanced, clear and deep.

Speech recognition algorithms

Our previous speakers had four microphones, and VK Capsule Neo has only two. New microphone array configuration = new natural speech processing algorithms. In record time, the VK machine learning team developed its own KWS (keyword spotting) and VQE (voice quality enhancement) algorithms, which recognize the keyword and natural speech, clean the speech of noise and determine the direction of the person talking to the speaker. In terms of quality, these algorithms are superior to their counterparts and work no worse than in the older speakers of the line.
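To illustrate how a direction can be found with just two microphones, here is a minimal sketch of one textbook approach: estimate the time difference of arrival by cross-correlation and convert it to an angle. This is not the VK algorithm; the function names, the 48 kHz sample rate and the 10 cm microphone spacing are assumptions made for the example.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C


def best_lag(left, right, max_lag):
    """Lag in samples at which `right` best matches `left`.

    A positive lag means the sound reached the right microphone later.
    """
    n = len(left)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle_deg(left, right, sample_rate, mic_spacing):
    """Angle from the broadside direction (perpendicular to the mic axis).

    Positive angles point toward the left microphone.
    """
    # Physically possible delays are bounded by the microphone spacing.
    max_lag = math.ceil(mic_spacing / SPEED_OF_SOUND * sample_rate)
    delay = best_lag(left, right, max_lag) / sample_rate
    ratio = max(-1.0, min(1.0, delay * SPEED_OF_SOUND / mic_spacing))
    return math.degrees(math.asin(ratio))


# Synthetic check: white noise that reaches the right microphone 7 samples late.
rng = random.Random(0)
signal = [rng.uniform(-1.0, 1.0) for _ in range(512)]
delayed = [0.0] * 7 + signal[:-7]
angle = arrival_angle_deg(signal, delayed, sample_rate=48000, mic_spacing=0.1)
print(f"estimated angle: {angle:.1f} degrees")
```

With these assumed numbers, a 7-sample delay corresponds to an angle close to 30 degrees. A two-microphone array cannot tell front from back with this method alone, which is one reason a change in the array configuration calls for new algorithms.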

Big achievements for a small team

Daily meetings of the development team and constant communication with component suppliers kept our project at the top of the partners’ priorities. For each critical component, we always selected a backup option in case of force majeure.

From day one, the project was planned so that the best scenario was to start mass production ahead of schedule, and the acceptable scenario was to start exactly on schedule. Careful planning of sample logistics also helped. Parcels with speaker samples traveled around the world at cosmic speed, keeping the project on schedule. The project significantly leveled up the expertise of the VK logistics team.

Looking back at the project's success, it is worth noting that the R&D teams of VK and the partner factory effectively became one. Common tasks, a common tracker and common goals overcame the distance of 7000 km, the language barrier and the difference in mentality. The project managers on both sides did a great job.

Technology Leadership

We managed to find our niche in the smart speaker market: VK Capsule Neo became the cheapest smart speaker with a clock. The development team pulled off the jeweler's feat of fitting everything into just 16 MB of memory, and the product team added funny stickers to Marusya’s cute eyes and kept Neo’s great sound quality.

The new product aroused great interest among competitors and leaders of the electronics market. Their representatives came to the factory and asked for samples of speakers and boards. Of course, we condemn industrial espionage, but we are ready to share our experience if it is useful for the market and does not violate the NDA.

We have mastered the new platform, perhaps as the first team outside China. We learned how to optimize code, shrinking its size many times over and reducing its demands on processor performance. We learned how to work with new types of memory and amplifiers, and with a new microphone configuration. We have developed high-quality, professional built-in speech processing algorithms that have replaced third-party solutions in all our products. And we showed the market the way to truly affordable smart speakers.

