Rhasspy is an open source, fully offline speech toolkit. It recognizes Russian and leaks nothing to the cloud

Photo from a comparison of microphone arrays for DIY devices such as a homemade smart speaker

Systems like Amazon Echo send your sensitive conversations (even those recorded by accident) to the cloud for storage. In some cases, the recordings are listened to by human reviewers. This is not just a loss of privacy. It’s like voluntarily letting in a “Comrade Major” who stands nearby 24 hours a day, listening and carefully taking notes while pretending to be a helpful assistant.

Instead of buying a commercial system from a corporation such as Google, Amazon or Yandex, you can build a similar open source system on a Raspberry Pi 2 or 3 (Model B / B+), a personal computer or a laptop.

Rhasspy is a secure voice assistant that works offline. It transmits nothing to remote services, yet it handles speech recognition and voice commands well.

Rhasspy is easy to integrate into any software or hardware system where you want to add voice control. The author explains that the tool was originally written for the Home Assistant project, but it is now compatible with most other home automation systems (Hass.io, Node-RED, OpenHAB, Jeedom).

Rhasspy is designed to work with external services over MQTT, HTTP or WebSockets. It is optimized specifically for voice commands with a clearly defined grammatical structure (turn the light on / off, make the music louder / quieter, etc.).
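As an illustration of the HTTP route, here is a minimal Python sketch that sends a sentence of text to a running Rhasspy instance and reads back the recognized command as JSON. The base URL `http://localhost:12101` and the `/api/text-to-intent` endpoint are assumptions about a default installation; check them against the HTTP API reference for your Rhasspy version. The helper names `build_request` and `text_to_intent` are invented for this example.

```python
import json
import urllib.request

# Assumption: Rhasspy's web server listens here; adjust host and port to your setup.
RHASSPY_URL = "http://localhost:12101"


def build_request(text, base_url=RHASSPY_URL):
    """Build (but do not send) a POST request carrying the sentence as plain text."""
    return urllib.request.Request(
        url=f"{base_url}/api/text-to-intent",  # assumed endpoint, see lead-in
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def text_to_intent(text, base_url=RHASSPY_URL, timeout=5.0):
    """Send the sentence to Rhasspy and return the parsed JSON reply."""
    request = build_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `text_to_intent("turn on the light")` only works while a Rhasspy instance is running at the assumed address; `build_request` can be inspected without any network access.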

14 languages are supported, including Russian.

How it works is described in the documentation. Rhasspy recognizes voice commands that are written in a small template language designed for this purpose. Each command is classified by intent and may contain slots or tags, such as the color of the lighting or the name of the particular fixture the command is addressed to.

To get started, list the intents (in square brackets) and the possible ways to phrase them. A template looks something like this:

[LightState]
states = (on | off)
turn (<states>){state} [the] light
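To make the template concrete, the following Python sketch enumerates the sentences it describes, assuming the `states` rule (on | off) is what fills the parenthesized group, that `{state}` tags the chosen word, and that `[the]` marks an optional word. This is a hand-built toy model of one template, not Rhasspy's own parser; `TEMPLATE` and `expand` are hypothetical names.

```python
from itertools import product

# Hand-built model of: turn (on | off){state} [the] light
# Each entry is (tag, alternatives); an empty string means the word may be omitted.
TEMPLATE = [
    (None, ["turn"]),
    ("state", ["on", "off"]),  # (on | off) tagged as {state}
    (None, ["the", ""]),       # [the] is optional
    (None, ["light"]),
]


def expand(template):
    """Return every (sentence, slots) pair the template can produce."""
    results = []
    for choice in product(*(alternatives for _, alternatives in template)):
        text = " ".join(word for word in choice if word)
        slots = {tag: word for (tag, _), word in zip(template, choice) if tag}
        results.append((text, slots))
    return results


expanded = expand(TEMPLATE)
for text, slots in expanded:
    print(text, slots)
```

The template covers four sentences in total: two states, each with or without "the". Restricting recognition to such a small closed set of sentences is what makes a lightweight offline engine practical.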

From this template, Rhasspy generates a JSON event that can be consumed by a home automation system, an external application or a hardware device (for example via Node-RED or WebSockets):

{
    "text": "turn on the light",
    "intent": {
        "name": "LightState"
    },
    "slots": {
        "state": "on"
    }
}
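On the receiving side, an application only needs to read the intent name and the slots. The sketch below shows one possible way to do that in Python, using the event above as input. The handler table and the function names (`HANDLERS`, `handle_light_state`, `dispatch`) are made up for illustration, and the handler just returns a string where a real system would switch a relay or call its automation API.

```python
import json


def handle_light_state(slots):
    """Hypothetical handler: a real one would drive the light here."""
    state = slots.get("state")
    if state not in ("on", "off"):
        return "ignored: unknown state"
    return f"light switched {state}"


# Map intent names to handler functions.
HANDLERS = {"LightState": handle_light_state}


def dispatch(event_json):
    """Parse an event and route it to the handler for its intent."""
    event = json.loads(event_json)
    name = event.get("intent", {}).get("name")
    handler = HANDLERS.get(name)
    if handler is None:
        return f"no handler for intent {name!r}"
    return handler(event.get("slots", {}))


EXAMPLE_EVENT = """
{
    "text": "turn on the light",
    "intent": {"name": "LightState"},
    "slots": {"state": "on"}
}
"""

result = dispatch(EXAMPLE_EVENT)
print(result)
```

Keeping the handlers in a dictionary means a new intent from the template file needs only one new function and one new table entry.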

Speech recognition itself is performed by Pocketsphinx, a lightweight open source engine that supports Russian. It is well suited to mobile devices and single-board computers like the Raspberry Pi.

Sound processing happens offline on your device. The sound itself can come from a Raspberry Pi microphone array (such as a ReSpeaker 4 Mic Array or ReSpeaker 2 Mics pHAT) or from an audio stream over a network.

Rhasspy is simply a very convenient tool for linking a speech recognition engine to a home automation system or any other system that needs voice control. In principle, it can be used anywhere: in mobile applications, for example, or in a home robot such as a vacuum cleaner or a bartender.

It is nice when the robot performs all the same actions as before, but now by voice command.

The author of Rhasspy also wrote voice2json, a console program for roughly the same task: easily converting human speech into a list of computer commands (or vice versa).

The future seems to belong to voice interfaces. If so, it is very important that audio streams are processed locally and do not require Internet access.