Part Two


In the previous article, I described the hardware implementation of my voice assistant based on the budget Orange Pi Zero 2W single-board computer with 4 GB of RAM. This article is dedicated to the software side of the device. If you are interested, read on.

Many of you probably have a general idea of how voice assistants work; the simplified operating logic of this device is as follows.

One of the key functions is converting speech into text data, in other words, transcription. Next, we need to process the resulting text and determine whether it contains a command by comparing it against a command dictionary; this vocabulary is sometimes called "skills". The last stage is executing the recognized command: each command in the dictionary is "linked" to a specific function, which runs when the command matches. That is a primitive description of how our voice assistant works. Of course, the device should not execute commands silently; it should interact with the user, ideally in a way that feels natural to them, which is why it also has a speech synthesis function.
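
To make this logic concrete, here is a minimal sketch of the pipeline. The loop is an illustration only: transcribe() is a placeholder, while recognize_command() and command_processing() anticipate the real functions shown later in this article.

while True:
    text = transcribe()              # speech -> text (Vosk, see below)
    key = recognize_command(text)    # fuzzy match against the command dictionary
    command_processing(key)          # run the function linked to the matched key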

Preparation

Before you continue, you must follow the steps in the previous article:

  • Connect the sound module;
  • Install the operating system;
  • Configure audio devices.

Python was chosen as the programming language for the device, so you need to install it as well; version 3.11 or newer is recommended. If the above conditions are met, let's continue in order.

Transcription

Since our device adheres to the concept of independence from external services, speech recognition must be performed entirely locally. There are many ML-based libraries for offline speech recognition these days, but my searching and testing led me to a great solution: Vosk from Alphacephei. Considering the limited system resources of our single-board computer, we can confidently say that this library fits the project perfectly.

Advantages of the library:

  • Supports 20+ languages and dialects – Russian, English, Indian English, German, French, Portuguese, Spanish, Chinese, Turkish, Vietnamese, Italian, Dutch, Valencian, Arabic, Greek, Persian, Filipino, Ukrainian, Kazakh, Swedish, Japanese, Esperanto, Hindi, Czech, Polish, Uzbek, Korean, Tajik, Gujarati. More will be added soon.
  • Works without network access even on mobile devices – Raspberry Pi, Android, iOS.
  • Installs using a simple pip3 install vosk command without additional steps.
  • Models for each language take up only about 50 MB, though much larger and more accurate models are also available.
  • Made for streaming audio processing, which allows for instant response to commands.
  • Supports several popular programming languages – Java, C#, JavaScript and others.
  • Allows you to quickly configure the recognition dictionary to improve recognition accuracy (see the sketch after this list).
  • Allows you to identify the speaker.
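
As a quick illustration of the dictionary-configuration point, here is a minimal sketch, assuming the small Russian model used later in this article: KaldiRecognizer accepts an optional JSON list of phrases that restricts recognition to that vocabulary, with "[unk]" standing for anything outside it. The phrases here are examples only.

import json
import vosk

model = vosk.Model("model_stt/vosk-model-small-ru-0.22")
samplerate = 44100

# Constrain recognition to a small phrase list; "[unk]" catches
# everything that is not in the list.
grammar = json.dumps(["включи свет", "выключи свет", "[unk]"], ensure_ascii=False)
rec = vosk.KaldiRecognizer(model, samplerate, grammar)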

So, to install the package we need to run the command:

pip3 install vosk

Below is an example of using streaming speech recognition from the system microphone:

import sounddevice as sd
import vosk
import json
import queue

device_m = 2                                                  # index of the microphone audio device
model = vosk.Model("model_stt/vosk-model-small-ru-0.22")      # the neural network model
samplerate = 44100                                            # microphone sampling rate
q = queue.Queue()                                             # queue holding the audio stream


def q_callback(indata, frames, time, status):
    q.put(bytes(indata))  # push raw audio chunks into the queue


def voice_listen():
    with sd.RawInputStream(callback=q_callback, channels=1, samplerate=samplerate, device=device_m, dtype="int16"):
        rec = vosk.KaldiRecognizer(model, samplerate)
        while True:
            data = q.get()
            if rec.AcceptWaveform(data):
                res = json.loads(rec.Result())["text"]
                if res:
                    print(f"Full phrase: {res}")
            else:
                res = json.loads(rec.PartialResult())["partial"]
                if res:
                    print(f"Stream: {res}")


if __name__ == "__main__":
    voice_listen()

The output of this example is:

Stream: один два
Stream: один два три это
Stream: один два три это тест
Stream: один два три это тест
Full phrase: один два три это тест

As you can see from the example, two methods are used to output text data: rec.Result() and rec.PartialResult(). Both return text data in JSON format; the first outputs the entire phrase, while the second outputs words as they are recognized. For our project, the first method is more appropriate.
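
For reference, here is roughly what each method returns for the phrase above, which is why the code extracts the "text" and "partial" keys:

rec.Result()         # '{"text": "один два три это тест"}'
rec.PartialResult()  # '{"partial": "один два"}'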

By the way, by combining speech synthesis and speech recognition, you can do some funny things ;)

Finding and executing a command

Above, we decided that the input data for command recognition will come from the rec.Result() method, which outputs the entire phrase. Now it remains to create the dictionary of commands we will work with. Below is an example of this dictionary.

command_dic = {
    "help": ('список команд', 'команды', 'что ты умеешь', 'твои навыки', 'навыки'),
    "about": ('расскажи о себе', 'что ты такое', 'ты кто', 'о себе'),
    "ctime": ('время', 'текущее время', 'сейчас времени', 'который час', 'сколько время'),
    "ask_chat_gpt": ('спроси чат джи пи ти', 'спроси чат', 'спроси умную нейросеть', 'узнай у чат джи пи ти', 'давай поговорим с чат джи пи ти'),
    "lightON": ('включи свет', 'свет включи', 'зажги свет', 'свет включи'),
    "lightOFF": ('выключи свет', 'свет отключи', 'погаси свет', 'свет выключи'),
    "SmartSwichON": ('включи розетку', 'вкл розетку', 'розетку включить', 'розетку вкл'),
    "SmartSwichOFF": ('выключи розетку', 'выкл розетку', 'розетку отключи', 'розетку выкл'),
    "SmartPARAM": ('информация о розетке', 'данные розетки', 'информация сети питания', 'данные умной розетки'),
    "Weather": ('какая погода', 'какая погода на улице', 'информация о погоде', 'какая погода сейчас'),
    "Weather_temp": ('температура на улице', 'какая сейчас температура на улице', 'информация о температуре на улице', 'уличная температура'),
    "temp_home": ('температура в доме', 'какая сейчас температура в доме', 'температура в комнате', 'комнатная температура'),
    "air_home": ('атмосфера в доме', 'качество воздуха в доме', 'воздух в комнате', 'загрязнение воздуха в комнате'),
    "LightShowOn": ('включи красоту', 'включи огоньки', 'включи праздник', 'включи яркие огоньки'),
    "LightShowOff": ('выключи красоту', 'выключи огоньки', 'выключи праздник', 'выключи яркие огоньки', 'отключи красоту', 'отключи огоньки', 'отключи праздник', 'отключи яркие огоньки' ),
    "HumOn": ('включи увлажнитель', 'запусти увлажнитель', 'начать увлажнение'),
    "HumOff": ('выключи увлажнитель', 'отключи увлажнитель', 'прекратить увлажнение'),
    "robovacuum_start": ('запускай максима', 'выпускай монстра', 'выпускай зверя','начни уборку', 'начинай уборку', 'запусти пылесос'),
    "robovacuum_stop": ('останови максима', 'загони обратно монстра', 'угомони монстра', 'прекрати уборку', 'останови пылесос', 'останови уборку', 'хватит уборки'),
    "volup": ('громче', 'добавь громкость', 'сделай громче', 'добавь звук'),
    "voldown": ('тише', 'убавь громкость', 'сделай тише', 'убавь звук'),
    "volset": ('установи уровень громкости', 'уровень громкости', 'громкость на'),
    "usd_curs": ('курс доллара', 'цена доллара', 'стоимость доллара'),
    "usd_euro": ('курс евро', 'цена евро', 'стоимость евро'),
    "marko": ('марко', 'марка', 'марке'),
    "btc_usd": ('курс биткойна', 'цена биткойна', 'стоимость биткойна', 'биткоин'),
    "youtube_counter": ('статистика ютуб', 'как дела с ютубом', 'сколько подписчиков на ютубе', 'что с подписчиками'),
    "tv_pause": ('поставь на паузу', 'поставь телевизор на паузу', 'телевизор пауза', 'пауза'),
    "tv_play": ('сними с паузы', 'сними телевизор с паузы', 'запусти воспроизведение', 'воспроизведение')
}

As you can see, our dictionary consists of keys and tuples of strings representing the ways each command may be pronounced. As you would expect, standard string comparison will not work here; more precisely, it could work if you memorized every command word for word 😉, but that is not our way, so we will use fuzzy comparison algorithms. The FuzzyWuzzy library, which is built around the Levenshtein distance metric, is ideal for our purposes.

To install the package, use the following command:

pip3 install fuzzywuzzy

And to speed up the library by 4-10 times (according to the official documentation), install an additional package:

pip3 install python-Levenshtein

For fuzzy string comparison the following method is used:

from fuzzywuzzy import fuzz

fuzz.ratio('Строка один', 'Строка два')

The output of this method is the percentage of string similarity. For clarity, let's run the following example function using our command dictionary:


def recognize_command(cmd: str):
    for c, v in command_dic.items():
        for x in v:
            similarity = fuzz.ratio(cmd, x)
        # note: only the last variant of each tuple survives the inner
        # loop; this simplified version is for illustration only
        print(f"Command match: {similarity}% | Key: {c}")

if __name__ == "__main__":
    recognize_command(" время ")

Once executed in the terminal we will get the following output:

Command match: 15% | Key: help
Command match: 31% | Key: about
Command match: 60% | Key: ctime
Command match: 26% | Key: ask_chat_gpt
Command match: 33% | Key: lightON
Command match: 32% | Key: lightOFF
Command match: 33% | Key: SmartSwichON
Command match: 32% | Key: SmartSwichOFF
Command match: 22% | Key: SmartPARAM
Command match: 15% | Key: Weather
Command match: 23% | Key: Weather_temp
Command match: 21% | Key: temp_home
Command match: 17% | Key: air_home
Command match: 30% | Key: LightShowOn
Command match: 29% | Key: LightShowOff
Command match: 25% | Key: HumOn
Command match: 21% | Key: HumOff
Command match: 18% | Key: robovacuum_start
Command match: 20% | Key: robovacuum_stop
Command match: 22% | Key: volup
Command match: 24% | Key: voldown
Command match: 32% | Key: volset
Command match: 17% | Key: usd_curs
Command match: 29% | Key: usd_euro
Command match: 33% | Key: marko
Command match: 0% | Key: btc_usd
Command match: 16% | Key: youtube_counter
Command match: 0% | Key: tv_pause
Command match: 27% | Key: tv_play

where the "ctime" command was successfully identified with 60% similarity. By slightly changing the function into the following version:

def recognize_command(cmd: str):
    similarity_percent = 60
    best_similarity = 0
    command = 'no_data'
    for c, v in command_dic.items():
        for x in v:
            similarity = fuzz.ratio(cmd, x)
            # unlike the demo above, every pronunciation variant is
            # checked, and the best match above the threshold wins
            if similarity >= similarity_percent and similarity > best_similarity:
                best_similarity = similarity
                command = c
    return command
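
A quick sanity check (the phrases are assumed examples):

print(recognize_command("который час"))  # -> 'ctime'
print(recognize_command("абракадабра"))  # -> 'no_data'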

We can now send the key of the recognized command to the command handler. The match statement serves as the command dispatcher; below is an example of a handler function:

def command_processing(key: str):
    match key:
        case 'help':
            f_help()
        case 'about':
            f_about()
        case 'ctime':
            f_ctime()
        case 'lightON':
            f_lightON()
        case 'lightOFF':
            f_lightOFF()
        case _:
            print('No data')

This handler executes the code of the case branch whose value matches the key being checked. Below is an example of my light-switching function:

import json
import urllib.request
import requests

def f_lightON():
    try:
        contents = urllib.request.urlopen("http://192.168.1.56/status").read()
        response0 = json.loads(contents)
        if response0['channel1'] == 'Off' and response0['channel2'] == 'Off':
            text = "Включила свет"  # "I turned the light on"
        if response0['channel1'] == 'On' and response0['channel2'] == 'Off':
            text = "Первый светильник уже включен, включила второй!"  # "The first lamp is already on, I turned on the second!"
        if response0['channel1'] == 'Off' and response0['channel2'] == 'On':
            text = "Второй светильник уже включен, включила первый!"  # "The second lamp is already on, I turned on the first!"
        if response0['channel1'] == 'On' and response0['channel2'] == 'On':
            text = "Свет уже включен! Но я могу выключить, если попросите!"  # "The light is already on! But I can turn it off if you ask!"
        if response0['channel1'] == 'Off':
            response2 = requests.get('http://192.168.1.56/powerS')   # toggle channel 1
        if response0['channel2'] == 'Off':
            response2 = requests.get('http://192.168.1.56/powerS2')  # toggle channel 2
        tts.speak(text)
    except Exception:
        tts.speak("Сожалею, но возникла ошибка, попробуйте позже!")  # "Sorry, an error occurred, try again later!"
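
For reference, the code above assumes the relay board's /status endpoint returns JSON along these lines (my reconstruction from the fields the function reads):

{"channel1": "Off", "channel2": "On"}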

Speech synthesis

As a means of natural interaction between human and machine, the voice interface deservedly enjoys high popularity, and one of the components of this interaction is a speech synthesis system. Modern speech synthesis systems are, as a rule, built on machine learning algorithms, so our project uses a similar solution. While searching for the optimal option for the single-board computer used in the project, I came across a solution that perfectly combines speed and quality of synthesis: the free models from Silero. The speech synthesis system itself is built on the popular machine learning framework PyTorch. Here is an example of implementing speech synthesis for this project:

import os
import torch
import sounddevice as sd
import time

current_directory = os.getcwd()
path="model_tts"
isExist = os.path.exists(path)

if not isExist:
    os.makedirs(path)

local_file_ru = 'model_tts/4_ru_model.pt'
sample_rate = 24000  # 8000, 24000, 48000 - sampling rate of the generated audio stream
speaker = "kseniya"  # aidar, baya, kseniya, xenia, random - voice model
put_accent = True
put_yo = False
device = torch.device('cpu')  # cpu or gpu
torch.set_num_threads(8)  # number of CPU threads to use

if not os.path.isfile(local_file_ru):
    torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v4_ru.pt', local_file_ru)

model = torch.package.PackageImporter(local_file_ru).load_pickle("tts_models", "model")

torch._C._jit_set_profiling_mode(False)
torch.set_grad_enabled(False)
model.to(device)
sd.default.device = 4  # audio output device


def speak(text: str):
    audio = model.apply_tts(text=text + "..",
                            speaker=speaker,
                            sample_rate=sample_rate,
                            put_accent=put_accent,
                            put_yo=put_yo)

    sd.play(audio, sample_rate)
    time.sleep((len(audio) / (sample_rate)) + 0.5)
    sd.stop()
    del audio  # free the memory


if __name__ == "__main__":
    speak("Это тестовый синтез речи")

For this script to work, you need to install the following dependencies:

pip3 install numpy torch sounddevice

As you can see from the example, the implementation in code is not complicated, which makes it easy to integrate this speech synthesis method into the project. But there is one nuance 🙂 Inquisitive minds have probably already guessed what I mean. When synthesized speech is played back, the microphone inevitably picks it up and feeds it into our speech recognition algorithm, creating excessive load on the system resources of our single-board device and an extremely negative impact on the operation of the system as a whole. To avoid this, let's use a little trick and change the speech synthesis function as follows:


mic = 2                                   # audio device index of the microphone
on_mic  = f'amixer -c {mic} set Mic cap'  # command that re-enables microphone capture
off_mic = f'amixer -c {mic} set Mic nocap'# command that mutes the microphone

def speak(text: str):
    audio = model.apply_tts(text=text + "..",
                            speaker=speaker,
                            sample_rate=sample_rate,
                            put_accent=put_accent,
                            put_yo=put_yo)
    os.system(off_mic)      # mute the microphone
    sd.play(audio, sample_rate)
    time.sleep((len(audio) / sample_rate) + 0.5)
    sd.stop()
    os.system(on_mic)        # unmute the microphone
    del audio                # free the memory
As the comments in the code show, we "silence" the microphone with a simple command-line call, which solves the problem described above.

Another unexpected problem with speech synthesis was that the model "does not understand" digits: for example, if we feed in the string "there were 10 apples", the output is synthesized speech saying "there were apples". On top of that, we also need to handle the declension of units of measurement. Below is the code that solves these problems:

import decimal

units = (
    u'ноль',

    (u'один', u'одна'),
    (u'два', u'две'),

    u'три', u'четыре', u'пять',
    u'шесть', u'семь', u'восемь', u'девять'
)

teens = (
    u'десять', u'одиннадцать',
    u'двенадцать', u'тринадцать',
    u'четырнадцать', u'пятнадцать',
    u'шестнадцать', u'семнадцать',
    u'восемнадцать', u'девятнадцать'
)

tens = (
    teens,
    u'двадцать', u'тридцать',
    u'сорок', u'пятьдесят',
    u'шестьдесят', u'семьдесят',
    u'восемьдесят', u'девяносто'
)

hundreds = (
    u'сто', u'двести',
    u'триста', u'четыреста',
    u'пятьсот', u'шестьсот',
    u'семьсот', u'восемьсот',
    u'девятьсот'
)

orders = (
    ((u'тысяча', u'тысячи', u'тысяч'), 'f'),
    ((u'миллион', u'миллиона', u'миллионов'), 'm'),
    ((u'миллиард', u'миллиарда', u'миллиардов'), 'm'),
)

minus = u'минус'


def thousand(rest, sex):
    prev = 0
    plural = 2
    name = []
    use_teens = 10 <= rest % 100 <= 19
    if not use_teens:
        data = ((units, 10), (tens, 100), (hundreds, 1000))
    else:
        data = ((teens, 10), (hundreds, 1000))
    for names, x in data:
        cur = int(((rest - prev) % x) * 10 / x)
        prev = rest % x
        if x == 10 and use_teens:
            plural = 2
            name.append(teens[cur])
        elif cur == 0:
            continue
        elif x == 10:
            name_ = names[cur]
            if isinstance(name_, tuple):
                name_ = name_[0 if sex == 'm' else 1]
            name.append(name_)
            if 2 <= cur <= 4:
                plural = 1
            elif cur == 1:
                plural = 0
            else:
                plural = 2
        else:
            name.append(names[cur - 1])
    return plural, name


def num2text(num, main_units=((u'', u'', u''), 'm')):
    _orders = (main_units,) + orders
    if num == 0:
        return ' '.join((units[0], _orders[0][0][2])).strip()

    rest = abs(num)
    ord = 0
    name = []
    while rest > 0:
        plural, nme = thousand(rest % 1000, _orders[ord][1])
        if nme or ord == 0:
            name.append(_orders[ord][0][plural])
        name += nme
        rest = int(rest / 1000)
        ord += 1
    if num < 0:
        name.append(minus)
    name.reverse()
    return ' '.join(name).strip()


def decimal2text(value, places=2,
                 int_units=(('', '', ''), 'm'),
                 exp_units=(('', '', ''), 'm')):
    value = decimal.Decimal(value)
    q = decimal.Decimal(10) ** -places

    integral, exp = str(value.quantize(q)).split('.')
    return u'{} {}'.format(
        num2text(int(integral), int_units),
        num2text(int(exp), exp_units))
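
A quick usage sketch, assuming the code above is saved as the digit_to_text module used later (the unit words are examples):

import digit_to_text as d2t

print(d2t.num2text(1, ((u'минута', u'минуты', u'минут'), 'f')))
# -> 'одна минута'
print(d2t.decimal2text('2.45',
                       int_units=((u'рубль', u'рубля', u'рублей'), 'm'),
                       exp_units=((u'копейка', u'копейки', u'копеек'), 'f')))
# -> 'два рубля сорок пять копеек'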

An example of using this solution in the current time function:

def f_ctime():
    now = datetime.datetime.now()
    text = "Сейч+ас "  # the "+" marks the stressed vowel for Silero
    male_units = ((u'час', u'часа', u'часов'), 'm')
    text += digit_to_text.num2text(int(now.hour), male_units) + '.'
    female_units = ((u'минута', u'минуты', u'минут'), 'f')  # 'минута' is feminine
    text += digit_to_text.num2text(int(now.minute), female_units) + '.'
    tts.speak(text)
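
At 10:30, for example, this hands the synthesizer roughly the following string (my reconstruction from the code above):

f_ctime()  # speaks: "Сейч+ас десять часов.тридцать минут."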

From the previous article, many people probably remember that the audio board has an amplifier control signal (mute) and an LED indicating the smart speaker's activity. To manage these, let's adjust the code as follows:

import os
import torch
import sounddevice as sd
import time
mic = 2                                   # audio device index of the microphone
on_mic  = f'amixer -c {mic} set Mic cap'  # command that re-enables microphone capture
off_mic = f'amixer -c {mic} set Mic nocap'# command that mutes the microphone

os.system("gpio mode 18 out")             # set the amplifier-enable pin to output mode
os.system("gpio mode 16 out")             # set the LED indicator pin to output mode
os.system("gpio write 18 0")              # disable the amplifier so it does not hum

current_directory = os.getcwd()
path="model_tts"
isExist = os.path.exists(path)

if not isExist:
    os.makedirs(path)

local_file_ru = 'model_tts/4_ru_model.pt'
sample_rate = 24000  # 8000, 24000, 48000 - sampling rate of the generated audio stream
speaker = "kseniya"  # aidar, baya, kseniya, xenia, random - voice model
put_accent = True
put_yo = False
device = torch.device('cpu')  # cpu or gpu
torch.set_num_threads(8)  # number of CPU threads to use

if not os.path.isfile(local_file_ru):
    torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v4_ru.pt', local_file_ru)

model = torch.package.PackageImporter(local_file_ru).load_pickle("tts_models", "model")

torch._C._jit_set_profiling_mode(False)
torch.set_grad_enabled(False)
model.to(device)
sd.default.device = 4  # audio output device


def speak(text: str):
    audio = model.apply_tts(text=text + "..",
                            speaker=speaker,
                            sample_rate=sample_rate,
                            put_accent=put_accent,
                            put_yo=put_yo)
    os.system(off_mic)             # mute the microphone
    os.system("gpio write 18 1")   # enable signal for the amplifier
    os.system("gpio write 16 1")   # light up the indicator LED
    sd.play(audio, sample_rate)
    time.sleep((len(audio) / sample_rate) + 0.5)
    sd.stop()
    os.system(on_mic)              # unmute the microphone
    os.system("gpio write 18 0")   # disable the amplifier
    os.system("gpio write 16 0")   # turn off the indicator
    del audio                      # free the memory


if __name__ == "__main__":
    speak("Это тестовый синтез речи")

As you can see, pins are managed using the same methods that I described in the previous article.

Call her by name

As you might have guessed, you need an activation phrase to work with the voice assistant. In my smart speaker I decided to use the name Alpha. The choice of this name was heavily influenced by the Apple TV+ series "Extrapolations", which featured the Alpha corporation and its ubiquitous voice assistant of the same name. Below is how activation is implemented.

First, we need to create a tuple of variations in the pronunciation of the name:

sys_alias= ('альфа', 'альф', 'альфа',  'альфу', 'альфи')

Next, let's talk about the activation algorithm. In essence it is simple: from the stream of text data, we detect the presence of the name Alpha; if the name is present, we move to the next stage, determining the command; and finally, we execute the command. As soon as our assistant detects its name, it emits a sound signal along with a light indication and waits for the next phrase containing the command. Here is what it looks like in code.

The name recognition function:

def name_recognize(name: str):
    words = name.split()
    stat = False
    for item in sys_alias:
        similarity = fuzz.ratio(item, words[0])
        if similarity > 70:
            stat = True
    return stat
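
A quick check of the function (the phrases are assumed examples):

print(name_recognize("альфа включи свет"))  # True
print(name_recognize("включи свет"))        # False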

This function also uses fuzzy string comparison and returns the Boolean value True when the name matches. Since the function may receive a whole phrase as input, we need to split the string into a list of words using name.split() and compare against the first element of the resulting list. Below is an example of the text-processing code that handles activation and passes the phrase text on for command detection:

def response(voice: str):
    if glob_var.read_bool_wake_up():                  # stage two: command recognition
        command_processing(recognize_command(voice))  # recognize and execute the command
        glob_var.set_bool_wake_up(False)              # after execution, return to name-recognition mode
    glob_var.set_bool_wake_up(name_recognize(voice))  # check whether the name is present in the stream
    if glob_var.read_bool_wake_up():                  # if the name is detected, play the notification sound
        tts.play_wakeup_sound('notification.wav')
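
To tie things together, here is a sketch (my assumption of the final wiring, based on the transcription example earlier) of how response() replaces the print in the recognition loop:

# inside voice_listen(), the full-phrase branch becomes:
if rec.AcceptWaveform(data):
    res = json.loads(rec.Result())["text"]
    if res:
        response(res)  # activation check + command handling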

As you can see, to simplify working with global data, a module named glob_var was created:

hears    = False
gpt_bool = False
wake_up  = False
voice    = ""
volset   = 0

def set_bool_mic(bools):
    global hears
    hears = bools
    print(hears)

def read_bool_mic():
    global hears
    return hears
    
def set_bool_gpt(bools):
    global gpt_bool
    gpt_bool = bools
    print(gpt_bool)

def read_bool_gpt():
    global gpt_bool
    return gpt_bool

def set_bool_wake_up(bools):
    global wake_up
    wake_up = bools
    print(wake_up)

def read_bool_wake_up():
    global wake_up
    return wake_up

def set_volset(vol):
    global volset
    volset = vol
    print(vol)

def read_volset():
    global volset
    return volset

def set_voice(voices):
    global voice
    voice = voices
    print(voices)

def read_voice():
    global voice
    return voice
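
Usage from any module is then uniform (a trivial sketch):

import glob_var

glob_var.set_bool_wake_up(True)
if glob_var.read_bool_wake_up():
    print("waiting for a command")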

And to play a beep when a name is detected, the following function is used:


file_path_n = current_directory + '/sound/notification.wav'   # extract the audio data and sampling rate from the file
data_n, fs_n = sf.read(file_path_n, dtype="float32")
file_path_logo = current_directory + '/sound/start_logo.wav'  # extract the audio data and sampling rate from the file
data_logo, fs_logo = sf.read(file_path_logo, dtype="float32")

def play_wakeup_sound(sound: str):
    global data_logo, fs_logo
    global data_n, fs_n
    if sound == 'notification.wav':
        data = data_n
        fs = fs_n
    else:
        data = data_logo
        fs = fs_logo
    sd.play(data, fs)
    os.system("gpio write 18 1")   # enable the amplifier
    os.system("gpio write 16 1")   # light up the indicator LED
    sd.wait()                      # wait until playback finishes
    os.system("gpio write 18 0")   # disable the amplifier
    sd.stop()

where the name of the file to be played is passed as an argument. This function is also used to play a welcome audio file when the system starts.

Our project also implements a volume control function at the software level, but that is another story.

Results

In this article, I tried to describe, briefly and clearly, the main "core" of my DIY smart speaker, without any frills. I hope it was not boring and will be useful to you. Any suggestions, ideas, comments? Welcome to the comments. Thank you for your attention, and all the best!
