A guide to setting up Yandex SpeechSense speech analytics from scratch, with the gaps in the platform documentation filled in

If you run your own contact center, you regularly need to find mentions of something specific in a large number of audio recordings. We were recently granted access to the preview of the Yandex SpeechSense service, and I decided to try solving one such problem with it.

This article describes all the difficulties, the subtle points, and the full path from the start of setup to victory.

The documentation describes functionality that differs little from other speech analytics systems, but only at first glance. The surprise was that the service has no means of its own for loading audio fragments into the system: you have to implement the entire upload yourself. The documentation describes only one way to do it, using bash and Python under Unix. So in this article I will also describe how to adapt such instructions for Windows and PowerShell.

The instructions themselves are quite short: https://yandex.cloud/ru/docs/speechsense/operations/data/upload-data I expected to be done in a couple of hours, but that did not happen. So reading this article may save you 2-3 business days. Or more.

Before cloning the repository https://github.com/yandex-cloud/cloudapi, set up your environment by following this guide up to the section “Creating Virtual Environments”:
https://packaging.python.org/en/latest/tutorials/installing-packages/ Check that your Python version is higher than 3.6

py --version

and the presence of pip

py -m pip --version

If pip is not set up, the guide linked above explains what to do. Then upgrade pip and the related packaging tools

py -m pip install --upgrade pip setuptools wheel

This will save you at least your nerves, and maybe even time.
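
The same two checks can also be made from Python itself. This is only a minimal sketch, assuming the "higher than 3.6" requirement mentioned above; the function name environment_report is made up for illustration.

```python
import importlib.util
import sys


def environment_report(min_exclusive=(3, 6)):
    """Report whether the interpreter is newer than 3.6 and pip is importable."""
    return {
        # Compare only (major, minor), for example (3, 11) > (3, 6).
        "python_ok": sys.version_info[:2] > min_exclusive,
        # find_spec returns None when the top-level module is absent.
        "pip_present": importlib.util.find_spec("pip") is not None,
    }


if __name__ == "__main__":
    print(environment_report())
```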

If everything went well, install the grpcio-tools package. On Windows this is done not as the Yandex documentation indicates

pip install grpcio-tools

but like this:

py -m pip install "grpcio-tools"

The next step in the instructions is

python3 -m grpc_tools.protoc -I . \
     --python_out=./upload_data/ \
     --grpc_python_out=./upload_data/ \
     yandex/cloud/speechsense/v1/*

For PowerShell it needs to be changed to

py -m grpc_tools.protoc --proto_path=c:\project\speech_sense\ `
--python_out=c:\project\speech_sense\upload_data\ `
--grpc_python_out=c:\project\speech_sense\upload_data\ `
c:\project\speech_sense\yandex\cloud\speechsense\v1\*

(https://stackoverflow.com/questions/57909401/what-are-the-command-line-arguments-passed-to-grpc-tools-protoc)

Here c:\project\speech_sense\ is the path where you cloned the Yandex repository. And yes, this command will definitely not work the first time.

For everything to work, you need to copy the google folder from third_party\googleapis\ to the root folder of the project (c:\project\speech_sense), otherwise you will get the error

"google/api/annotations.proto: File not found. yandex/cloud/speechsense/v1/talk_service.proto:9:1: Import "google/api/annotations.proto" was not found or had errors."

That's right, you guessed correctly. In the Yandex cloudapi repository, the google folder is placed in such a way that the instructions from the Yandex documentation are guaranteed not to work. And you will not read about the need for this manipulation anywhere.
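
A possible alternative to copying the folder, not tried in this walkthrough, is to give protoc the googleapis folder as a second include path, since protoc accepts --proto_path more than once. The sketch below only assembles the argument list; build_protoc_args is a hypothetical helper, and the paths are the ones assumed throughout this article.

```python
from pathlib import PureWindowsPath


def build_protoc_args(repo_root, out_dir, proto_files):
    """Assemble arguments for `py -m grpc_tools.protoc` (illustrative only)."""
    root = PureWindowsPath(repo_root)
    return [
        "-m", "grpc_tools.protoc",
        f"--proto_path={root}",
        # Second include root, so that google/api/annotations.proto can be
        # resolved without copying the google folder.
        f"--proto_path={root / 'third_party' / 'googleapis'}",
        f"--python_out={out_dir}",
        f"--grpc_python_out={out_dir}",
        *[str(p) for p in proto_files],
    ]


if __name__ == "__main__":
    args = build_protoc_args(
        r"c:\project\speech_sense",
        r"c:\project\speech_sense\upload_data",
        [r"c:\project\speech_sense\yandex\cloud\speechsense\v1\talk_service.proto"],
    )
    # Print the command line instead of running it.
    print("py " + " ".join(args))
```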

Creating the upload_grpc.py file poses no problems, but the next item in the instructions, “Set the API key of the service account:”, is not marked in any way to show that it no longer goes into the Python script and is once again a bash command. On Windows it is not export but
set API_KEY=
(https://stackoverflow.com/questions/59087882/how-do-i-set-an-api-key-in-windows-terminal) It is not clear yet whether this will work later on, but I am getting ahead of myself.

And finally, having finished the setup, we run

py upload_grpc.py --audio-path c:\project\speech_sense\upload_data\wave\123.wav `
--connection-id ID `
--key ${API_KEY}

and get the error ModuleNotFoundError: No module named 'google.api'

It seems that the already familiar google folder is now missing from upload_data. But even once it is placed in upload_data, the script fails again, this time with the error ImportError: cannot import name 'annotations_pb2' from 'google.api' (unknown location)

The problem is in line 15 of talk_service_pb2.py
from google.api import annotations_pb2 as google_dot_api_dot_annotations__pb2
But google\api\ contains only annotations.proto, with no generated annotations_pb2.py next to it

The problem is solved like this

py -m pip install -U googleapis-common-protos==1.5.10

(https://stackoverflow.com/questions/56365380/google-cloud-tasks-importerror-cannot-import-name-resource-pb2)
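
To confirm that the import which failed above can now be found, a small diagnostic along these lines may help. It is a sketch only: module_available is a made-up helper name, and the module names in the demo are the ones from the error messages in this article.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located on the current sys.path."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # find_spec imports parent packages first; this is raised when a
        # parent (for example `google`) is itself missing.
        return False


if __name__ == "__main__":
    for name in ("google.api.annotations_pb2", "google.protobuf"):
        print(name, module_available(name))
```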

You get a new error

TypeError: Descriptors cannot be created directly. If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0. If you cannot immediately regenerate your protos, some other possible workarounds are:

Downgrade the protobuf package to 3.20.x or lower.
Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

Which in turn is solved by the command

py -m pip install -U protobuf==3.20.0
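
Whether the installed protobuf falls into the "3.20.x or lower" range named in the error message can be checked from Python. A rough sketch, assuming Python 3.8 or newer for importlib.metadata; is_3_20_or_lower is a hypothetical helper that looks only at the major and minor version numbers.

```python
from importlib.metadata import PackageNotFoundError, version


def is_3_20_or_lower(version_string):
    """True if a version string such as '3.20.0' is 3.20.x or lower."""
    major, minor = (int(part) for part in version_string.split(".")[:2])
    return (major, minor) <= (3, 20)


if __name__ == "__main__":
    try:
        installed = version("protobuf")
    except PackageNotFoundError:
        print("protobuf is not installed")
    else:
        print(installed, is_3_20_or_lower(installed))
```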

You then receive the error ModuleNotFoundError: No module named 'yandex.cloud.speechsense.v1.analysis'. Something went completely wrong. To understand what, you need to go back to the grpcio-tools step.

There the last parameter is c:\project\speech_sense\yandex\cloud\speechsense\v1\*, and the analysis folder was not generated as a result. I solved this problem by compiling the analysis folder separately

py -m grpc_tools.protoc --proto_path=c:\project\speech_sense\ `
--python_out=c:\project\speech_sense\upload_data\analysis\ `
--grpc_python_out=c:\project\speech_sense\upload_data\analysis\ `
c:\project\speech_sense\yandex\cloud\speechsense\v1\analysis\*

and then manually copying the resulting analysis folder from upload_data\analysis\yandex\cloud\speechsense\v1\ to c:\project\speech_sense\upload_data\yandex\cloud\speechsense\v1\.
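
Judging by the result, the * at the end of the path matched only the files directly inside v1\, which is why the analysis subfolder was skipped. The .proto files could instead be collected recursively and passed to a single protoc call. The sketch below shows only the collection step on a throwaway directory; find_proto_files is a hypothetical helper, and feeding its output to grpc_tools.protoc is an untested assumption.

```python
import tempfile
from pathlib import Path


def find_proto_files(root):
    """Return every .proto file under `root`, subfolders included, sorted."""
    return sorted(Path(root).rglob("*.proto"))


if __name__ == "__main__":
    # Build a temporary tree that mimics v1\ with a nested analysis\ folder.
    with tempfile.TemporaryDirectory() as tmp:
        v1 = Path(tmp) / "v1"
        (v1 / "analysis").mkdir(parents=True)
        (v1 / "talk_service.proto").touch()
        (v1 / "analysis" / "example.proto").touch()  # made-up file name
        for path in find_proto_files(v1):
            print(path.relative_to(v1))
```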

Next we get the error usage: upload_grpc.py [-h] --key KEY --connection-id CONNECTION_ID --audio-path AUDIO_PATH [--meta-path META_PATH] upload_grpc.py: error: argument --key: expected one argument, which I hinted at earlier. Trying to use the API_KEY set via SET like this

py upload_grpc.py --audio-path c:\project\speech_sense\upload_data\wave\123.wav `
--connection-id ID `
--key %API_KEY%

gives the error grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAUTHENTICATED details = "UNAUTHENTICATED: Unknown api key '%API****KEY% (0291C3E7)'"

I did not find a solution and simply entered the key on the command line directly rather than through a variable.
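
A likely explanation, not verified in this walkthrough: set NAME=value and %NAME% are cmd.exe syntax, while PowerShell reads environment variables as $env:NAME, so the literal text %API_KEY% reached the script. One way to sidestep the shell differences is to let the Python script read the variable itself. A minimal sketch follows; resolve_api_key is a hypothetical helper and is not part of the upload_grpc.py from the Yandex documentation.

```python
import os
import re

# Matches text that looks like an unexpanded shell variable, for example
# "%API_KEY%" (cmd.exe style) or "${API_KEY}" / "$API_KEY" (bash style).
_PLACEHOLDER = re.compile(r"^(%\w+%|\$\{\w+\}|\$\w+)$")


def resolve_api_key(cli_value=None, env=None, var_name="API_KEY"):
    """Prefer a real --key value; otherwise fall back to the environment."""
    env = os.environ if env is None else env
    if cli_value and not _PLACEHOLDER.match(cli_value):
        return cli_value
    key = env.get(var_name)
    if not key:
        raise ValueError(f"{var_name} is not set and no usable --key was given")
    return key


if __name__ == "__main__":
    # The literal placeholder is rejected and the environment is used instead.
    print(resolve_api_key("%API_KEY%", env={"API_KEY": "example-key"}))
```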
Next we get the error grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.INVALID_ARGUMENT details = "Connection in active state was not found: ID" debug_error_string = "UNKNOWN:Error received from peer ipv4:51.250.48.112:443 {created_time:"2024-06-21T13:50:40.1128112+00:00", grpc_status:3, grpc_message:"Connection in active state was not found: ID"}" It turns out that what you need to copy is not the project ID but the connection ID.

Having specified the correct connection ID, we get the error grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.PERMISSION_DENIED details = "User is not permitted to add talk in space SPACEID" debug_error_string = "UNKNOWN:Error received from peer ipv4:51.250.48.112:443 {created_time:"2024-06-21T14:01:07.0448397+00:00", grpc_status:7, grpc_message:"User is not permitted to add talk in space SPACEID"}" It looks like the service account does not have the right to add conversations, although this step of the instructions
“Add the service account to the space with the Data editor role. This will allow the service account to upload data to the created connection.”
seemed to have been completed long ago, and successfully.

It turned out that the SpeechSense service pays no attention to the roles assigned to the service account when it was created. When added to a space, a service account becomes an “Auditor”, even if the speechsense.data.editor role was specified at creation. The role has to be changed in the space participants menu.
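
The three gRPC errors above can be summed up as a small lookup table. This is an illustrative sketch distilled from this article only, not an official SpeechSense reference; HINTS and hint_for are made-up names. In a real script the status name would typically come from the caught error, for example err.code().name on a grpc.RpcError.

```python
# Hints distilled from the three errors described in this article.
HINTS = {
    "UNAUTHENTICATED": "Check that the real API key, not an unexpanded "
                       "variable, was passed to --key.",
    "INVALID_ARGUMENT": "Check that --connection-id holds the connection ID, "
                        "not the project ID.",
    "PERMISSION_DENIED": "Check the service account's role in the list of "
                         "space participants.",
}


def hint_for(status_name):
    """Return the recorded hint for a gRPC status name, or a default text."""
    return HINTS.get(status_name, "No hint recorded for this status.")


if __name__ == "__main__":
    for name in ("UNAUTHENTICATED", "INVALID_ARGUMENT", "PERMISSION_DENIED"):
        print(name + ": " + hint_for(name))
```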

With that, everything worked and upload_grpc.py successfully uploaded the file to the audio channel. All the other points of the documentation have so far worked as written, and you can start analyzing. I must say that the STT conversion of this audio fragment was of quite high quality, with no incorrectly recognized words at all.

I hope this article helps readers avoid my difficulties when getting started with this service.