Below, we will talk about technologies for working with voice on the web, such as speech recognition and speech synthesis. This article is not a coding tutorial and avoids complex technical descriptions. My goal is to show you what these technologies can do and how mature they are, using as an example a simple application I developed while studying this topic. You can read the code of this application on GitHub…
What the Web Speech API gives us
Many would agree that addresses in map apps and long search queries on a phone are easier to enter by voice. Voice assistants often answer our requests. Many mobile applications use these technologies, but how often do you see them on websites?
The benefits for mobile users are obvious, but they are not the only ones who gain. The Web Speech API makes a web application more accessible to people with disabilities and to children. In some cases, the technology can also be useful for desktop users, for example in web chats.
Try opening the link to see an example of the Web Speech API at work in Google Chrome (I'll explain later why I recommend this particular browser). The page is divided into two parts:
- Speech generation – try entering a phrase in the selected language and hitting PLAY.
- Speech recognition – press START, grant access to the microphone and start speaking.
Below, you will see the page convert your speech into text.
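For readers curious about what the two halves of such a demo boil down to, here is a minimal TypeScript sketch. It is not the demo's actual code. It assumes the standard `speechSynthesis`/`SpeechSynthesisUtterance` pair for generation and `SpeechRecognition` (exposed in Chrome as `webkitSpeechRecognition`) for recognition. The names `SpeechEnv`, `speak` and `listen` are hypothetical, and the small structural interfaces stand in for the DOM typings so the sketch compiles on its own.

```typescript
// Hypothetical structural stand-ins for the DOM typings, so the sketch compiles on its own.
interface UtteranceLike {
  lang: string;
}

interface RecognitionResultList {
  length: number;
  // Each result holds one or more alternatives; [0] is the most likely one.
  [index: number]: { isFinal: boolean; [alt: number]: { transcript: string } };
}

interface RecognitionLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: { resultIndex: number; results: RecognitionResultList }) => void) | null;
  start(): void;
}

// The globals this sketch uses; in a browser, `window` provides them.
interface SpeechEnv {
  speechSynthesis?: { speak(utterance: UtteranceLike): void };
  SpeechSynthesisUtterance?: new (text: string) => UtteranceLike;
  SpeechRecognition?: new () => RecognitionLike;
  webkitSpeechRecognition?: new () => RecognitionLike; // Chrome's prefixed name
}

// Speech generation: roughly what the PLAY button does.
function speak(env: SpeechEnv, text: string, lang: string): boolean {
  if (!env.speechSynthesis || !env.SpeechSynthesisUtterance) return false;
  const utterance = new env.SpeechSynthesisUtterance(text);
  utterance.lang = lang; // BCP 47 language tag, e.g. "en-US"
  env.speechSynthesis.speak(utterance);
  return true;
}

// Speech recognition: roughly what the START button does.
function listen(
  env: SpeechEnv,
  lang: string,
  onText: (text: string, isFinal: boolean) => void,
): RecognitionLike | null {
  const Recognition = env.SpeechRecognition ?? env.webkitSpeechRecognition;
  if (!Recognition) return null;
  const recognition = new Recognition();
  recognition.lang = lang;
  recognition.continuous = true; // keep listening across pauses
  recognition.interimResults = true; // also deliver partial, not-yet-final text
  recognition.onresult = (event) => {
    // Only results from resultIndex onward are new in this event.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      onText(result[0].transcript, result.isFinal);
    }
  };
  recognition.start(); // the browser may prompt for microphone access here
  return recognition;
}
```

In a browser, `speak(window as unknown as SpeechEnv, "Hello", "en-US")` would roughly correspond to pressing PLAY, and `listen(...)` to pressing START. Setting `interimResults` to `true` is what lets text appear while you are still speaking; with `false`, only final results are delivered.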
Have you tried it? Good, now let's think about where all this could be applied in our web applications.
Finding the ideal candidate
You could add Speech API support throughout an application, but would anyone use it? For speech recognition, good candidates are fields where users enter large amounts of text.
At Severstal, despite the large number of services we run, we identified only a few where speech recognition would be truly useful:
- Search on the internal portal
- Technical support request page
- Chat bots
Unfortunately, we did not find any use cases for speech generation, so I would be very grateful for your ideas in the comments.
Difficulties of implementation
Earlier, when linking to the example page, I recommended opening it in Google Chrome. The reason is that the example works well in desktop Chrome and tolerably in its mobile version, but other browsers support this API poorly. While speech generation works in all major browsers, support for speech recognition is extremely limited.
Since the demo was put together quickly, it will not work perfectly (or may not work at all) in browsers that only partially support the Web Speech API or that require additional code. If you try running it in Firefox, Edge or Safari, you will see that the technology is still very raw and that using it as-is is not a good idea.
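Because support differs so much between browsers, a page using the API should at least check for it before showing PLAY or START controls. Below is a minimal feature-detection sketch, assuming the standard global names; `detectSpeechSupport` and its result fields are hypothetical. It also checks for the `webkit`-prefixed recognition constructor that Chrome exposes. Note that detection only tells you the objects exist: a browser with partial support may expose them and still behave poorly, so this is a first filter rather than a guarantee.

```typescript
// Only the globals we probe for; in a browser, pass `window`.
interface SpeechGlobals {
  speechSynthesis?: unknown;
  SpeechSynthesisUtterance?: unknown;
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

interface SpeechSupport {
  synthesis: boolean; // speech generation is available
  recognition: boolean; // some SpeechRecognition constructor is available
  recognitionPrefixed: boolean; // only the webkit-prefixed constructor exists
}

function detectSpeechSupport(env: SpeechGlobals): SpeechSupport {
  // Generation needs both the speechSynthesis object and the utterance constructor.
  const synthesis =
    typeof env.speechSynthesis === "object" &&
    env.speechSynthesis !== null &&
    typeof env.SpeechSynthesisUtterance === "function";
  const unprefixed = typeof env.SpeechRecognition === "function";
  const prefixed = typeof env.webkitSpeechRecognition === "function";
  return {
    synthesis,
    recognition: unprefixed || prefixed,
    recognitionPrefixed: prefixed && !unprefixed,
  };
}
```

In a browser this would be called as `detectSpeechSupport(window)`, with the START button hidden (or replaced by a fallback) when `recognition` is `false`.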
So, suppose we have decided to adopt the technology but have run into browser-support problems. What should we do?
First, decide whether you will use the Web Speech API in the browsers where it works or abandon it entirely. We decided that building on an API with weak support, which may still change, was not a good idea, so we started looking for alternatives.
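Whichever way you decide, it helps to keep the rest of the application unaware of which engine performs recognition, so the Web Speech API can be dropped or replaced later without touching the UI. The sketch below illustrates that idea under stated assumptions: `SpeechToText`, `webSpeechRecognizer` and `chooseRecognizer` are hypothetical names, and `fallback` stands for whatever alternative you pick (for example, a wrapper around one of the options listed below). Its implementation is not shown.

```typescript
// The rest of the app depends only on this interface, not on a specific engine.
interface SpeechToText {
  readonly name: string;
  start(onText: (text: string) => void): void;
}

// Structural stand-in for the browser's (webkit)SpeechRecognition constructor.
interface RecognitionCtor {
  new (): {
    lang: string;
    onresult:
      | ((event: {
          resultIndex: number;
          results: ArrayLike<ArrayLike<{ transcript: string }>>;
        }) => void)
      | null;
    start(): void;
  };
}

// Adapter: wraps the Web Speech API behind the SpeechToText interface.
function webSpeechRecognizer(Ctor: RecognitionCtor, lang: string): SpeechToText {
  return {
    name: "web-speech-api",
    start(onText) {
      const recognition = new Ctor();
      recognition.lang = lang;
      recognition.onresult = (event) => {
        // Forward the top alternative of each new result.
        for (let i = event.resultIndex; i < event.results.length; i++) {
          onText(event.results[i][0].transcript);
        }
      };
      recognition.start();
    },
  };
}

// Use the native API only if allowed by policy AND present; otherwise use the fallback.
function chooseRecognizer(
  env: { SpeechRecognition?: RecognitionCtor; webkitSpeechRecognition?: RecognitionCtor },
  lang: string,
  fallback: SpeechToText, // hypothetical: e.g. a cloud or offline engine wrapper
  useNativeWhenAvailable: boolean,
): SpeechToText {
  const Ctor = env.SpeechRecognition ?? env.webkitSpeechRecognition;
  return useNativeWhenAvailable && Ctor ? webSpeechRecognizer(Ctor, lang) : fallback;
}
```

With `useNativeWhenAvailable` set to `false`, this reproduces the choice we made: the Web Speech API is never used, even where it works.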
Pocketsphinx and TensorFlow. If you want speech recognition to work offline in your PWAs (Progressive Web Applications), this is a great choice, although recognition may work poorly for some languages.
Cloud services. Many cloud providers offer speech APIs. With this option, you get excellent speech recognition, plus a monthly bill =)
Your own API. The hardest and longest route, requiring additional expertise and hardware to deploy a fast, high-quality service.
Which option to use, and when, is up to you. I would recommend starting with cloud services to get your beautiful web applications up and running quickly. And if you want to experiment with the Web Speech API, feel free to use my code for any purpose.