Google Cloud Text to Speech API as the quintessence of Light. Reminiscences

So. The essence and main logic of the application come down to a convenient client for the Google Cloud Text to Speech API, built on the Ruby on Rails 6 framework (brand-new Bootstrap 5, Webpacker, jQuery, spectacular Ajax flash messages), which provides high-quality voicing of text based on Google’s WaveNet technology. Additional features include on-the-fly translation through the Google Translate API and transparent storage of audio files in Amazon S3 or another compatible service (Google Cloud Storage, Microsoft Azure Storage). Plus some nice little things: SSML support, arbitrary file naming, a choice of output audio formats, and management of stored files from the application’s web interface, where you can also download audio or simply listen to it without saving it to your PC. Plus a wide variety of options and toggles (see screenshot) that affect the sound quality.
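To give a feel for what such a client sends to Google, here is a minimal sketch of the JSON body for the REST method `text:synthesize`. It is not the application’s actual code: the helper name `build_synthesize_request`, the default voice `en-US-Wavenet-D` and the default encoding are assumptions chosen for illustration, and only the Ruby standard library is used.

```ruby
require "json"

# Hypothetical helper (not taken from the application): builds the request
# body for the Text to Speech REST method `text:synthesize`.
# Default language, voice and encoding are illustrative assumptions.
def build_synthesize_request(text,
                             language_code: "en-US",
                             voice_name: "en-US-Wavenet-D",
                             audio_encoding: "MP3",
                             ssml: false)
  # The API accepts either plain text or SSML markup as input.
  input = ssml ? { ssml: text } : { text: text }
  {
    input: input,
    voice: { languageCode: language_code, name: voice_name },
    audioConfig: { audioEncoding: audio_encoding }
  }
end

body = build_synthesize_request("Hello, world")
puts JSON.pretty_generate(body)
```

The service answers with a JSON document whose `audioContent` field holds the base64-encoded audio, which a client can then decode and save to storage.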

WaveNet is a convolutional neural network whose layers have different dilation factors, which lets its receptive field grow exponentially with depth and cover thousands of time steps.
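The exponential growth is easy to check with a little arithmetic. The sketch below assumes a kernel size of 2 and a dilation that doubles at every layer (1, 2, 4, …, 512); both are illustrative assumptions, and `receptive_field` is a made-up helper name. Each layer adds (kernel size − 1) × dilation time steps to the receptive field.

```ruby
# Receptive field of a stack of dilated causal convolutions:
# every layer with kernel size k and dilation d adds (k - 1) * d time steps.
def receptive_field(dilations, kernel_size: 2)
  1 + dilations.sum { |d| (kernel_size - 1) * d }
end

doubling = (0..9).map { |i| 2**i }    # assumed schedule: 1, 2, 4, ..., 512

puts receptive_field(doubling)        # one block of 10 layers      => 1024
puts receptive_field(doubling * 3)    # three such blocks stacked   => 3070
puts receptive_field([1] * 10)        # 10 layers without dilation  => 11
```

Ten ordinary layers see only 11 time steps, while ten dilated layers see 1024, which is how a modest stack reaches thousands of steps.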

During training, the input sequences are real waveforms taken from voice recordings. After training, the network can be sampled to generate synthetic speech. At each step, a value is drawn from the probability distribution computed by the network. This value is then fed back into the input, and a new prediction is made for the next step. Building audio one sample at a time like this is quite resource-intensive, but it is necessary to generate complex, realistic sounds.
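The sampling loop described above can be sketched in a few lines. The "network" here is a toy stand-in, not WaveNet: `toy_distribution` is a hypothetical function that simply prefers values close to the previous sample, and the 256 quantization levels and the seed are assumptions for illustration. Only the shape of the loop (predict a distribution, draw a value, feed it back) mirrors the description.

```ruby
# Toy stand-in for the trained network (hypothetical): returns a probability
# distribution over `levels` quantized amplitude values given the history.
# A real WaveNet would compute this with its dilated convolutional layers.
def toy_distribution(history, levels)
  last = history.last || levels / 2
  weights = Array.new(levels) { |v| 1.0 / (1 + (v - last).abs) }
  total = weights.sum
  weights.map { |w| w / total }
end

# Draws one index from a discrete probability distribution.
def sample_from(probs, rng)
  r = rng.rand
  cumulative = 0.0
  probs.each_with_index do |p, i|
    cumulative += p
    return i if r < cumulative
  end
  probs.length - 1 # guard against floating-point rounding
end

# Autoregressive generation: each new sample is appended to the history,
# so it becomes part of the input for the next prediction.
def generate(steps, levels: 256, rng: Random.new(42))
  samples = []
  steps.times do
    probs = toy_distribution(samples, levels)
    samples << sample_from(probs, rng)
  end
  samples
end

p generate(10)
```

Because every sample depends on the ones before it, the steps cannot run in parallel, which is one reason this way of generating audio is so costly.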

But perhaps the main thing is something else. The highlight of the project is that you don’t need to be a programmer to work with this application; nor is there the slightest need to break or hack anything in order to secure yourself permanently free use of the above services. The description of the code contains a detailed step-by-step guide; after reading it, almost any non-programmer will be able to deploy Google Cloud Text to Speech Pro on Rails on the free plan of the Heroku PaaS cloud platform (you don’t need to install Ruby or Rails on your work computer; Git and the Heroku command-line tools will suffice). You will get a new website on a third-level domain (under herokuapp.com), where you will be the only authorized user. Google provides fairly generous free limits for the Text to Speech API and the Translate API (unless you are going to voice “War and Peace”, this should be enough), and Amazon will give your new account a whole year of free service.

At the end of which … hmm, it’s not for me to teach you; you will guess for yourself what to do. It seems to me that this loophole is no secret to the managers at Amazon; they simply adhere to roughly the same philosophical concept and somewhat similar views of the World, with a brief description of which I began this small presentation.


Yes, I almost forgot. View the code, the links to the description and quick start, and the download here… To thank the author for a thoughtful article and a useful product, to swear at the shitty code, or simply to vent, whether under the impression of the recent “elections” to the State Duma or out of hatred for Amazon as the quintessence of capitalism and for its deliberately paid foreign PR agents: head to the comments …
