How to embed a voice assistant into any mobile application: a walkthrough using Habitica

Don’t you think many mobile applications would be far more convenient with voice control? This is not about chatting with a bank’s assistant in a tech support chat. Voice navigation through the application, or filling in forms in dialogue mode, would be enough.

Using Habitica (an open source app for building habits and achieving goals, written in Kotlin) as an example, Vitalya Gorbachev, a solution architect at Just AI, shows how to quickly and seamlessly integrate a voice interface into the functionality of any application.

But first, why is voice control of a mobile application convenient? Let’s start with the obvious.

We often need to use an application when our hands are busy: while cooking, driving, carrying suitcases, doing mechanical work, and so on.
Voice is an essential tool for people with visual impairments.

These cases are obvious, but in reality everything is even simpler: in some cases, voice input is just faster! Imagine booking a plane ticket with one phrase, “Buy me a ticket for tomorrow for two to Samara”, instead of filling in a long form. At the same time, the assistant can ask the user clarifying questions: in the evening or in the afternoon? With or without luggage?

Voice is useful in the “form-filling” scenario: it is convenient for almost any long form that requires a certain amount of information from the user. And such forms are present in most mobile applications.
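To make the form-filling idea concrete, here is a minimal sketch of how an assistant could decide which clarifying question to ask next. The TicketForm class, its fields and the question texts are hypothetical and exist only for this illustration; they do not come from Habitica or from any SDK.

```kotlin
// Hypothetical form for the ticket example; field names are illustrative only.
data class TicketForm(
    val destination: String? = null,
    val date: String? = null,
    val passengers: Int? = null,
    val timeOfDay: String? = null,
    val withLuggage: Boolean? = null
)

// Returns the next clarifying question, or null when every field is filled.
fun nextQuestion(form: TicketForm): String? = when {
    form.destination == null -> "Where are you flying to?"
    form.date == null -> "On what date?"
    form.passengers == null -> "How many passengers?"
    form.timeOfDay == null -> "In the evening or in the afternoon?"
    form.withLuggage == null -> "With or without luggage?"
    else -> null
}

fun main() {
    // Fields that could be extracted from "Buy me a ticket for tomorrow for two to Samara".
    val form = TicketForm(destination = "Samara", date = "tomorrow", passengers = 2)
    println(nextQuestion(form))
    println(nextQuestion(form.copy(timeOfDay = "evening")))
}
```

One phrase fills most of the fields at once, and the assistant only asks about what is still missing.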

From left to right: Prigorod RZD app, FatSecret food diary (users have to fill out a form several times a day, choosing from hundreds of products), Korzhov bakery app.

Because voice assistants today are often introduced in the support chat and grow from there, most companies try to push the application’s functionality into the chat: top up the balance, find out something about a product or service… This is not always convenient, and with voice input it is outright counterproductive, if only because speech recognition often works imperfectly.

The correct approach is to integrate the assistant seamlessly into the existing functionality of the application, so that the form is filled in within its usual interface and the person can simply check that everything was recognized correctly and tap OK.

We decided to show how this can be done using Habitica, an open source application written in almost pure Kotlin. Habitica is perfect for a voice assistant case: here, too, you need to fill in a rather lengthy form to create a new task. Let’s try to replace this dreary process with a single phrase and a few leading questions.

I’ve split the tutorial into two parts. In this article, we will figure out how to add a voice assistant to a mobile application and implement a basic scenario (in our case, a ready-made scenario for asking about the weather and the time, one of the most popular requests to voice assistants in the world). In the second article, which will be released soon, we will learn how to open specific screens by voice and implement complex queries inside the application.

What you need

SDK. We took Aimybox as the SDK for building dialog interfaces. Out of the box, Aimybox provides an assistant SDK and a laconic, customizable UI (which can be completely reworked if desired). For the recognition, synthesis and NLP engines, you can choose from the existing modules or create your own.

Basically, Aimybox implements the architecture of the voice assistant, standardizing the interfaces of all these modules and organizing their interaction in the right way. Thus, by implementing this solution, you can significantly reduce the time for developing a voice interface within your application. You can read more about Aimybox here or here.
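The idea of standardized module interfaces can be sketched in a few lines of plain Kotlin. All the names below (SpeechToText, TextToSpeech, DialogApi, VoiceAssistant and the stub engines) are hypothetical and were invented for this illustration; they are not Aimybox’s actual types, only a rough picture of the approach.

```kotlin
// Hypothetical interfaces for illustration only; not Aimybox's real API.
interface SpeechToText { fun recognize(audio: ByteArray): String }
interface TextToSpeech { fun speak(text: String) }
interface DialogApi { fun reply(query: String): String }

// The assistant core depends only on the interfaces, so any engine can be swapped in.
class VoiceAssistant(
    private val speechToText: SpeechToText,
    private val textToSpeech: TextToSpeech,
    private val dialogApi: DialogApi
) {
    // One dialog turn: recognize the speech, ask the NLP engine, speak the answer.
    fun handle(audio: ByteArray): String {
        val query = speechToText.recognize(audio)
        val answer = dialogApi.reply(query)
        textToSpeech.speak(answer)
        return answer
    }
}

// Stub engines standing in for real recognition, synthesis and NLP services.
class FakeSpeechToText : SpeechToText {
    override fun recognize(audio: ByteArray): String = audio.decodeToString()
}

class RecordingTextToSpeech : TextToSpeech {
    val spoken = mutableListOf<String>()
    override fun speak(text: String) { spoken += text }
}

class EchoDialogApi : DialogApi {
    override fun reply(query: String): String = "You said: $query"
}

fun main() {
    val assistant = VoiceAssistant(FakeSpeechToText(), RecordingTextToSpeech(), EchoDialogApi())
    println(assistant.handle("what time is it".encodeToByteArray()))
}
```

Replacing any of the three stubs with a real engine does not touch VoiceAssistant, which is the point of agreeing on the interfaces first.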

Script creation tool. The script will be written in JAICF (an open source and completely free framework for developing voice applications from Just AI), and intents will be recognized using Caila (an NLU service) in JAICP (Just AI Conversational Platform). I’ll tell you more about them in the next part of the tutorial, when we get to using them.

Smartphone. For the tests, we need an Android smartphone on which we will run and test Habitica.

Procedure

First, we fork Habitica (the Release branch) and look for the files that matter most to us. I used the Android Studio IDE:

MainActivity.kt: we will embed the logic there.

HabiticaBaseApplication.kt: we will initialize Aimybox there.

activity_main.xml: we will embed the interface element there.

AndroidManifest.xml: the entire structure of the application and its permissions are stored there.

According to the instructions in the Habitica repo, rename habitica.properties.example and habitica.resources.example, removing example from their names, create a Firebase project for the application, and copy the google-services.json file to the root.

We launch the application to check that the build works. Voila!

First, let’s add the Aimybox dependencies.

    implementation 'com.justai.aimybox:core:0.11.0'
    implementation 'com.justai.aimybox:components:0.1.8'

in dependencies and

    maven { url 'https://dl.bintray.com/aimybox/aimybox-android-sdk/' }
    maven { url "https://dl.bintray.com/aimybox/aimybox-android-assistant/" }

in repositories.

And add the following block right after compileOptions so that everything works correctly:

    kotlinOptions {
        jvmTarget = JavaVersion.VERSION_1_8.toString()
    }

Now permissions.

Remove the flags from the RECORD_AUDIO and MODIFY_AUDIO_SETTINGS permissions in AndroidManifest.xml so that the entries look like this.

Now let’s initialize Aimybox in HabiticaBaseApplication.

Make the class implement AimyboxProvider in its declaration.

And we do the actual initialization.

    private fun createAimybox(context: Context): Aimybox {
        // Unique ID passed to the dialog API; a new one is generated on every call
        val unitId = UUID.randomUUID().toString()

        // Platform speech synthesis and recognition, set to Russian
        val textToSpeech = GooglePlatformTextToSpeech(context, Locale("ru"))
        val speechToText = GooglePlatformSpeechToText(context, Locale("ru"))
        // Dialog API client; "YOUR KEY" is a placeholder for the Aimybox Console key
        val dialogApi = AimyboxDialogApi("YOUR KEY", unitId)

        return Aimybox(Config.create(speechToText, textToSpeech, dialogApi))
    }

Instead of YOUR KEY, you will later insert your key from the Aimybox Console.

Now we embed the fragment in MainActivity.kt. First, insert a FrameLayout into activity_main.xml, right below the FrameLayout with the id bottom_navigation, and give it the id assistant_container, which the code below refers to.

In MainActivity itself, first add an explicit permission request to onCreate:

        ActivityCompat.requestPermissions(this, arrayOf(android.Manifest.permission.RECORD_AUDIO), 1)

And when the result of the request arrives, add the fragment to the frame above.

    @SuppressLint("MissingPermission")
    override fun onRequestPermissionsResult(
            requestCode: Int,
            permissions: Array<out String>,
            grantResults: IntArray
    ) {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults)

        // Attach the assistant fragment to the container from activity_main.xml
        val fragmentManager = supportFragmentManager
        val fragmentTransaction = fragmentManager.beginTransaction()

        fragmentTransaction.add(R.id.assistant_container, AimyboxAssistantFragment())
        fragmentTransaction.commit()
    }
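Note that the handler above attaches the assistant whatever the user answered in the permission dialog. A guard along the following lines could be checked first. This is only a sketch under stated assumptions: isRecordAudioGranted and RECORD_AUDIO_REQUEST_CODE are hypothetical names, and PERMISSION_GRANTED is redefined locally with the same value as Android’s PackageManager.PERMISSION_GRANTED so that the example runs on a plain JVM.

```kotlin
// Same value as android.content.pm.PackageManager.PERMISSION_GRANTED,
// redefined here so the sketch runs without the Android SDK.
const val PERMISSION_GRANTED = 0

// Hypothetical constant: the request code passed to requestPermissions earlier (1).
const val RECORD_AUDIO_REQUEST_CODE = 1

// True only when the callback answers our request and the permission was granted.
// An empty grantResults array means the request was cancelled.
fun isRecordAudioGranted(requestCode: Int, grantResults: IntArray): Boolean =
    requestCode == RECORD_AUDIO_REQUEST_CODE &&
        grantResults.isNotEmpty() &&
        grantResults[0] == PERMISSION_GRANTED

fun main() {
    println(isRecordAudioGranted(1, intArrayOf(0)))   // permission granted
    println(isRecordAudioGranted(1, intArrayOf(-1)))  // permission denied
    println(isRecordAudioGranted(1, intArrayOf()))    // request cancelled
}
```

Inside onRequestPermissionsResult, the fragment transaction would then run only when this check returns true.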

Do not forget to add to onBackPressed the ability to exit the assistant after entering it.

        val assistantFragment = (supportFragmentManager.findFragmentById(R.id.assistant_container)
                as? AimyboxAssistantFragment)
        // If the assistant consumed the back press (it was open and has closed), stop here
        if (assistantFragment?.onBackPressed() == true) {
            return
        }

In addition, add the following to AppTheme in styles.xml:

        @style/CustomAssistantButtonTheme
        @style/CustomRecognitionWidgetTheme
        @style/CustomResponseWidgetTheme
        @style/CustomImageReplyWidgetTheme
        @style/CustomButtonReplyWidgetTheme

And the individual styles are just below:

Let’s check if a microphone has been added. We launch the application.

We got a bunch of errors about incorrect syntax. We fix everything as the IDE advises.

Works!

But the microphone button overlaps the bottom navigation. Let’s raise it a little. Add the following to CustomAssistantButtonTheme in the styles above:

        72dp

That’s better!

Now let’s connect an assistant and check that it answers properly. For this we need the Aimybox console.

Let’s start by signing in to app.aimybox.com with our GitHub account, creating a new project, connecting a couple of skills (I connected DateTime for the test) and trying to ask the appropriate questions in the assistant. In the settings, in the upper right corner, we take the apiKey and insert it into createAimybox instead of YOUR KEY.

    private fun createAimybox(context: Context): Aimybox {
        val unitId = UUID.randomUUID().toString()

        val textToSpeech = GooglePlatformTextToSpeech(context)
        val speechToText = GooglePlatformSpeechToText(context)
        val dialogApi = AimyboxDialogApi("YOUR KEY", unitId)

        return Aimybox(Config.create(speechToText, textToSpeech, dialogApi))
    }

Works!

The text is in English only, so let’s change the welcome message in strings.constants.xml to a Russian one (“Hi! How can I help?”).

    UserID
    APIToken
    https://habitica.com
    "Привет! Чем могу помочь?

Hooray!

Here is the link to the code repository.

In the next article about an assistant for Habitica, I will show how to use your voice not only to find out the weather, but also to control the application directly: navigate between pages and add habits and tasks.