We put a robot instead of a human on the phone and almost broke everything

A couple of years ago, it seemed that robot dialers were about to take over the world. The digital age is on the doorstep, smart people promise a singularity within twenty years – and companies still run giant call centers full of people doing monotonous, repetitive work. Analysts repeat in unison: these are the first candidates for automation and the death of the profession.

The process seems to have begun – voice recognition has reached a new level. At a presentation, Google showed a robot calling a restaurant and making a reservation while the person on the other end never realized they were talking to a robot. Corporations release voice assistants one after another – Yandex has Alice, Tinkoff has Oleg. The robots hold a conversation and even crack jokes.

We also thought that putting a robot on the phone would be a great idea. But we found out that in practice it is not nearly as smooth.

It seemed that setting up a robot would be trivial

We offer free introductory classes at Skyeng. People sign up for them, but we know that some will change their minds. So, for example, for every 10 teachers in a slot we book 13 students. Closer to the lesson, the teacher picks a student and calls the enrolled people to get their confirmation. Each time, that means calling several contacts suggested by the system.

This ate into time the teacher could have spent on a lesson or on rest. So we decided: let a robot make the calls!

Broadly, such robots come in two types: with and without machine learning. The first type absorbs data, learns from it, and keeps getting better. The second is simpler – it runs on ready-made word-recognition packages. The system hears a voice, converts the audio to text, and checks the words against a written script. Roughly speaking, if the student's answer contains the word "yes", say one text; if "no", another.

Our robot was exactly that kind – simple, no AI, with a built-in recognition package from a third-party vendor and a speech synthesizer.
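The "no-ML" approach above boils down to routing on keywords in the recognized transcript. A minimal sketch, assuming the recognizer has already produced text; the function and branch names are illustrative, not our actual code:

```python
# Keyword routing for a script-based call robot: the recognizer turns
# audio into text, and we pick a conversation branch from the words.

def route(transcript: str) -> str:
    """Pick a conversation branch from a recognized utterance."""
    words = transcript.lower().split()
    if "yes" in words:
        return "confirm_branch"
    if "no" in words:
        return "decline_branch"
    return "fallback_branch"  # unrecognized -> re-ask or hand off

print(route("yes I will come"))  # confirm_branch
print(route("no thanks"))        # decline_branch
print(route("maybe later"))      # fallback_branch
```

Everything interesting in the rest of this story happens around that third, fallback branch.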

The first robot was named Anton

To my taste, he had the nastiest of the stock voices: male, squeaky, robotic.

In that voice, he reminded students of the day and time of the lesson, asked whether they were ready for it, and accepted two answers – "yes" or "no" – routing each to the appropriate conversation branch.

We ran a test on high traffic, and it simply broke things: many introductory classes were missed. We started listening to recordings of the conversations to find the problem. It lay on the surface – people hung up as soon as they heard the first words.

Naturally, there was little pleasure in robot Anton's raspy voice (I told you so). We thought: make the voice nicer, people will listen to the robot to the end, and the conversion will recover.

The unpleasant Anton was replaced by Vlada.

She got the voice of a real person – a Skyeng employee, an operator from the sales department. It sounded amazing, very beautiful. Everyone liked it.

Our joy was short-lived. People no longer hung up right away, but the number of transfers to an operator multiplied. Hearing a living person's voice, many didn't realize it was a robot, so they answered with elaborate phrases – the way they would with a real operator – instead of the short command-like answers the robot expected. Far too verbose for our Vlada. The share of successful recognitions dipped, which hurt the conversion.

Another drawback of a live voice: Vlada could only play back pre-recorded samples. An ordinary robot synthesizes any text, which gave us flexibility and the ability to quickly reconfigure scripts. Vlada could not even say when the lesson would take place – we had not recorded samples for every possible combination of date and time, because there are far too many. She simply said, "Your lesson will take place soon," which also caused confusion.

It turned out that the robot should not be too human

So we tried a third option – robot Zakhar, not as nasty as Anton, but not as lifelike as Vlada. Zakhar is just normal: not too annoying, but not misleading either.

Also, after the experience with Vlada, we expanded Zakhar's vocabulary. For example, we taught him to recognize popular answers like "sure" and "nope". But this led to new problems: the robot sent "of course" down the "no" branch, because the recognizer often clearly heard a "no" inside "of course". Nothing could be done about the recognition itself; the dictionary didn't help.
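A related pitfall is easy to reproduce in code: with naive substring matching, the token "no" hides inside many unrelated words. Whole-word matching fixes that particular bug, though, as we found, it cannot help when the recognizer itself emits a spurious "no". A sketch with illustrative names:

```python
# Substring matching vs. whole-word matching for the "no" answer.

def naive_is_no(transcript: str) -> bool:
    return "no" in transcript.lower()          # substring match

def word_is_no(transcript: str) -> bool:
    return "no" in transcript.lower().split()  # whole-word match

print(naive_is_no("I know the time"))  # True  -- misroutes to "no" branch
print(word_is_no("I know the time"))   # False
print(word_is_no("no I am busy"))      # True
```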

Even worse, when the robot reached a false-negative conclusion, it replied: "Thank you, a manager will contact you." The person thought the introductory lesson was confirmed – and then a manager called and offered to schedule it for another time. An awkward mess ensued.

So we dug in: we rewrote the texts the robot spoke, added more forks in the conversation, and replaced word recognition with tone (DTMF) input. We expected this to make the answers more accurate and lift the confirmation conversion. The false-rejection rate really did drop. But everything else got worse. It turned out that even the season can spoil the metrics.

We tested each hypothesis on 5-10% of users and could ramp it up to 50% before deciding to keep the change.
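A common way to ramp an experiment from 5-10% of users to 50% is stable hash bucketing: each user falls into a fixed bucket, so raising the rollout percentage only adds users and never reshuffles existing ones. This is a generic sketch, not our actual experiment code:

```python
# Stable percentage rollout via hash bucketing.
import hashlib

def in_rollout(user_id: str, percent: int) -> bool:
    digest = hashlib.md5(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100       # stable bucket in 0..99
    return bucket < percent

user = "student-42"
assert not in_rollout(user, 0)
assert in_rollout(user, 100)
# Monotonic ramp: anyone included at 5% is still included at 50%.
assert (not in_rollout(user, 5)) or in_rollout(user, 50)
```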

We tested the robot in winter. Imagine someone walking down the street in gloves when the phone rings. They answer through headphones and reach the tone-dialing step. They pull off a glove, unlock the phone, and hunt for the right key to press. They hit the wrong one, or the press doesn't register, and they run out of time to answer.

So we brought voice recognition back, but kept tone dialing too. If the robot failed to recognize the voice, we played a block with the text: "I did not understand you, press 1 if you are coming …"

The number of errors really did drop to the previous level, but we started getting "no" more often and "yes" less often. Which was also bad.

The main thing we learned: if you call yourself a robot, keep it simple

The best results came when we started telling people how to respond: "If you are coming to the introductory lesson, say yes." We didn't discover America – many robots do this. The person hears clear instructions on how to answer. But even this is not a panacea.

We built this chain: Zakhar speaks first and gives instructions on how to respond. The student answers; if the robot doesn't understand, it switches to a block with tone-based answers and the shortest possible questions and instructions. Then comes a second tone attempt, for confirmation. And if the tone input doesn't work either, we transfer the call to an operator.
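The escalation chain above can be sketched as a small decision function: voice first, then up to two tone attempts, then a human operator. All names are illustrative, and the second tone attempt is simplified here into a retry:

```python
# Escalation chain: voice -> DTMF fallback -> second DTMF attempt -> operator.

def handle_call(voice_answer, tone_answers):
    """Return the branch a call ends in. tone_answers is a list of
    DTMF inputs for up to two fallback attempts (None = no key press)."""
    if voice_answer in ("yes", "no"):
        return f"voice:{voice_answer}"
    for tone in tone_answers[:2]:          # at most two tone attempts
        if tone in ("1", "2"):             # 1 = coming, 2 = not coming
            return f"tone:{tone}"
    return "operator"                      # escalate to a human

print(handle_call("yes", []))            # voice:yes
print(handle_call("hmm", ["1"]))         # tone:1
print(handle_call("hmm", [None, None]))  # operator
```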

That finally reduced the load on operators and sharply cut the number of errors, while both "yes" and "no" answers grew. We kept this option because it lifted the key end-to-end metric. And that concludes the story of testing the robot. It was long and painful for everyone involved.

Maybe we should just ditch calls altogether?

We thought about this a lot during the experiment. Calls have none of the usual metrics: button clicks, page transitions, time spent on them. You can't insert a widget into a call asking the user to rate the interface – there is only a human voice and the ticking seconds of a conversation.

When you run a hypothesis, collect the data, and find out it performs poorly, it's frustrating. You just want to drop calls and never deal with them again – throw them out as a phenomenon and move everything to interfaces.

But that doesn't work. The day Apple releases an iPhone without calling functionality is unlikely to come soon, if ever. No matter how much smarter our devices get, no matter how many cores are in their processors or cameras on their backs, they are still phones that people use to call each other.

What comes next, nobody knows. Today interfaces seem simpler and clearer; tomorrow, children who grew up talking to a smart speaker may return to voice. And then to something else.

But while calls are still part of the world, they need to be automated so that everyone is comfortable. Through thorns and raspy voices.
