At Google I/O 2018, Google CEO Sundar Pichai announced a new cutting-edge Google Assistant feature that can make phone calls on your behalf. Pichai played a demo showing the Assistant placing a call to a salon to book an appointment. The voice sounded so natural that the person at the other end had no clue they were talking to an AI assistant.
“Today we announce Google Duplex, a new technology for conducting natural conversations to carry out ‘real world’ tasks over the phone. The technology is directed towards completing specific tasks, such as scheduling certain types of appointments. For such tasks, the system makes the conversational experience as natural as possible, allowing people to speak normally, like they would to another person, without having to adapt to a machine”, said Google’s blog post.
Google Duplex is built to sound natural so that the conversation feels comfortable. To make sure both users and businesses have a good experience with the service, Google says transparency is a key part of it. The blog also states that there are “several challenges in conducting natural conversations as natural language is hard to understand, the natural behavior is tricky to model, latency expectations require fast processing, and generating natural sounding speech, with the appropriate intonations, is difficult”.
However, Google Duplex is still under development, and work on it has been underway for a year now. Google plans to test it this summer. Pichai also said that the Assistant can react intelligently even when a conversation ‘doesn’t go as expected’ and takes a different direction than the intended one. “We’re still developing this technology, and we want to work hard to get this right. We really want it to work in cases, say, if you’re a busy parent in the morning and your kid is sick and you want to call for a doctor’s appointment,” he said.
In spontaneous speech, people talk faster and less clearly than they do when they speak to a machine, so speech recognition is harder and word error rates are higher. The problem is aggravated during phone calls, which often have loud background noise and sound quality issues.
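To make the idea of a word error rate concrete, here is a minimal sketch of how it is commonly computed: the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The function name and the sample sentences are invented for illustration, and this is not Google's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the recognizer drops "a" and mishears "seven".
spoken = "book a table for four at seven"
recognized = "book table for four at eleven"
print(f"WER: {word_error_rate(spoken, recognized):.2f}")
```

In this made-up example, two of the seven reference words are wrong, so the rate is 2/7. Noisy phone audio would be expected to push that figure up.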
Duplex has a recurrent neural network (RNN) at its core that is designed to cope with such challenges. The network uses the output of Google’s Automatic Speech Recognition (ASR) technology, as well as features from the audio, the parameters of the conversation (e.g. the desired appointment, or the current time of day), the history of the conversation, and more.
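The sketch below shows, in toy form, how inputs like those described above might be combined into one feature vector per turn and fed to a recurrent network whose hidden state carries the conversation history. Everything here is an assumption made for illustration: `encode_turn`, `TinyRNN`, the bag-of-words encoding, the feature sizes, and the random weights are all hypothetical, and Duplex's actual architecture is not described at this level of detail.

```python
import math
import random


def encode_turn(asr_text, audio_features, hour_of_day, vocab):
    """Concatenate toy features for one conversational turn (hypothetical)."""
    words = set(asr_text.lower().split())
    bag_of_words = [1.0 if w in words else 0.0 for w in vocab]  # ASR output
    time_feature = [hour_of_day / 24.0]  # a conversation parameter
    return bag_of_words + list(audio_features) + time_feature


class TinyRNN:
    """Minimal Elman-style RNN; the hidden state stands in for history."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        # Untrained random weights, for illustration only.
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)]
                     for _ in range(hidden_size)]
        self.w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
                      for _ in range(hidden_size)]
        self.hidden = [0.0] * hidden_size

    def step(self, x):
        """Update the hidden state from the current input and the old state."""
        new_hidden = []
        for row_in, row_rec in zip(self.w_in, self.w_rec):
            total = sum(w * v for w, v in zip(row_in, x))
            total += sum(w * h for w, h in zip(row_rec, self.hidden))
            new_hidden.append(math.tanh(total))
        self.hidden = new_hidden
        return new_hidden


vocab = ["appointment", "tomorrow", "noon"]
x = encode_turn("An appointment tomorrow", [0.2, 0.7], hour_of_day=12, vocab=vocab)
rnn = TinyRNN(input_size=len(x), hidden_size=4)
first = rnn.step(x)
second = rnn.step(x)  # same input, different state: history matters
print(first != second)
```

Feeding the same input twice gives different hidden states because the second step also sees the state left behind by the first. That is the property that lets a recurrent model take earlier turns of a conversation into account.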