What You Need to Know When it Comes to Voice

Ariba Jahan

“Hey Siri, what’s the weather this morning?” is something we now say out loud to our phones out of habit, in a world where we used to be amazed by a BlackBerry bringing us the internet. Voice assistants, or voice user interfaces (VUIs), have risen so quickly that we already have four big players: Apple’s Siri, Microsoft’s Cortana, Amazon’s Alexa and Google Assistant. According to Comscore, by 2020, 50% of all searches will be done by voice, the perk of having hands-free and eyes-free technology.

Machine learning and AI technology have enabled a brand new experience for us, which means that even the best design principles that work for visual design can’t help us here. So how do you design for conversation? Here’s what the pros have to say about designing for voice.

Who’s the speaker? What do they need?

Users of voice, also known as “speakers,” are instant experts. We have a natural way of saying things and have our own unique vocabulary to articulate those thoughts and inquiries. Because it is so familiar, we have a low tolerance for errors and an innate expectation that our interaction should be exactly like our other conversations with human beings. As users of voice, our hands and eyes might be busy; we’re multi-tasking and we may be in a private space.

Speakers have a goal in mind, whether that’s to check the weather quickly, order vitamins, look for flight options while in bed or listen to that one song we can’t get out of our heads. We’re either trying to accomplish a task, be entertained or get enough information to make a decision. Two questions matter most here: Does the voice assistant allow easy access to these things (even if it’s through a third party), and does it deliver what’s needed in a natural, easy-to-understand manner?

Who’s the machine?

The other thing to keep in mind is the brand or voice skill (or action, if you use Google Assistant) coming through the device and into that personal moment. As soon as the speaker engages with a voice assistant, it becomes a being to the speaker and is no longer a “piece of technology.” What should your speakers feel, think and expect from engaging with it? According to Daniel Padgett (Conversational Design Lead at Google), if you don’t define your voice assistant’s persona, the speakers will do it for you, and it may not be what you want. At SXSW 2018, Padgett said that if you’re a brand designing a voice assistant skill or action, the best thing you can do is give that character a backstory and make sure it manifests the brand attributes.

A great example of this is Eno, Capital One’s AI chatbot. The creators spent a lot of time figuring out who Eno is. At SXSW, Audra Koklys Plummer, head of AI design, explained that Eno is a genderless AI virtual assistant (chatbot) who is fascinated with the human experience, so it loves dad jokes, watches The Bachelor, devours books and asks questions. Eno doesn’t like it when people are mean, so instead of staying passive in abusive conversations, it might give a response like “that’s not cool to talk to me like that, let’s stick to talking about money.” Eno’s personality and continuous learning are the “secret to its 4.7 star rating.”

What did you mean by that?

Thanks to technology, the challenge of recognizing the words people are saying has been conquered. The ongoing challenge is making sure the interaction models a natural conversation, where the voice assistant understands the speaker’s input quickly and can answer in a simple, comprehensible way.

Google’s Padgett suggests that conversation designers should give users just enough information for them to make that immediate decision and then provide them with more information to proceed to the next decision.

On the flip side, the voice assistant has to understand what speakers really mean by their words, beyond recognition of text and language. It has to leverage its ability to learn, ask users clarifying questions and use contextual data to eliminate ambiguity. This is how a voice assistant can eventually understand what a speaker means when they say “Play Yesterday”: Did you mean the audiobook? The movie? The song? Which song?
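The “Play Yesterday” case can be sketched in a few lines of Python. This is only an illustration of the idea, not how any real assistant works: the catalog, the `Candidate` type, the `resolve` function and the shape of the `context` dictionary are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    title: str
    media_type: str  # "song", "movie" or "audiobook"


# Hypothetical catalog: one title that exists in three media types.
CATALOG = [
    Candidate("Yesterday", "song"),
    Candidate("Yesterday", "movie"),
    Candidate("Yesterday", "audiobook"),
]


def resolve(title, context=None, catalog=CATALOG):
    """Return ("play", candidate) when unambiguous, else ("ask", question)."""
    matches = [c for c in catalog if c.title.lower() == title.lower()]
    if not matches:
        return ("ask", f"Sorry, I couldn't find {title}. What would you like to play?")

    # Contextual data (assumed here to be a hint about the media type the
    # speaker is currently using) narrows the options before asking.
    if context and context.get("media_type"):
        narrowed = [c for c in matches if c.media_type == context["media_type"]]
        if narrowed:
            matches = narrowed

    if len(matches) == 1:
        return ("play", matches[0])

    # Still ambiguous: ask one short clarifying question.
    options = ", ".join(f"the {c.media_type}" for c in matches[:-1])
    return ("ask", f"Did you mean {options} or the {matches[-1].media_type}?")
```

With no context, `resolve("Yesterday")` comes back with a clarifying question; with `{"media_type": "movie"}` as context, it picks the movie without asking. A follow-up such as “Which song?” would simply be another round of the same narrowing.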

What’s the Future of Voice?

According to Padgett, we need to pay attention to the evolving continuum of the human experience from TV to mobile to voice: the future is somewhere in between. He said that the future of voice is augmented, perhaps starting the interaction over voice and then driving the user to another device for the rest of the answer, like a voice assistant with a display. This augmented future, he added, will create a “glanceable experience,” so you can go about doing what you’re doing and use the information you need in a seamless manner.

There are still plenty of challenges on the table. As voice assistants expand the number of languages they recognize and speak, the skills and actions being built also have to support those languages. Another huge challenge is the conversation mechanics themselves. If you want Alexa to do something based on a skill, you can’t say “Alexa, do X”; you have to say “Alexa, ask skill Y to do X.” This may not be clear to everyone, nor is it natural. If you’re interested in learning more about designing conversations, check out Google’s toolkit or Amazon’s toolkit.
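To see why the “Alexa, ask skill Y to do X” phrasing feels rigid, here is a rough Python sketch of a matcher for that one pattern. It is not Amazon’s parser and does not use any Alexa API; the regular expression, the `parse_utterance` function and the skill name in the usage note are assumptions made up for illustration.

```python
import re

# Hypothetical pattern for "<wake word>, ask <skill> to <request>".
# The skill name is taken to be everything up to the first " to ".
INVOCATION = re.compile(
    r"^\s*alexa,?\s+ask\s+(?P<skill>.+?)\s+to\s+(?P<request>.+?)\s*[.!?]?\s*$",
    re.IGNORECASE,
)


def parse_utterance(utterance):
    """Split an utterance into skill and request, or return None if it
    does not follow the "ask <skill> to <request>" shape."""
    match = INVOCATION.match(utterance)
    if match is None:
        return None
    return {"skill": match.group("skill"), "request": match.group("request")}
```

Under these assumptions, “Alexa, ask Daily Vitamins to reorder my usual.” splits cleanly into a skill and a request, while the more natural “Alexa, reorder my vitamins” returns `None`, because nothing in the sentence says which skill should handle it.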

But overall, there’s no denying the powerful role voice assistants have started to play in our lives, and only time will tell how different our lives will be as voice assistants continue to develop.