Voice-driven digital assistants such as Amazon Alexa, Google Home and Apple’s Siri are becoming commonplace in homes and pockets around the world. Their adoption is accelerated by just how natural and intuitive it’s becoming to use voice commands to control devices — whether it’s to play music, order a takeaway or find the quickest route to work. According to Mintel, 62% of Brits say they’d already feel comfortable using a voice-operated device — while Comscore predicts that 50% of all internet searches will be conducted by voice by 2020.
This technology has the potential to change the way we engage with brands online. While we’re already seeing the emergence of creative voice-activated services, campaigns and experiences — from Domino’s Easy Order to The Snow Fox by AKQA — many brands are still figuring out how voice can be harnessed to create more meaningful interactions with their audiences. And while developments in the technology industry are making the process smoother and more user-friendly, there’s still work to be done to alleviate people’s concerns around privacy and security.
Last Friday morning, we welcomed a panel to Shoreditch House to help unpack this topic: exploring some of the challenges and implications of voice-activated devices, sharing best-practice examples and looking ahead to what the future might hold.
Guests on the panel were:
Francesca Cuda — Head of Engineering at Digital Product studio ustwo, Francesca works closely with clients to help them achieve their goals for new products and services. Together with her team, she’s worked on a variety of exploratory voice-related projects, from internal hackathons to concierge concepts for the likes of Wayfindr and LUSH Cosmetics.
Heather Andrew — As CEO of neuromarketing research company Neuro-Insight UK, Heather shared insights on the cognitive impact of voice interaction. With a background in marketing and communication, she was formerly marketing director at Rowntree Macintosh, and has worked as a consultant at PwC and OxfordSM.
Duncan Anderson — Duncan is CEO of Humanise.AI, a startup in the artificial intelligence space that aims to transform customer service through the use of AI bots, chat systems and voice interaction. Prior to founding Humanise.AI, Duncan was the European Chief Technology Officer for IBM’s Watson AI business.
Here’s what we learned.
Interest in voice AI is growing fast
Francesca from ustwo opened the discussion by talking about her clients’ growing interest in, and demand for, voice AI, and how use of the technology is projected to grow rapidly. She told us, “more and more we are seeing how clients are coming to us and asking what their impact should be for users regarding voice. They want to explore what voice means to them, and most importantly, how we can have a meaningful impact to people through voice”.
Revealing the latest stats on voice assistants, she shared that “500 million people have used, or are using, voice-enabled digital systems,” and that this number is only going to grow in the coming years. She cited two key reasons: voice has specific consumer benefits, and voice has unique qualities that other digital interfaces lack.
Voice provides a quicker, easier, more personalised way to communicate
From the perspective of the consumer, Francesca highlighted three key benefits of using voice. Firstly, speed. In one minute, “a human being can speak around 150 words, but can only type 40”, so voice offers a much faster means of communication. Secondly, it’s easy and convenient. Using voice is intuitive and hands-free, and no manual or instructor is needed to learn how it works. Thirdly, and most importantly in her opinion, voice AI is context-aware. “It doesn’t only understand your current question, but it also keeps track of your conversation so far, the location you’re in and your settings,” she said, so that its responses can be tailored to each user’s needs.
The accuracy of voice tech is improving fast
Another quality that is unique to voice-enabled technology, and is driving increased adoption by consumers, is how quickly it is improving, as the latest research on language processing shows. Statistics published by Google show that, in 2010, voice technology recognised spoken words with about 70% accuracy. There has also been a lot of progress in cancelling out background noise and in understanding the accent and tone of the speaker’s voice. In 2016, Google said its accuracy had risen to about 90%, which Francesca called a “stunning result”.
Voice tech doesn’t have to be expensive
Francesca added that, from a production point of view, voice has quite a small production footprint. She explained: “A voice-enabled device like an Alexa or a Google Home; you have a processor, a wifi, a microphone, a speaker, then the actual container of your voice assistant — so in terms of producing that on a big scale, it’s not that expensive.”
Voice eases the cognitive load
Last year, Neuro-Insight UK conducted research with JWT Intelligence and Mindshare Futures into how people respond to voice, as part of the Speak Easy insights report. One of the report’s findings was that voice can ease the cognitive load, explained Heather from Neuro-Insight UK, meaning that the brain has to work less hard when communicating in this way. Voice can offer simpler information and fewer choices, which is ideal for brands looking to provide a seamless user experience. The research also found that comfort grows quickly. A user of a voice-enabled device can soon become comfortable speaking aloud to a computerised system, sometimes as soon as they receive their first response from the device.
Brands can provoke a more emotional response using voice
Another finding from the research, shared Heather, was that voice can provoke more of an emotional response in the human brain. Users had a much stronger emotional response to information about brands when interacting through voice than through text. Heather added that voice can be used to strengthen brand iconography where text cannot, and that the effect is stronger still when the message can be seen and heard at the same time.
Duncan from Humanise.AI added that voice technology must be built with a different approach to a text-based system, as the two offer very different results. He gave the example of Humanise.AI’s development of a restaurant finder service, and of overcoming the obstacle of presenting a list of restaurant options through a voice-enabled device. A limitation of using a voice-enabled device to share a list of results is that “by the time it’s got to the third or fourth option, you’ve forgotten what the first one was”. One of the ways they addressed this issue for their restaurant finder was to find out more about the consumer and their needs, in order to narrow down the results significantly.
Brands need to learn to “paint a picture with words”
A second obstacle Duncan and his team faced was using voice to offer the consumer ‘inspiration’. He said, “when you have a graphical interface, you can look at a picture and be inspired by the location – you can really visualise, and the picture really inspires the brain. When you take away the screen and the image, how do I inspire you about the restaurant?” With voice, he explained, you have to paint that picture with words, which is a lot harder to do. Voice development still calls for more work and new skills in order to create visual images with spoken words.
Blending voice and visuals opens up new possibilities
Duncan also discussed the difficulty of building a voice system that works effectively within a social context. He gave the example of a user who, having been given restaurant options by the device, discusses with someone nearby which one to pick. The voice assistant will still be listening to that conversation, and may assume that something said as part of the discussion is the user’s final choice. There is plenty of opportunity for a voice-enabled device to become confused in these kinds of situations. Devices such as the Amazon Alexa or Google Home, for example, are designed for commands, and are “not really devices where you want to discuss the information that’s coming back to you”.
Devices such as Amazon’s Echo Show address this by including a screen, allowing the user to turn off the mic after being presented with information. “That completely transforms the usability of this service, because we could present options on the screen, allow the user to turn off the mic, and then wait for them to select one of the options on the screen in order to move forward.” Duncan concluded: “as the capabilities of these things mature, as we get more familiar and they become more embedded in our everyday lives, my bet is that we will move towards devices that have small screens as well, and that really opens up the possibilities.”
Three final pieces of advice…
The three panellists ended the discussion with advice on how companies and brands should use voice technology to both their own and their consumers’ advantage. Heather emphasised the use of sound as part of a brand’s iconography, in order to leave an emotional and longer-lasting impact on the consumer’s brain, and advised the audience to think about how to make sound and vision work together coherently.
Duncan stressed the importance of thinking deeply about the product, the brand and the consumer’s needs when considering whether and where to use voice. He emphasised that voice needs a specific function, and must be connected to the real world and not just the digital sphere. It needs to contribute something worthwhile to the user’s experience, rather than being a trivial or trendy addition to the brand’s technology. He added that creative thinking and skilled use of words are also necessary to overcome the limitations that developers of voice technology often run into.
Francesca talked about her interest in the idea of ambient computing, and suggested that anyone interested in integrating a voice-enabled device into their brand or business consider this philosophy of creating a ‘thinking voice’ that is an invisible element in the life of the human being. She said it should be inclusive of people of all backgrounds and incomes, and not just understood and used by people from the tech industry. She invited potential and existing developers to consider how we can embed that philosophy into our everyday lives, and not just for the sake of technology.
All photography by David Townhill.