Understanding the difference between Voice Controlled Devices, Voice Assistants and Artificial Intelligence, and the inherent power contained within.
Voice-enabled assistants and home automation are all the rage. Amazon’s Alexa, Google Assistant and Apple’s Siri are embedded in devices sitting on your wrist or kitchen counter, able to control your home, take notes for you or order you lunch. That means now is a great time to start building something that takes advantage of these growing voice technologies.
But first things first. It’s important to understand the differences between Voice Controlled Devices, Voice Assistants and Artificial Intelligence, because they aren’t the same thing. We often hear the terms used interchangeably, which reveals an inherent misunderstanding of the technologies at hand. So let’s clarify how they differ.
What’s the Difference?
Voice User Interfaces (VUI) are used for both Voice Controlled Devices (VCD) and Voice Assistants, but there is a fundamental difference between the two. Not all Voice Controlled Devices are Voice Assistants, but all Voice Assistants are controlled by voice. You can own a VCD that is simply that, a device controlled by your voice. In this instance, voice is simply an INPUT method, a controller akin to a mouse or a keyboard. All you’re able to do is change states of the content or app you’re using on the computer. Voice Assistants are the next step in computer intelligence, able to store, keep track of and respond to information.
At the heart of user-computer voice interaction are four simple parts: a user, an input device (microphone), an output device (speaker and/or screen) and a computer (CPU). You use the microphone as your input, the CPU to interpret that input, and the speaker and/or screen to return a human-recognizable response to the changes in the CPU. A Voice Assistant builds upon these same components, adding a more capable CPU that can store data and perform simple computational tasks. Voice-capable Artificial Intelligences take this further by incorporating a computer that learns and adapts, and can even interact with the user proactively.
Here’s a fun, Disney-inspired way to think about things, with these technologies as everybody’s favorite wooden boy—Pinocchio. Prior to a magical intervention, Pinocchio’s only movements are dictated by Geppetto pulling his strings, much in the way a simple VCD works. After the fairy turns Pinocchio into a “real boy,” he’s able to think for himself, rather limitedly and with the supervision and guidance of his conscience, Jiminy Cricket. This is like a simple Voice Assistant, able to carry out tasks, connect to other apps and services and retain some knowledge but needs human guidance. By the end of the story, Pinocchio becomes a flesh and blood child, thinking and moving completely of his own accord without restraint, much like a robust artificial intelligence would.
To put it even more simply, we can define these as a set of systems that build on top of one another, according to the following spectrum: Voice Controlled Device → Voice Assistant → Artificial Intelligence.
An expanded view on how the Voice UI Cycle works
Let’s start with our four components to VUI: user, input, output and computer.
The computer, in current voice assistants, lives in “the cloud” and is therefore able to perform very fast, hefty computations. Provided all dependencies are working nominally, offloading thinking to a larger, more capable, off-site computer is certainly powerful. But the real power, in terms of broad, real-world human experience, comes into play when an artificial intelligence is combined with hardware and then content services. In other words, we talk to a device to ask a computer to turn on the hardware and perform an action that changes some content.
Imagine you’re sitting on the couch at home and you decide you want to watch Stranger Things. You say to your Echo Dot, “Alexa, tell Fire TV to play Stranger Things.” Like magic, your television turns on and starts playing episode one.
This flow looks like this:
user → device (echo dot, microphone/speaker) → computer (Alexa service) → hardware (TV) → content service (Netflix) → action (play, pause, etc.)
This same idea works when you’re cold and you want to turn the temperature up:
user → device (Google Home) → computer (Google Assistant) → hardware (Nest) → content (temperature) → action (make warmer)
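The reactive chain above can be sketched in code. This is a minimal, hypothetical model; the class names and the fixed-phrase parsing are illustrative assumptions, not the real Alexa or Google Assistant APIs, which use full natural-language understanding.

```python
# Illustrative sketch of: user → device → computer → hardware → content → action
from dataclasses import dataclass


@dataclass
class Hardware:
    """The end hardware (e.g. a Fire TV) that applies an action to content."""
    name: str

    def perform(self, action: str, content: str) -> str:
        # In reality this would call a content service such as Netflix.
        return f"{self.name}: {action} '{content}'"


class Assistant:
    """The cloud 'computer' that interprets the user's utterance."""

    def __init__(self, hardware: Hardware):
        self.hardware = hardware

    def handle(self, utterance: str) -> str:
        # A real assistant uses NLU; here we parse one fixed phrase shape:
        # "..., tell <hardware> to <action> <content>"
        _, rest = utterance.split(" to ", 1)
        action, content = rest.split(" ", 1)
        return self.hardware.perform(action, content)


class Device:
    """The microphone/speaker endpoint, such as an Echo Dot."""

    def __init__(self, assistant: Assistant):
        self.assistant = assistant

    def hear(self, utterance: str) -> str:
        # Forward the voice input to the cloud computer, return its response.
        return self.assistant.handle(utterance)


dot = Device(Assistant(Hardware("Fire TV")))
print(dot.hear("Alexa, tell Fire TV to play Stranger Things"))
# → Fire TV: play 'Stranger Things'
```

Each class only knows about the next link in the chain, which mirrors how the device, assistant service and hardware are separate, swappable systems in practice.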
Sometimes, as in the Nest example, hardware has its own AI. Over time, the Nest learns your routine and sets temperature accordingly. Note that this example has one more link in the chain—the heater / A/C hardware in between the Nest hardware and the ambient room temperature, which can operate independently from Google Home, and in doing so, is then its own chain.
Becoming more powerful
Eventually, VUI-enabled AI will be even more powerful when the flow can be reversed: a computer proactively and constantly monitors the environment, then notifies the user when it changes. Keep in mind that this is a huge change in how we think about interacting with computers, because it shifts cognitive strain off the human user until the moment it’s actually necessary.
That proactive flow looks like this:
content → hardware → computer → device → user
This already happens often. Let’s imagine a hospital setting. A patient is sitting in a bed, connected to a heart rate monitor. That monitor prints out data and then, if an issue arises, remotely notifies the nurse down the hall by sounding an alarm.
Here’s how that flow looks:
patient’s heart rate (content) → the patient (hardware) → the monitor (computer) → print out and audible alert (output device) → nurse (user)
Again, remember that the difference between a VCD and a VA is the amount of intelligence in the CPU. This proactive computational awareness can analyze data and make its own determination as to whether to alert the human user based on predetermined thresholds.
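That threshold-based decision can be sketched in a few lines. The threshold numbers below are made up for illustration, not clinical guidance, and `monitor` is a hypothetical name:

```python
# Illustrative sketch of the proactive flow: the "computer" watches a
# stream of readings and decides on its own when to alert the user.
LOW, HIGH = 50, 120  # predetermined heart-rate thresholds (bpm), illustrative


def monitor(readings):
    """Return an alert message for each reading outside the thresholds."""
    alerts = []
    for bpm in readings:
        if bpm < LOW or bpm > HIGH:
            # computer → output device → user: sound the alarm for the nurse
            alerts.append(f"ALERT: heart rate {bpm} bpm outside {LOW}-{HIGH}")
    return alerts


print(monitor([72, 80, 45, 130, 76]))
# → ['ALERT: heart rate 45 bpm outside 50-120', 'ALERT: heart rate 130 bpm outside 50-120']
```

The user hears nothing while readings stay in range; the computer bears the cost of constant attention and interrupts the human only when the predetermined thresholds are crossed.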
What this means
It’s not a stretch to imagine how this could work with so many other aspects of our daily lives—not to mention the business applications. Embedding this incredible technology inside everyday items can prove transformative, when done intelligently. A workflow that preheats your stove through a simple voice command? One that gives you a readout of your whole home’s energy consumption, then adjusts temperature, lighting and hot water usage based on your preferences? These are only a handful of simple examples. Just imagine what else could be created!