Techniques that can hijack “always listening” features.

Electronic Voice Image

by Charles Patterson

The introduction of voice recognition to numerous devices and phone apps adds a new security element to consider. Google Now (“OK Google”), Siri, Cortana, and the Amazon Echo are working their way into our lives. Can voice control be surreptitiously used for espionage, eavesdropping, or security attacks? Here I look at two lines of research: one using distorted audio, and a second using radio waves to take control of a phone.

Always listening

At a recent talk I attended, a few cell phones had gone off during the presentation. The speaker apologized to the crowd, “Sorry for the interruptions…” As soon as he said the word “sorry,” though, someone’s iPhone woke up and said, “Sorry, I could not understand your request.” The phone had thought the presenter said “Siri” and tried to interpret his sentence as a command.

YouTube videos and TV shows can easily control devices in your home if the right words are spoken. My Amazon Echo wakes up whenever a Lexus commercial comes on (the Echo control word “Alexa” sounds like “a Lexus”).

What is the danger of devices being manipulated or controlled by unauthorized sounds or voices? There are a number of concerns to be aware of. Eavesdropping is one – a phone could be triggered to dial an eavesdropper’s number, allowing them to listen in on conversations in the vicinity of the phone. Other security concerns include cost abuse (such as ordering products or dialing toll numbers – a common phone hacking technique), tracking, launching malicious apps, and more.

Manipulated audio

A recent blog post by Naked Security brought some new research to my attention. Researchers from UC Berkeley and Georgetown University found that Google Now could interpret commands even when the audio had been severely distorted.

Their research is presented at www.hiddenvoicecommands.com along with audio samples.

The sounds the researchers created, though, more resemble a possessed Linda Blair in The Exorcist. When you hear them, it is not that hard to recognize the words, since you know words are being spoken. If the sounds were concealed by other noises in the background, though, they might not be as easily recognized.

The researchers refer to the following audio clips as “black box” commands: phrases that may be difficult to understand, yet can be used against existing voice recognition systems. They also developed what they refer to as “white box” commands, which are even less intelligible to humans but work when the attacker has full knowledge of the voice recognition system and can craft the proper sounds to manipulate it.

[Audio clips: “black box” command samples]

In the first example, I can clearly hear the words “OK Google”, and my phone was able to recognize it as well (I placed my phone near the computer speaker). The second command says “turn on airplane mode”. My phone responded properly and opened the airplane mode options (but luckily let me control the mode manually).

Before you conclude that these commands were simply too obvious, note that there is an issue known as the “priming” effect: when you know what words to expect, you hear them more clearly. The audio clip below helps demonstrate that:

[Audio clip demonstrating the priming effect]

The researchers explain that in the black-box model “the adversary does not know the specific algorithms used by the speech recognition system,” while in the white-box version “we assume the attacker has complete knowledge of the algorithms used in the system and can interact with them at will while creating an attack. We also assume the attacker knows the parameters used in each algorithm.”

Listen to their “white box” commands here.

Their full paper on Hidden Voice Commands can be found here.

Covert audio manipulation via radio transmission

A more covert system of attack was developed by researchers Jose Lopes Esteves and Chaouki Kasmi and presented in Paris last year.

They developed a technique for transmitting audio into a smartphone via radio waves. The phone had a headset plugged in, and the headset wire acted as an antenna: it picked up the radio signal and fed its waveforms into the phone’s microphone circuitry. The signal was an AM transmission that was able to induce sine waves, representing audio, in the headset wire.

Audio injection into headset wire. (Esteves and Kasmi)
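The principle can be sketched numerically. The snippet below is my own rough illustration, not the researchers’ actual setup: it AM-modulates a 1 kHz tone (standing in for a spoken command) onto a carrier, then recovers the tone with a crude envelope detector (rectification followed by a low-pass filter), which is roughly what the microphone input circuitry does to the signal induced in the wire. The carrier frequency, sample rate, and filter constant are all illustrative values.

```python
import math

FS = 1_000_000   # sample rate, Hz (illustrative)
F_C = 100_000    # carrier frequency, Hz (illustrative)
F_AUDIO = 1_000  # a 1 kHz tone standing in for a spoken command
M = 0.5          # modulation index
N = 2000         # number of samples (~2 ms)

# Baseband "voice" signal and the AM waveform induced in the wire:
# s(t) = (1 + M * a(t)) * cos(2*pi*F_C*t)
audio = [math.sin(2 * math.pi * F_AUDIO * n / FS) for n in range(N)]
am = [(1 + M * a) * math.cos(2 * math.pi * F_C * n / FS)
      for n, a in enumerate(audio)]

# Crude envelope detector: rectify, then apply a single-pole
# low-pass filter to strip the carrier and keep the audio envelope.
alpha = 0.02  # filter coefficient (~3 kHz cutoff at this sample rate)
envelope, acc = [], 0.0
for s in am:
    acc += alpha * (abs(s) - acc)
    envelope.append(acc)

# Removing the DC offset leaves a scaled copy of the original audio.
dc = sum(envelope) / len(envelope)
recovered = [e - dc for e in envelope]
```

After the filter settles, `recovered` tracks a slightly delayed, scaled copy of `audio`; in the real attack, the demodulated signal reaches the voice recognition engine just as if it had been spoken near the microphone.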

Esteves and Kasmi summarize their presentation:

Voice command allows the hands-free use of a mobile device for texting, calling and application launching. This way of interacting with mobile devices is spreading and will certainly be one of the main improvements in the upcoming UIs. Today, a lot of features can be accessed by voice, depending on the device and the operating system. Some of them can be critical from a security point of view. One can cite placing phone calls, sending text messages, publishing and browsing the internet or even changing the device’s settings. As voice is the medium for launching commands, it is assumed that the victim would hear the attacker’s voice, so that the attack vector is generally unrealistic.

In this presentation we show another way to remotely trigger voice commands on a mobile device. In fact, the use of headphones with a mobile device is a typical use for music listening, hand-free calls, FM radio reception, etc… We will explain how we managed to quietly inject voice commands remotely by involving smart intentional electromagnetic signals. Along with the technical details of the attack scenario we will provide an analysis of the attack surface and some adapted countermeasures. Finally, several demonstrations will be proposed as a proof of concept.

Their presentation is about 40 minutes long. The clip below begins about halfway through, after they have gone through much of the technical background.

[Video clip from the presentation]

Countermeasures they recommend include:

  • Unplug headphones when not in use.
  • Use microphone-less headphones for music.
  • Only enable voice commands when needed.
  • Use personalized keywords.
  • Customize which commands are available.
  • Enable feedback for phone operations (sound, vibration, etc.).

Slides from their presentation can be found here.

As these techniques develop, we will need to consider new countermeasures, as well as more manual use of the apps on our phones and other devices, sacrificing convenience for security.