“Alexa, set the alarm for eight,” “Alexa, play the movie Oppenheimer,” or “Alexa, tell me what the weather will be like over Easter.” All of these interactions with the smart speaker are recorded and are available to any user who requests them from Amazon. That is what criminologist María Aperador did. To her surprise, she discovered that some of the audio clips were not preceded by the wake word, “Alexa,” and she reported it a few days ago in a video on TikTok and Instagram that has gone viral. How is this possible?
Amazon’s policy is clear on this point: no audio is stored or sent to the cloud unless the device detects the wake word. The company confirms this, and adds that users will know when Alexa sends a request to the cloud because a blue light indicator comes on or the speaker emits a sound.
With this in mind, David Arroyo, a CSIC researcher specializing in cybersecurity and data, offers an alternative explanation: “The system is only activated when someone says the wake word. But, for various reasons, it can produce false positives. What we would have to see is to what extent it is robust against elements that disturb the interpretation of that wake word.”
Machine learning systems for voice interpretation, such as those used by Alexa or by Google and Apple speakers, incorporate many different elements to improve their performance. Even so, it is not an easy task. “These systems are designed to identify all the elements of variability due to pronunciation,” says Arroyo, referring to different accents and ways of speaking, but also to changes in the resonance or reverberation of the room where the device sits. “It would be necessary to know in detail the precision and the false-positive rate of the specific algorithm that Amazon uses.”
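Those two figures measure different failure modes: precision is the share of activations that were genuine, while the false-positive rate is the share of non-wake-word audio that triggered the device anyway. A minimal Python sketch makes the calculation concrete; all the counts below are hypothetical, since Amazon does not publish these numbers.

    # Hypothetical evaluation counts for a wake-word detector; Amazon's real figures are not public.
    true_positives = 980      # wake word spoken, detector fired
    false_positives = 12      # no wake word, detector fired anyway (an accidental activation)
    false_negatives = 20      # wake word spoken, detector missed it
    true_negatives = 99_988   # no wake word, detector stayed silent

    precision = true_positives / (true_positives + false_positives)
    false_positive_rate = false_positives / (false_positives + true_negatives)

    print(f"precision: {precision:.3f}")                      # ~0.988 with these made-up counts
    print(f"false-positive rate: {false_positive_rate:.5f}")  # ~0.00012

Even a false-positive rate that low would still produce accidental activations in a device that listens around the clock, which is consistent with the stray recordings Aperador found.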
EL PAÍS has spoken with María Aperador to learn a little more about the recordings, which last around six seconds. They are fragments of casual conversations, of herself or of people who were in her house. The criminologist has not reviewed all of the more than 500 audio files that Amazon sent her, but among the roughly 50 she has listened to, she found two that were not preceded by the wake word.
A study by researchers at Ruhr University Bochum and the Max Planck Institute for Security and Privacy underscores how common accidental activations are in smart speakers. After analyzing 11 devices from eight different manufacturers, they documented more than 1,000 unintentional activations. “We are talking about voice recognition systems which, depending on how they are implemented, can work better or worse,” says Josep Albors, director of Research and Awareness at the cybersecurity firm ESET Spain, about the possibility of false positives.
How speakers detect the wake word
To activate when they hear the word “Alexa” or the phrases “OK, Google” or “Hey, Siri,” smart speakers have a system that permanently listens for that term. “In the end, they are devices that are constantly listening. But smartphones and many intercoms do this too. It is not exclusive to Alexa,” says Albors.
Arroyo makes the same assessment. “When you have the speaker on standby, that means it is absorbing what you are saying at all times. It doesn’t record it, but the algorithm is processing it, because it has to determine which words are being spoken.”
This algorithm works locally, on the device itself, searching for the acoustic patterns that correspond to the wake word. Amazon sources point out that their technology relies only on information from the sound waves to detect the term. They also note that the speaker can instead be activated with a button, which avoids the constant sound monitoring. As for the recordings made once the device is activated, users can choose not to store them in their privacy settings.
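The distinction between “processing” and “recording” is easier to see in code. What follows is a toy Python sketch, not Amazon’s implementation: the wake_word_score function, the threshold, and the buffer size are all invented for illustration. The key point is that audio lives only in a short rolling buffer that is constantly overwritten, and nothing would leave the device until the detector’s score crosses the threshold.

    import collections
    import random

    def wake_word_score(frames):
        # Hypothetical stand-in for the on-device keyword-spotting model,
        # which in reality matches acoustic patterns against the wake word.
        return random.random()

    THRESHOLD = 0.9999      # higher threshold = fewer accidental activations, more missed ones
    BUFFER_FRAMES = 100     # rolling buffer holding roughly the last second of audio

    buffer = collections.deque(maxlen=BUFFER_FRAMES)  # old frames fall out automatically

    for frame_number in range(10_000):                  # stand-in for the microphone stream
        frame = [random.random() for _ in range(160)]   # fake 10 ms audio frame
        buffer.append(frame)                            # processed locally, then overwritten
        if wake_word_score(buffer) >= THRESHOLD:
            print(f"frame {frame_number}: wake word detected; only now would audio be sent")

In a design like this, a false positive means the threshold was crossed by ordinary conversation, and a few seconds of audio from that moment end up recorded and uploaded, which matches the short clips Aperador describes.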
What is the problem with this permanent wake-word tracking? The two cybersecurity specialists agree that if the sound were processed to extract data beyond the keyword search, the privacy problems would be very serious. But they also agree that there is no evidence that this is happening. “There are strong interests in making sure it doesn’t happen, because it would mean a loss of confidence in all these devices and very considerable economic damage for these companies,” says Albors.