Voice-first devices are everywhere: smart earbuds and smartwatches, voice-activated TV remotes, smart speakers, automotive infotainment systems, and so much more. Though we appreciate the convenience of devices with a voice user interface, we also pay a price: all-too-frequent recharging. We love our voice-enabled devices, but we don't like having to constantly recharge them, and we're not willing to sacrifice system performance for longer battery life. No one wants a long-lasting battery in their smartwatch if they must repeat a keyword multiple times before their device wakes up and follows commands. Here’s where consumers, and the manufacturers that supply them, want the same thing: long battery life and high accuracy, which are intimately related.
A key element for maintaining keyword accuracy is preroll: the ~500 ms of sound that happens BEFORE the keyword, which helps to establish a baseline of the ambient noise. Analyzing this sound data is a critical first step in enabling the wake word engine (WWE) to accurately distinguish keywords. If the WWE doesn’t analyze preroll, it loses wake word accuracy.
In a traditional digitally based voice-first architecture, the preroll is handled within the WWE after the sound data have already been digitized. In newer, lower-power system architectures that rely on analog sound data to activate the WWE, how can we manage preroll to maintain wake word accuracy?
To answer this question, it’s important to understand that there are two different ways to trigger system wake-up in the analog domain: In the first, standard analog circuitry triggers wake-up based on energy (Analog Acoustic Activity Detection); in the second, more sophisticated method, the analogML™ core triggers wake-up based on the detection of voice (Analog Voice Activity Detection).
Acoustic activity detection in the analog domain is carried out by standard linear analog circuitry that triggers digital system wake-up when the microphone input exceeds a certain energy threshold. Over the last few years, we’ve seen different ways to implement this solution, including both dedicated chips and ones that integrate a MEMS microphone. The biggest problem with these chips is that they can’t determine the type of sound that’s coming into the system, so they waste significant power on wakeups from random loud sounds that have nothing to do with the wake word. And because standard analog circuitry can’t support preroll collection and delivery to the WWE, keywords spoken at the time of system wake-up are more likely to be missed.
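To make the limitation concrete, here is a minimal sketch of energy-threshold wake-up. It is a digital stand-in for the analog comparator described above, not a model of any real chip; the function names, the frame length, and the 0.1 threshold are all assumptions chosen for illustration.

```python
import math

def frame_rms(frame):
    """Root-mean-square level of one frame of samples (floats in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def energy_wakeups(frames, threshold=0.1):
    """Return the indices of frames whose RMS level exceeds the threshold.

    Hypothetical helper: it looks only at loudness, so it cannot tell
    speech from a door slam or any other loud sound.
    """
    return [i for i, frame in enumerate(frames) if frame_rms(frame) > threshold]

# Made-up input: quiet background, a loud non-speech bang, quiet again.
quiet = [0.01] * 160
bang = [0.8, -0.8] * 80
frames = [quiet, bang, quiet]

wakeups = energy_wakeups(frames)
print(wakeups)  # the bang frame triggers a wake-up even though no one spoke
```

Nothing in this detector inspects what the sound is, which is why a loud noise with no speech in it still wakes the system.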
In contrast, voice activity detection, carried out by the analogML core, uses nonlinear analog circuitry to perform inferencing in the analog domain and enable the detection of voice or other specific sounds from raw, analog microphone data. The analogML core only triggers the WWE when voice, the only data that can contain a keyword, has been detected. Because the analogML core is highly discriminating among sounds, the number of false wakeups is very low, maximizing the amount of time that the digital system remains asleep (and minimizing system power usage).
And most important for the system accuracy that we’ve been talking about, the analogML core leverages Aspinity’s patented analog data compression technology, which continuously collects and compresses analog sensor data (preroll) for low-power data buffering in just a couple of kB of memory. When the analogML core detects voice in the analog domain, the cached preroll can be attached to the front end of the live audio stream. This way, the WWE never misses a keyword, even if it’s spoken just as the system is waking up. How does this process work? See figure 1 for a diagram showing the live audio with preroll attached and watch the video for more information.
Figure 1: When the analogML core detects voice in the analog domain, the cached preroll and live audio stream are stitched together and delivered to the WWE to maintain keyword recognition accuracy.
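The stitching step can be sketched in software as a fixed-size ring buffer that always holds the most recent audio and is prepended to the live stream when voice is detected. This is only an illustration of the idea: the analogML core buffers compressed analog data, whereas this sketch stores uncompressed digital samples. The `PrerollBuffer` class, its methods, and the 16 kHz sample rate are assumptions, not Aspinity's implementation.

```python
from collections import deque

SAMPLE_RATE_HZ = 16_000   # assumed sample rate, not taken from the article
PREROLL_MS = 500          # the ~500 ms of preroll mentioned in the article
PREROLL_SAMPLES = SAMPLE_RATE_HZ * PREROLL_MS // 1000

class PrerollBuffer:
    """Hypothetical ring buffer that keeps only the newest samples."""

    def __init__(self, capacity=PREROLL_SAMPLES):
        # A deque with maxlen silently drops the oldest items when full.
        self._buf = deque(maxlen=capacity)

    def push(self, samples):
        """Record incoming samples while the digital system is asleep."""
        self._buf.extend(samples)

    def stitch(self, live_samples):
        """Return the cached preroll followed by the live audio."""
        return list(self._buf) + list(live_samples)

# Tiny capacity so the behavior is easy to follow by eye.
buf = PrerollBuffer(capacity=4)
buf.push([1, 2, 3])
buf.push([4, 5, 6])          # the oldest samples, 1 and 2, fall out
stream = buf.stitch([7, 8])  # voice detected: preroll goes in front
print(stream)                # [3, 4, 5, 6, 7, 8]
```

Under these assumptions, 500 ms of uncompressed 16-bit audio would occupy 8,000 samples, or about 16 kB, which suggests why compression is needed to fit the preroll into the couple of kB the article describes.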
The bottom line: the analog voice activity detection and system wake-up enabled by the analogML core support preroll management to maintain the accuracy of the WWE, while the standard method of analog activity detection does not. So while both methods can reduce always-on system power, only the analogML core maximizes battery life AND maintains keyword accuracy, as shown in figure 2. That makes the difference between using voice to regularly control your device and becoming so frustrated that you abandon voice control altogether.
Figure 2: Standard analog activity detection circuit vs voice activity detection with the analogML core
Contact Aspinity for more information about integrating the analogML core into your design and start saving power today.