Listen Like a Brain to Extend Battery Life


It was only late 2014 that Amazon launched the category-defining smart speaker, Amazon Echo. Just five years later, we now have hundreds of millions of digital voice assistants installed on smart speakers, smart-home systems, wearables, and other smart devices that always listen for a wake word. With such massive growth ahead of us, it’s more important than ever for designers to address the biggest complaint of consumers who use always-listening products: the battery drain that has them recharging way too frequently.

In our last blog post, we explained why designers need to consider the entire edge-processing signal chain when comparing the power consumption of voice-first components. The signal starts at the microphone and travels all the way through to the wake word engine (WWE), so optimizing the power of each chip in isolation is ineffective. Instead, designers need to consider the whole system architecture to reduce power consumption in a significant way.

Take the Relevant Approach

Current voice-activated devices use an architecture that Aspinity calls “digitize-first,” in which all sound data collected by the microphone, relevant or not, is digitized and analyzed by a digital core running a WWE. This is a highly inefficient architecture because it treats all data as if they were equally relevant, which is not the case. For a voice-activated device, the only relevant audio data is speech, since speech is the only sound that could possibly contain a wake word; digitizing and analyzing every other sound simply wastes energy. In applications where speech is present for only about 10% of the day, a standard digitize-first system spends 90% of its energy processing irrelevant data (or, worse yet, silence!) that will simply be thrown away.

A Bio-Inspired System

A more efficient bio-inspired system architecture offers a far better alternative by using a cascaded approach to processing incoming audio data — spending just a tiny amount of power up front to determine whether the data are important for more complex, higher-power analysis. This architectural approach, which mimics the functionality of the brain, is called “analyze-first,” and it’s made possible by the neuromorphic analog processing technology in Aspinity’s Reconfigurable Analog Modular Processor (RAMP).

A RAMP IC, programmed for a specific inference detection, determines which data are important at the earliest point in the signal chain, while the data are still analog. This allows higher-power components such as the ADC and digital core to remain off or in a low-power mode until the RAMP IC detects relevant data. In an always-listening, voice-activated device, the only data that need to wake the digital core are speech, which makes the approach extremely efficient. In fact, an analyze-first architecture can eliminate the wasteful processing of up to 90% of sound data that is simply noise.
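To make the cascade concrete, here is a minimal sketch of the two-mode duty-cycling logic. This is an illustration, not Aspinity's actual firmware: `is_speech` stands in for the analog pre-classifier's speech/non-speech decision, and `run_wwe` stands in for the higher-power wake word engine; both names are hypothetical.

```python
from enum import Enum, auto

class Mode(Enum):
    ALWAYS_LISTENING = auto()  # only the microphone and analog classifier powered
    FULL_POWER = auto()        # ADC and digital core running the WWE

def step(mode, frame, is_speech, run_wwe):
    """Advance the cascade by one audio frame; returns (new_mode, woke)."""
    if mode is Mode.ALWAYS_LISTENING:
        # Cheap analog-stage check gates everything downstream.
        if is_speech(frame):
            return Mode.FULL_POWER, False
        return mode, False
    # Full-power mode: digitize the frame and run the wake word engine.
    woke = run_wwe(frame)
    # Drop back to the low-power mode once the frame is processed.
    return Mode.ALWAYS_LISTENING, woke
```

The key property is that `run_wwe` is only ever invoked after the cheap first stage has flagged the frame as speech, which is exactly what keeps the digital core asleep during silence and background noise.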

A Simple Comparison

This very simple example illustrates the power consumption differences between a digitize-first and an analyze-first architecture. Every voice-first device has three components in common: the microphone, the ADC (sometimes the microphone and ADC are combined into a digital-output microphone), and the microcontroller (MCU) or digital signal processor (DSP) that runs the WWE. In a digitize-first arrangement, all three always-listening components are on 100% of the time. Assuming typical use, the average current consumption of this system is 1.5 mA.

Compare that to an analyze-first approach in which the device has two modes: (a) an always-listening mode, in which only the microphone and the RAMP IC are powered, for the 90% of the time when there is no speech, and (b) a full-power mode for the 10% of the time when voice has been detected. Because of this more intelligent partitioning, the analyze-first system's average current consumption is just 218 µA, a small fraction of the digitize-first system's. For a battery-operated device running on two AA batteries, that's the difference between 4.5 months and 2.5 years of battery life!
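The arithmetic behind these figures can be reproduced with a short duty-cycle calculation. Two inputs are assumptions back-solved from the numbers in the post rather than published specifications: the ~75 µA always-listening current and the ~4800 mAh usable capacity for two AA cells.

```python
# Duty-cycled average-current estimate for the analyze-first system.
I_FULL_MA = 1.5      # mA: all components on (also the digitize-first draw)
I_LISTEN_MA = 0.075  # mA: mic + RAMP IC only (assumed to match the post's 218 uA average)
SPEECH_DUTY = 0.10   # fraction of the day that speech is present

# Weighted average of the two modes.
avg_ma = SPEECH_DUTY * I_FULL_MA + (1 - SPEECH_DUTY) * I_LISTEN_MA

BATTERY_MAH = 4800   # assumed usable capacity of two AA cells
hours = BATTERY_MAH / avg_ma

print(f"average current: {avg_ma * 1000:.0f} uA")          # ~218 uA
print(f"advantage: {I_FULL_MA / avg_ma:.1f}x")             # ~6.9x, i.e. roughly 7x
print(f"battery life: {hours / 24 / 365:.1f} years")       # ~2.5 years
```

Running the same capacity against the 1.5 mA digitize-first draw gives 4800 / 1.5 = 3200 hours, or about 4.5 months, matching the comparison above.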

This simple example shows a 7x power advantage for an analyze-first system using the same components as the standard digitize-first approach. And what if your application requires the recognition of more keywords or phrases? You can pair a RAMP IC with a higher-power digital processor to increase system functionality while staying within your limited battery-power budget, because an analyze-first approach can increase battery life by 10x or more compared with a traditional architecture.

So go ahead and design your always-listening device with brain-like efficiency to eliminate the processing of irrelevant data that can waste so much battery life.