Voice First

Improve battery life by up to 10x for portable, voice-enabled devices.

In the highly contested market for portable, battery-operated, voice-activated digital assistants, Aspinity's RAMP technology offers a revolutionary power reduction over traditional always-listening edge solutions. As the enabler of the industry’s only “analyze-first” architecture, the RAMP processor detects voice directly from raw, unstructured analog microphone data. This allows a more efficient partitioning of power and data resources between the edge and the cloud, resulting in smaller form-factor devices with longer battery life.

From smart speakers to wearables/hearables, remote controls, and other voice-enabled smart home products, more than one billion battery-operated, wireless, voice-first devices are expected to come to market by 2021. The problem is that today these devices use an inefficient digitize-first architecture that digitizes and analyzes all incoming sound data to listen for the wake word, even when that sound only occasionally includes speech. Since the analog-to-digital converter (ADC) and other digital processors typically dominate the power consumption of these always-listening systems, this digitize-first methodology wastes significant power processing data that contains no speech.

Analyze-First: Use Up to 10x Less Power for Voice Wake-Up

Aspinity’s “analyze-first” edge architecture digitizes and analyzes only actual voice data to detect the wake word. That is a major difference from existing solutions, which digitize and analyze all incoming sound while listening for the wake word, including the long periods when there is noise but no speech.
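To make the difference concrete, here is a simplified Python model of which audio frames each architecture sends down the digital chain. It is a sketch under stated assumptions, not Aspinity's implementation: `voice_scores` stands in for a hypothetical per-frame output of an analog voice detector, `is_voice` is a hypothetical decision function, and `hangover_frames` (keeping the chain awake briefly after the last detection so the end of an utterance is not clipped) is a common voice-activity-detection convention assumed here, not a documented RAMP parameter.

```python
from typing import Callable, List, Sequence


def analyze_first_gate(
    voice_scores: Sequence[float],
    is_voice: Callable[[float], bool],
    hangover_frames: int = 0,
) -> List[bool]:
    """Return one flag per audio frame: True if the ADC/DSP chain is awake.

    Hypothetical model: the chain wakes only when the voice detector fires
    and stays awake for `hangover_frames` extra frames after the last
    detection.
    """
    awake: List[bool] = []
    frames_left = 0
    for score in voice_scores:
        if is_voice(score):
            frames_left = hangover_frames + 1  # (re)arm the wake window
        awake.append(frames_left > 0)
        if frames_left > 0:
            frames_left -= 1
    return awake


def digitize_first_gate(voice_scores: Sequence[float]) -> List[bool]:
    """Digitize-first baseline: every frame is digitized and analyzed."""
    return [True] * len(voice_scores)


if __name__ == "__main__":
    # Hypothetical per-frame voice scores: background noise with a short
    # burst of speech in frames 8 and 9.
    scores = [0.1] * 8 + [0.9] * 2 + [0.1] * 10
    gate = analyze_first_gate(scores, is_voice=lambda s: s >= 0.5, hangover_frames=2)
    print(sum(digitize_first_gate(scores)), "frames digitized (digitize-first)")
    print(sum(gate), "frames digitized (analyze-first)")
```

With 2 speech frames out of 20 and a 2-frame hangover, the digitize-first baseline digitizes all 20 frames, while the analyze-first gate wakes the chain for only 4.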

As an ultra-low-power analog signal processor, the RAMP chip identifies voice at the earliest point in the audio signal chain, directly from the raw analog microphone data. Much as the brain passes only the important sounds on for deeper processing, the RAMP processor keeps the ADC, DSP, and other heavy power consumers further down the audio chain either off or in a low-power mode unless voice has actually been detected. Once speech is detected, RAMP wakes up the rest of the voice processing system to listen for the wake word, keeping most of the system off for up to 90% of the time. For the first time, this allows designers to meet the power requirements of voice-enabled devices without sacrificing features and functionality.
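The "up to 90% of the time" figure maps directly onto the "up to 10x" power claim. A minimal model, assuming average power is simply a time-weighted sum: the always-on analog detector, plus the ADC/DSP chain at full power for the fraction of time it is awake (the duty cycle), plus its sleep power the rest of the time. The function names and all wattages below are hypothetical placeholders, not RAMP specifications or measurements.

```python
def analyze_first_power_uw(
    chain_active_uw: float,
    duty_cycle: float,
    detector_uw: float = 0.0,
    chain_sleep_uw: float = 0.0,
) -> float:
    """Average power (microwatts) of a hypothetical analyze-first system.

    chain_active_uw: ADC + DSP + wake word engine power while awake.
    duty_cycle:      fraction of time (0..1) the chain is awake.
    detector_uw:     always-on analog voice detector power.
    chain_sleep_uw:  chain power while off or in low-power mode.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return (detector_uw
            + duty_cycle * chain_active_uw
            + (1.0 - duty_cycle) * chain_sleep_uw)


def savings_factor(
    chain_active_uw: float,
    duty_cycle: float,
    detector_uw: float = 0.0,
    chain_sleep_uw: float = 0.0,
) -> float:
    """Digitize-first power (chain always awake) / analyze-first power."""
    return chain_active_uw / analyze_first_power_uw(
        chain_active_uw, duty_cycle, detector_uw, chain_sleep_uw
    )


if __name__ == "__main__":
    # Hypothetical numbers: a 1000 uW digital chain that is awake 10% of the time.
    ideal = savings_factor(1000.0, 0.10)
    with_overhead = savings_factor(1000.0, 0.10, detector_uw=20.0, chain_sleep_uw=10.0)
    print(f"ideal: {ideal:.1f}x, with detector and sleep power: {with_overhead:.1f}x")
```

With a 10% duty cycle and a detector and sleep state that draw nothing, the savings factor is exactly 1 / 0.1 = 10x. Any nonzero detector or sleep power pulls it below that (with the hypothetical 20 µW and 10 µW above, to roughly 7.8x), which is why 10x reads as an upper bound rather than a typical figure.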

Using an analyze-first architecture allows designers to achieve a better balance between edge and cloud voice processing, which has several advantages. Since only data containing voice is ever transmitted to the cloud, the amount of irrelevant data sent to the cloud is greatly reduced. This also eliminates the inadvertent sharing of other personal, non-voice information, increasing user privacy. Finally, some battery-operated voice-first devices that require only a small number of commands will be able to operate without an internet connection at all, making them the ultimate solution for secure voice-first applications.

RAMP Voice-First Processing Satisfies Preroll Requirements

Figure: RAMP Voice Activity Detection Block Diagram

Many wake word engines (WWEs) require 500 ms of preroll (the audio captured just before voice is detected) to verify the wake word more accurately. Unlike other edge microphone solutions for wake-on-sound or wake-on-voice, the RAMP chip uses a patented Aspinity processing approach to compress 500 ms of preroll data. Once voice has been detected, this preroll can be reconstructed and used for wake word verification. Alternatively, the wake word engine can be trained to use the compressed preroll data directly.
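Meeting a preroll requirement means continuously keeping the most recent 500 ms of audio, so the wake word engine can see the start of an utterance that began before voice was detected. A common way to hold it is a fixed-size ring buffer. The sketch below assumes 16 kHz, 16-bit mono audio and uses plain bit-depth reduction (keeping the top 8 bits of each sample, a lossy 2:1 scheme) as a stand-in compressor; this is not Aspinity's patented approach, which the text does not describe. `PrerollBuffer`, `push`, and `reconstruct` are hypothetical names.

```python
from array import array
from typing import List

SAMPLE_RATE_HZ = 16_000  # assumption: 16 kHz mono audio
PREROLL_MS = 500         # preroll length cited in the text
PREROLL_SAMPLES = SAMPLE_RATE_HZ * PREROLL_MS // 1000  # 8000 samples


class PrerollBuffer:
    """Ring buffer holding the most recent preroll in compressed form.

    Stand-in compression: each signed 16-bit sample is stored as its top
    8 bits (lossy, 2:1). This is NOT Aspinity's patented method.
    """

    def __init__(self, capacity: int = PREROLL_SAMPLES) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data = array("b", [0]) * capacity  # one signed byte per sample
        self._capacity = capacity
        self._next = 0   # slot the next sample will be written to
        self._count = 0  # number of valid samples, at most capacity

    def push(self, sample: int) -> None:
        """Compress one 16-bit sample, overwriting the oldest when full."""
        if not -32768 <= sample <= 32767:
            raise ValueError("expected a signed 16-bit sample")
        self._data[self._next] = sample >> 8  # arithmetic shift: -128..127
        self._next = (self._next + 1) % self._capacity
        self._count = min(self._count + 1, self._capacity)

    def reconstruct(self) -> List[int]:
        """Approximate the stored preroll, oldest sample first."""
        start = (self._next - self._count) % self._capacity
        return [self._data[(start + i) % self._capacity] << 8
                for i in range(self._count)]


if __name__ == "__main__":
    buf = PrerollBuffer()
    for n in range(20_000):  # 1.25 s of hypothetical samples
        buf.push((n % 200) * 100 - 10_000)
    preroll = buf.reconstruct()
    print(len(preroll), "samples of preroll ready for the wake word engine")
```

Because only the low byte is discarded, each reconstructed sample is at most 255 counts below the original. A wake word engine trained on the compressed bytes, the alternative mentioned above, could skip the `reconstruct` step entirely.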

Learn more

Contact us for more information and to discuss how RAMP can improve the power and data efficiency of your device.