Analog Memory for Efficient AI Compute

In the past few years, there has been remarkable growth in the use of AI, accompanied by an explosion in the amount of data collected, processed, and transmitted. And with the recent introduction of generative AI, the quantity of data will grow even more rapidly until one day, in the not-so-distant future, we won't have enough energy available to process it all. This sounds like a doomsday scenario, but it is in fact reality UNLESS we can quickly find more-efficient AI processing solutions.

This is where analog enters the story. Analog can help us build much more efficient neural networks and AI systems for several reasons. First, analog circuits can be much more efficient than their digital counterparts for operations, like those in AI, that don't require an extremely high number of bits of resolution. Additionally, the main workload function in neural networks, the multiply-accumulate (MAC), can be implemented in analog with fewer than 1/100th the number of transistors of a digital implementation. And finally, analog memory enables MAC weights to be stored in or near the analog compute, eliminating power-expensive fetches of data from memory.
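To make the MAC concrete, here is a minimal Python sketch of what a digital processor must compute for each neuron output. The function and values are illustrative only; in analog, the same weighted sum can emerge directly from circuit physics rather than from per-step multiplier and adder logic.

```python
# A minimal sketch of the multiply-accumulate (MAC) at the heart of a
# neural-network layer: each output is a weighted sum of the inputs.
# Names and values here are illustrative, not Aspinity's implementation.

def mac(weights, inputs):
    """Accumulate the products of paired weights and inputs."""
    acc = 0.0
    for w, x in zip(weights, inputs):
        acc += w * x  # one multiply-accumulate step
    return acc

# In analog, this sum can fall out of physics: currents proportional to
# each weight * input product add on a shared wire (Kirchhoff's current
# law), with no digital multiplier or adder needed per step.
print(mac([0.5, -1.2, 0.8], [1.0, 0.3, 2.0]))  # 0.5 - 0.36 + 1.6 = 1.74
```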

Overall, analog is a less complex, lower-power, and more efficient solution for AI computing, IF you can overcome the hurdles that have generally prevented the widespread adoption of analog computing, mainly:

  • High-precision analog memory has not been available.
  • Analog circuits have been susceptible to manufacturing and environmental variations.
  • Analog chips have been mostly fixed function and difficult to use.

In this article, we’re going to talk about one of Aspinity’s key innovations – the development of high-precision analog memory – from which the rest of our innovations follow.

Analog computing systems process analog values, so their memory should store analog values too. Whereas digital memory stores quantized values, analog memory stores values on a continuum. The precision of that analog memory matters because a single memory element needs to be able to store any value within that continuum. While there is a lot of work being done at the algorithmic level in neural networks to quantize down to lower-resolution processing (8 bits or less), more bits allow a wider range of values to be used. Have you ever heard an engineer say, “I wish I had fewer bits to work with?” No, I haven’t heard that either, but the resolution in an AI system is typically sacrificed to meet stringent power and bandwidth constraints.
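A quick, illustrative way to see why stored precision matters is to quantize the same weight at different bit depths and compare the rounding error. The values below are arbitrary examples, not characteristics of any particular memory:

```python
# Illustrative sketch of why stored precision matters: quantizing a
# weight to n bits limits it to 2**n evenly spaced levels, so the
# worst-case rounding error shrinks by half with every added bit.

def quantize(value, bits, lo=-1.0, hi=1.0):
    """Round a value in [lo, hi] to the nearest of 2**bits levels."""
    levels = 2 ** bits - 1
    step = (hi - lo) / levels
    return lo + round((value - lo) / step) * step

w = 0.123456  # an arbitrary weight to store
for bits in (4, 8, 12):
    q = quantize(w, bits)
    print(f"{bits:2d} bits: stored {q:+.6f}, error {abs(q - w):.6f}")
```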

Additionally, since analog computing does not work like a typical von Neumann architecture, where you need to continuously read from and write to a common memory bank, analog memory can – and should – be co-located with the compute elements. This is a major advantage when considering the memory bottleneck: analog doesn’t have to expend the energy, or incur the latency, of fetching from memory blocks. Finally, analog memory should be as easy to program as digital memory for ubiquitous adoption.
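To get a feel for the scale of that memory bottleneck, here is a back-of-the-envelope sketch. The energy numbers are assumed ballpark figures for a digital process, on the order of those reported by Horowitz at ISSCC 2014; they are for illustration only, not measurements of any Aspinity device:

```python
# Back-of-the-envelope sketch of the von Neumann memory bottleneck.
# Energy figures are assumed ballpark values for a ~45 nm digital
# process (on the order of those in Horowitz, ISSCC 2014); they are
# illustrative, not measurements of any Aspinity part.

DRAM_READ_PJ = 640.0   # energy to fetch one 32-bit word from DRAM
MAC_PJ       = 3.2     # energy for one 32-bit multiply-accumulate

MACS = 1_000_000       # a modest layer's worth of MAC operations

fetch_energy   = MACS * DRAM_READ_PJ   # one weight fetch per MAC
compute_energy = MACS * MAC_PJ

print(f"fetch energy:   {fetch_energy / 1e6:.1f} uJ")
print(f"compute energy: {compute_energy / 1e6:.1f} uJ")
print(f"fetching costs {fetch_energy / compute_energy:.0f}x the compute")
```

Co-locating the weights with the compute, as analog memory allows, removes the fetch term from this equation entirely.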

At Aspinity, we have developed a proprietary analog non-volatile memory (NVM) with >10-bit precision. This custom floating-gate-based memory has been architected specifically for storing analog values and is implemented in standard CMOS with no add-ons, so it can be readily incorporated into various designs and technology nodes. Our analog NVM can be used for more than just storing neural network weights; it can also be embedded within the various computational circuits, for example to store biases and activations. And because of its high precision, the analog NVM is also able to store the values needed to finely trim out variations in analog circuit performance that arise from environmental conditions or the standard CMOS manufacturing process.
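As a simple, hypothetical illustration of trimming, imagine an amplifier stage with a part-specific offset; a correction measured at production test and stored in high-precision analog NVM can cancel it. All names and values in this sketch are invented for illustration:

```python
# Hypothetical trimming sketch: a per-part correction, measured at
# production test and stored in high-precision analog NVM, cancels a
# manufacturing-induced offset. Names and numbers are invented.

NVM_TRIM = -0.037  # correction value stored in analog NVM at test time

def amplifier(x, gain=10.0, offset=0.037):
    """An analog stage with an unwanted, part-specific offset."""
    return gain * x + offset

def trimmed_amplifier(x):
    """The same stage with the stored trim applied."""
    return amplifier(x) + NVM_TRIM

print(amplifier(0.1))          # 1.037 -> untrimmed output misses target
print(trimmed_amplifier(0.1))  # 1.000 -> stored trim restores accuracy
```

The finer the stored trim value, the more closely the residual error can be driven toward zero, which is where the >10-bit precision pays off.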

In the next article, we will talk about how we uniquely implement our analog NVM in order to deliver an ultra-low-power, high-performance, flexible, and scalable analog AI processing platform: AnalogML™.