Why Memory Matters in Machine Learning for IoT

By Brad Spiers - 2018-03-27

Everywhere we turn, machine learning is in the news.  The predictive capabilities of the models that we can create are impressive, sometimes exceeding what even humans can do.  Using these models, we can turn streams of bits into actionable decisions.  The source of this data is sensors, deployed at what is called “The Edge.”  We see sensors appearing in everything from keys to cars.

This power of prediction can be employed at The Edge with the help of memory, specifically high memory bandwidth.

Sensors pose an interesting opportunity.  Being at The Edge, they can act on data immediately, while its insights are still valuable.  However, The Edge is filled with challenges, starting with harsh limits on power.

To take advantage of this opportunity, we need an architecture that is both low-power and fast enough to keep up with massive incoming data streams.  At the same time, we want it to be flexible enough to adapt as algorithms change over time.  FPGAs can provide a fast, adaptable and low-power solution.

Until now, though, programming FPGAs has been a massive undertaking.  Normal timelines stretch into months.  These long timelines have held back FPGA use for machine learning because models and networks can change frequently.  Micron is working with the ecosystem to change that.

Micron is engaged with machine learning experts, like FWDNXT, to enable seamless transfer of machine learning models onto FPGAs.  Models are first created in the normal way, using the same software that data scientists use every day: Caffe, PyTorch or TensorFlow.  The models output by these frameworks are then compiled onto FPGAs by FWDNXT’s Snowflake compiler.

Data scientists can use different types of networks—everything from Convolutional Neural Networks (CNNs) for image analysis to Recurrent Neural Networks (RNNs) for time-based patterns.

FWDNXT achieves a high level of efficiency, even compared to ASICs, by focusing on fetching long “traces” of data.  You can read more details about traces and FWDNXT’s architecture here:

The key idea is to fetch long traces of data instead of just small matrices from within a network layer.  Data traces can be much longer because they stretch across layers.  This approach increases the data available for computation by up to a factor of one hundred.  FWDNXT hides non-essential computation components behind these large blocks of work, resulting in improved efficiency on well-known benchmarks such as AlexNet and GoogLeNet.
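The benefit of longer traces can be illustrated with a toy amortization model.  The sketch below is not FWDNXT's actual scheduler; it simply assumes that each fetch carries a fixed, unhidden overhead and shows how the share of time spent on useful work grows as the trace gets longer.  The function name, the `overhead_cycles` parameter and all cycle counts are hypothetical, chosen only for illustration.

```python
def utilization(trace_cycles: float, overhead_cycles: float) -> float:
    """Fraction of total time spent on useful work for one fetch.

    Toy model (an assumption, not FWDNXT's design): every fetch pays a
    fixed overhead that is not overlapped with computation.
    """
    if trace_cycles <= 0 or overhead_cycles < 0:
        raise ValueError("trace_cycles must be > 0 and overhead_cycles >= 0")
    return trace_cycles / (trace_cycles + overhead_cycles)


if __name__ == "__main__":
    overhead = 10.0  # hypothetical fixed cost per fetch, in cycles
    # Traces that differ by a factor of one hundred in length.
    for trace in (10.0, 100.0, 1000.0):
        u = utilization(trace, overhead)
        print(f"trace = {trace:6.0f} cycles -> utilization = {u:.3f}")
```

Under these made-up numbers, a trace one hundred times longer moves utilization from one half to above 99 percent, which is the general effect that long traces are meant to exploit.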

With such large data blocks available, memory bandwidth becomes the dominant feature of the system architecture.  Put another way, moving the data into the FPGA becomes the limiting factor.  Once again, we are looking for a power-efficient, yet high-bandwidth device.  Micron’s Hybrid Memory Cube (HMC) is just the ticket:
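One common way to reason about when data movement becomes the limiting factor is the roofline model, which is a general rule of thumb rather than anything specific to FWDNXT or Micron.  The sketch below caps attainable throughput at the smaller of peak compute and bandwidth multiplied by arithmetic intensity.  All of the numbers in it are hypothetical placeholders, not measurements of any real device.

```python
def attainable_ops_per_s(peak_ops_per_s: float,
                         bandwidth_bytes_per_s: float,
                         ops_per_byte: float) -> float:
    """Roofline estimate: throughput is capped by compute or by memory."""
    return min(peak_ops_per_s, bandwidth_bytes_per_s * ops_per_byte)


def is_memory_bound(peak_ops_per_s: float,
                    bandwidth_bytes_per_s: float,
                    ops_per_byte: float) -> bool:
    """True when memory bandwidth, not compute, sets the ceiling."""
    return bandwidth_bytes_per_s * ops_per_byte < peak_ops_per_s


if __name__ == "__main__":
    peak = 1e12        # hypothetical peak compute, operations per second
    intensity = 20.0   # hypothetical operations performed per byte fetched
    for bandwidth in (10e9, 100e9):  # hypothetical bytes per second
        ops = attainable_ops_per_s(peak, bandwidth, intensity)
        bound = "memory" if is_memory_bound(peak, bandwidth, intensity) else "compute"
        print(f"{bandwidth / 1e9:5.0f} GB/s -> {ops / 1e9:6.0f} Gop/s ({bound}-bound)")
```

In this illustration, the lower-bandwidth case leaves most of the compute idle, while the higher-bandwidth case lets the same compute run at its peak.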

The Hybrid Memory Cube has 8.5 times the bandwidth of DDR4, yet uses 30 percent of the energy per bit.
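To see what these two ratios imply together, here is a small worked calculation.  Only the 8.5 and 30 percent figures come from the text above; the function names are illustrative, and the "full bandwidth" scenario is an assumption made to show how the ratios combine.

```python
# Ratios quoted in the article (HMC relative to DDR4).
HMC_BANDWIDTH_RATIO = 8.5
HMC_ENERGY_PER_BIT_RATIO = 0.30


def relative_transfer_time(bandwidth_ratio: float) -> float:
    """Time to move a fixed amount of data, relative to the baseline."""
    return 1.0 / bandwidth_ratio


def relative_transfer_energy(energy_per_bit_ratio: float) -> float:
    """Energy to move a fixed amount of data, relative to the baseline."""
    return energy_per_bit_ratio


def relative_power_at_full_bandwidth(bandwidth_ratio: float,
                                     energy_per_bit_ratio: float) -> float:
    """Power draw when both memories stream at their own full bandwidth.

    Power is bits per second times energy per bit, so the ratios multiply.
    """
    return bandwidth_ratio * energy_per_bit_ratio


if __name__ == "__main__":
    t = relative_transfer_time(HMC_BANDWIDTH_RATIO)
    e = relative_transfer_energy(HMC_ENERGY_PER_BIT_RATIO)
    p = relative_power_at_full_bandwidth(HMC_BANDWIDTH_RATIO,
                                         HMC_ENERGY_PER_BIT_RATIO)
    print(f"Same data: {t:.1%} of the time, {e:.0%} of the energy")
    print(f"Full bandwidth: {p:.2f}x the power for {HMC_BANDWIDTH_RATIO}x the data rate")
```

For a fixed amount of data, the transfer takes roughly 12 percent of the time and 30 percent of the energy.  If the memory is instead driven at its full bandwidth, it draws about 2.55 times the power while delivering 8.5 times the data.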

We’ve already done the work and combined FPGAs and Hybrid Memory Cubes on our AC-510 UltraScale-based SuperProcessor.

As you can see, these modules start at just 24 Watts of power.  Thus, they fit into harsh power limits at The Edge.  For environments that have larger power envelopes, you can add modules to increase the number of models that analyze your data streams.

In short, Micron has you covered when you’re ready to take on machine learning at The Edge!
