Mobile, CPU-only Deep Learning Inference: How fast can it be?

Speed metrics of a highly optimized multi-core, multi-cluster CPU-only implementation on several mobile devices

Deep learning is a term frequently used these days by executives and engineers in technology fields ranging from mobile systems to home appliances and automotive. Although deep learning systems achieve unprecedented inference accuracy, they introduce high computational complexity, which calls into question their usability on systems with limited computational capacity, such as the embedded systems found in many markets today: smartphones, IoT devices, home appliances, and so on. A possible workaround is heterogeneous computing: exploiting every computing resource present on an embedded system (CPU, GPU, DSP) by off-loading part of the workload to each, thereby increasing the overall computational capacity and thus the processing speed. There are cases, however, where a multi-core CPU is the only available resource on an embedded system. This raises a reasonable question: is a decent inference speed still possible?

At Irida Labs, we specialize in solving problems of this kind. To this end, we evaluated the performance of a SqueezeNet CNN model on a multi-core, multi-cluster CPU.

The SqueezeNet Model

The SqueezeNet 1.1 model achieves classification accuracy on ImageNet similar to the baseline AlexNet architecture, using 50 times fewer coefficients. Its smart combination of small convolutional kernels and a complex architecture that lets information flow through different paths facilitates the construction of sufficiently high-order image representations suitable for a large variety of applications. A coefficient size of 3 MB, easily reduced further by a factor of 5 via model-compression techniques, makes SqueezeNet a very appealing architecture for embedded implementations.
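The parameter savings come largely from SqueezeNet's "Fire" module, which squeezes the channel count with a 1x1 convolution before expanding it with parallel 1x1 and 3x3 convolutions. The following back-of-the-envelope sketch illustrates the idea; the specific channel configuration (64 input channels, 16 squeeze, 64 + 64 expand) is chosen here for illustration and should be checked against the SqueezeNet paper for the exact per-layer settings.

```python
# Back-of-the-envelope parameter counts for a SqueezeNet "Fire" module:
# a squeeze 1x1 convolution followed by parallel expand 1x1 and 3x3
# convolutions whose outputs are concatenated.

def conv_params(in_ch, out_ch, k):
    """Weights plus biases of a k x k convolution layer."""
    return in_ch * out_ch * k * k + out_ch

def fire_params(in_ch, squeeze, expand1, expand3):
    """Total parameters of one Fire module."""
    return (conv_params(in_ch, squeeze, 1)       # squeeze 1x1
            + conv_params(squeeze, expand1, 1)   # expand 1x1
            + conv_params(squeeze, expand3, 3))  # expand 3x3

# A plain 3x3 convolution with the same input/output width needs far more:
plain = conv_params(64, 128, 3)      # -> 73856 parameters
fire = fire_params(64, 16, 64, 64)   # -> 11408 parameters
print(plain, fire)
```

Replacing a plain 3x3 layer with a Fire module of the same input/output width cuts the parameter count by roughly 6.5x in this example, which is how savings of this kind compound into a model small enough to fit in 3 MB.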

Download the full-length Paper

Continue reading about the Food Recognition use case, the inference speed results of the SqueezeNet 1.1 model, and more.

This paper was published in March 2017.