Parsimonious Inference: Towards rapideye deep-learning inference
A new, radical CNN design approach is introduced, considering the reduction of the total computational load during inference. This is achieved by a new holistic intervention on both the CNN architecture and the training procedure, which targets to the parsimonious inference by learning to exploit or remove the redundant capacity of a CNN architecture. This is accomplished, by the introduction of a new structural element that can be inserted as an addon to any contemporary CNN architecture, whilst preserving or even improving its recognition accuracy.
Our approach formulates a systematic and data-driven method for developing CNNs that are trained to eventually change size and form in real-time during inference, targeting to the smaller possible computational footprint. Results are provided for the optimal implementation on a few modern, high-end mobile computing platforms indicating a significant speed-up of up to x3 times.
Top-performing deep-learning systems usually involve deep and wide architectures and therefore come with the cost of increased storage and computational requirements, while the trend is to continuously increase the depth of the networks. Besides this drawback deep neural networks feature a very intriguing property, which provides opportunities for optimizations regarding both storage space and computations-the redundancy of parameters and computational units (e.g. convolutional kernels). The presence of this redundancy also raises the question of whether a network must be deep or not. But indeed, it is shown that deeper models outperform shallower ones.
At the same time, there is an increasing need to use deep CNNs in applications running on embedded devices. These devices-especially in the Internet of Things (IoT) era- are often equipped with small storage and low computational capabilities. As such they cannot neither store large CNN models nor to cope with the associated computational complexity of a deep and wide CNN. Therefore, it becomes apparent that, in order to make CNNs appealing for mobile devices, both the parameter size and the number of kernels need to be reduced. But then, a fundamental dilemma becomes apparent.
Can we design computationally efficient CNNs and at the same time to maintain recognition accuracy and generalization, while keep the overall structure efficient for operating on mobile embedded devices?
To answer the above-mentioned dilemma, in Irida Labs, a systematic way has been developed for implementing CNN variants that are parsimonious in computations. To this end, the proposed approach allows us to contract and train a CNN to:
- Use as few computing resources as possible. The devised procedure results into an optimal pruning of a CNN architecture, guided by the complexity of the task and the nature of the input data.
- Change size and form on-the-fly during inference, depending on the input data. This property enables to perform inference using less effort for “easier” instances of data than others.
- Optimize for the above objectives via regular back-propagation, simultaneously to the primary task objective of the model. This way we avoid the prune-fine-tune iterative procedure, which is usually followed in order to reduce the size of a model.
The proposed method is compatible with any contemporary deep CNN architecture and can be used in combination with other model thinning approaches (optimal filtering, factorization, etc.) resulting into a further processing optimization.
The idea is presented in the following Figure. In this figure, a part of a typical convolutional network is shown, depicting only the i-th and the (i+1)-th convolutional layer. A Learning Kernel-Activation Module (LKAM) is introduced, linking two consecutive convolutional layers. This learning switch module is capable to switch on and off individual kernels of any layer depending on its input, which is the output of the previous convolutional layer.
The module learns which kernel to disable during the CNN training process, which is for that reason specifically devised to facilitate such operation by exploiting data sparsity usually employed in images.
Download the full-length Paper
Continue reading about the full details of the approach, the Implementation and the Simulation we conducted as well as the Evaluation for the results.
This paper by I. Theodorakopoulos, V. Pothos, D. Kastaniotis and N. Fragoulis has been published on January, 2017.