Dynamic Pruning of CNNs
This paper presents a radical new approach to dynamic CNN pruning, achieved through a holistic intervention on both the CNN architecture and the training procedure. The goal is parsimonious inference: the network learns to exploit and dynamically remove the redundant capacity of a CNN architecture. Our approach formulates a systematic, data-driven method for developing CNNs that are trained to change size and form in real time during inference, targeting the smallest possible computational footprint. Results from optimized implementations on several modern, high-end mobile computing platforms indicate a significant speed-up.
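The core idea of dynamically removing redundant capacity can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the per-channel saliency scores, and the fixed keep-ratio heuristic are all illustrative assumptions. It shows one common form of dynamic pruning, where a gating step keeps only the most salient convolutional channels for each individual input.

```python
import numpy as np

def dynamic_channel_gate(feature_maps, saliency_scores, keep_ratio=0.5):
    """Illustrative sketch: retain only the most salient channels per input.

    feature_maps:    (C, H, W) activations from one conv layer, one input
    saliency_scores: (C,) per-channel importance for this input (assumed to
                     come from a small learned auxiliary module)
    keep_ratio:      fraction of channels retained at inference time
    """
    c = feature_maps.shape[0]
    k = max(1, int(c * keep_ratio))
    # Indices of the k most salient channels for this particular input.
    keep = np.argsort(saliency_scores)[-k:]
    mask = np.zeros(c, dtype=feature_maps.dtype)
    mask[keep] = 1.0
    # Zeroed channels can be skipped entirely by a tuned compute kernel,
    # which is where the inference speed-up would come from.
    return feature_maps * mask[:, None, None], keep

# Toy usage: an 8-channel activation map pruned down to 2 channels.
fm = np.random.rand(8, 4, 4).astype(np.float32)
scores = np.random.rand(8)
pruned, kept = dynamic_channel_gate(fm, scores, keep_ratio=0.25)
```

Because the kept set is recomputed per input, the effective network size varies at run time, unlike static pruning, where the removed channels are fixed once after training.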
During the past few years, convolutional neural networks (CNNs) have become established as the dominant technology for real-world visual understanding tasks. A significant research effort has gone into the design of very deep architectures able to construct high-order representations of visual information. The accuracy obtained by deep architectures on image classification and object detection tasks has shown that depth of representation is indeed key to a successful implementation.
Although high-quality implementations are already available for mainstream, PC-like computing systems, deploying them in diverse technological areas (e.g. automotive, transportation, IoT, medical) requires developing deep-learning architectures for small embedded platforms that operate with limited hardware resources and often within a restricted power budget.
Furthermore, meeting particular performance requirements on embedded platforms is, in general, difficult. Building systems on top of existing computing libraries (e.g. BLAS, Eigen) typically achieves only limited effectiveness. Improving on such approaches therefore requires tuning multiple computational kernels for the particular use case at hand, which demands considerable effort from highly specialized programming teams. Such teams must be capable of producing high-efficiency code for the target platform while also being familiar with the details of CNN computations and algorithms, so that they can tweak a given architecture if necessary.
The full paper covers our approach in detail: the pruning model, the proposed framework and modules, simulation results on recognition accuracy and computational load, and implementation and inference-speed measurements.
This paper by N. Fragoulis, I. Theodorakopoulos, E. Vassalos and V. Pothos was published in the 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA).