Although high-quality implementations are already available for mainstream, PC-like computing systems, deploying them in diverse application areas (e.g. automotive, transportation, IoT, medical) requires developing Deep Learning architectures for small embedded platforms that operate with limited hardware resources and often within a restricted power budget.