Mishra A, Löffler C, Plinge A (2020)
Publication Language: English
Publication Type: Conference Contribution
Publication year: 2020
Pages Range: 1-5
Event location: Virtual (from San Jose, California)
URI: https://www.emc2-ai.org/assets/docs/virtual-20/emc2-virtual20-paper-8.pdf
Open Access Link: https://www.emc2-ai.org/assets/docs/virtual-20/emc2-virtual20-paper-8.pdf
Given the presence of deep neural networks (DNNs) in all kinds of applications, the question of optimized deployment is becoming increasingly important. One important step is the automated reduction of the model's memory footprint. Of all the emerging methods, post-training quantization is one of the simplest to apply: without lengthy processing or access to the training set, a straightforward reduction of the memory footprint by an order of magnitude can be achieved. A difficult question is which quantization methodology to use and how to optimize different parts of the model with respect to different bit widths. We present an in-depth analysis of different types of networks for audio, computer vision, medical, and hand-held manufacturing tool use cases; each is compressed with fixed and adaptive quantization, and with fixed and variable bit widths for the individual tensors.
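To illustrate the basic mechanism the abstract refers to, the following is a minimal sketch of affine (asymmetric) post-training quantization of a single weight tensor to a fixed bit width. It is an illustrative example only, not the specific recipes evaluated in the paper; all function names are hypothetical.

```python
import numpy as np

def quantize_tensor(w, bits=8):
    """Affine (asymmetric) per-tensor quantization: map the float range
    [w_min, w_max] onto the integer range [0, 2^bits - 1].
    Illustrative sketch; not the paper's evaluated recipes."""
    qmin, qmax = 0, 2 ** bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    # Guard against a constant tensor (zero range).
    scale = (w_max - w_min) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize_tensor(q, scale, zero_point):
    """Recover an approximation of the original float tensor."""
    return (q.astype(np.float32) - zero_point) * scale

# Example: quantize a random weight matrix to 8 bits and measure the error.
w = np.random.randn(64, 64).astype(np.float32)
q, s, z = quantize_tensor(w, bits=8)
w_hat = dequantize_tensor(q, s, z)
max_err = float(np.abs(w - w_hat).max())  # bounded by roughly scale / 2
```

Lowering `bits` (e.g. to 4) shrinks storage further at the cost of a larger `scale` and thus larger reconstruction error, which is the fixed-versus-variable bit-width trade-off the paper analyzes per tensor.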
APA:
Mishra, A., Löffler, C., & Plinge, A. (2020). Recipes for Post-training Quantization of Deep Neural Networks. In Proceedings of the 6th Workshop on Energy Efficient Machine Learning and Cognitive Computing (EMC^2) (pp. 1-5). Virtual (from San Jose, California), US.
MLA:
Mishra, Ashutosh, Christoffer Löffler, and Axel Plinge. "Recipes for Post-training Quantization of Deep Neural Networks." Proceedings of the 6th Workshop on Energy Efficient Machine Learning and Cognitive Computing (EMC^2), Virtual (from San Jose, California) 2020. 1-5.