Blog
As state-of-the-art AI models have grown ever larger, model compression has been attracting more attention as a way to deploy models on edge devices without relying on cloud servers.
Publications
Z-FOLD: A Frustratingly Easy Post-Training Quantization Scheme for LLMs
Authors: Yongkweon Jeon, Chungman Lee, Kyungphil Park, Ho-young Kim
Published in: Conference on Empirical Methods in Natural Language Processing (EMNLP)
Date: 2023-12-06
Genie: Show Me the Data for Quantization
Authors: Yongkweon Jeon, Chungman Lee, Ho-young Kim
Published in: Computer Vision and Pattern Recognition (CVPR)
Date: 2023-06-18
NAWQ-SR: A Hybrid-Precision NPU Engine for Efficient On-Device Super-resolution
Authors: Stylianos I. Venieris, Royson Lee
Published in: IEEE Transactions on Mobile Computing
Date: 2023-03-13
News
Research in Automatic Speech Recognition (ASR) continues to show that larger models yield better results. But as state-of-the-art networks grow to billions of parameters, deploying these models on device becomes increasingly difficult.
At the British Machine Vision Conference (BMVC) 2021, we presented work on mobile inverse tone mapping. In this work, we tackled converting high-resolution images to high dynamic range (HDR) images in real time with a mobile-focused model that used low-bit quantization of its parameters to accelerate inference.
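As a rough illustration of the general idea behind low-bit parameter quantization (this is a generic uniform symmetric scheme, not the specific method used in the BMVC work), weights are mapped to small signed integers plus a single floating-point scale, which shrinks model size and enables faster integer arithmetic:

```python
# Minimal sketch of uniform symmetric weight quantization to b bits.
# Illustrative only; function names and the per-tensor scale choice
# are assumptions, not the paper's exact scheme.

def quantize(weights, num_bits=4):
    """Map float weights to signed integers in [-(2^(b-1)-1), 2^(b-1)-1]."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax  # one scale per tensor
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integers and the scale."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, -0.07]
q, scale = quantize(weights, num_bits=4)   # q fits in 4-bit signed ints
approx = dequantize(q, scale)              # close to the original weights
```

The rounding error per weight is bounded by half the scale, which is why fewer bits (a larger scale) trade accuracy for speed and memory.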