Blog
As state-of-the-art AI models have grown in scale, model compression has attracted increasing attention as a way to deploy models on edge devices without relying on cloud servers.
Publications
QBB: Quantization with Binary Bases for LLMs
Authors: Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos
Published: Neural Information Processing Systems (NeurIPS)
Date: 2024-12-11
Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
Authors: Junhan Kim, Chungman Lee, Eulrang Cho, Kyungphil Park, Ho-young Kim, Joonyoung Kim, Yongkweon Jeon
MobileQuant: Mobile-friendly Quantization for On-device Language Models
Authors: Shell Xu Hu, Sourav Bhattacharya, Timothy Hospedales, Georgios Tzimiropoulos, Brais Martinez
Published: Conference on Empirical Methods in Natural Language Processing (EMNLP)
Date: 2024-11-13
News
Research in Automatic Speech Recognition (ASR) continues to show that larger models yield better results. But as state-of-the-art networks grow to billions of parameters, deploying these models on device becomes increasingly difficult.
At the British Machine Vision Conference (BMVC) 2021, we presented work on mobile inverse tone mapping. In this work, we tackled converting high-resolution images to high dynamic range (HDR) images in real time with a mobile-focused model that used low-bit quantization of its parameters to accelerate inference.
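The low-bit quantization referred to above maps floating-point weights onto a small set of integer levels so that inference can use cheaper arithmetic and a smaller memory footprint. The snippet below is a minimal NumPy sketch of symmetric uniform weight quantization; the helper names and the per-tensor scale are illustrative assumptions, not the scheme used in the BMVC paper.

    import numpy as np

    def quantize_weights(w, bits=4):
        # Symmetric uniform quantization of a weight tensor to `bits` bits.
        qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for signed 4-bit
        scale = float(np.max(np.abs(w))) / qmax    # per-tensor scale factor
        q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        # Recover a float approximation of the weights at inference time.
        return q.astype(np.float32) * scale

    # Quantize a random weight matrix to 4 bits and measure the error.
    w = np.random.randn(64, 64).astype(np.float32)
    q, scale = quantize_weights(w, bits=4)
    w_hat = dequantize(q, scale)
    print("max abs error:", np.abs(w - w_hat).max())

Storing the quantized weights plus one scale per tensor cuts the parameter memory by roughly 8x versus float32 at 4 bits, at the cost of the reconstruction error printed above.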