Blog(2)
Video Object Segmentation (VOS) is an important problem in computer vision, with applications such as video editing, surveillance, autonomous driving, and augmented reality.
We study the significance of Feature Learning in Neural Networks (NNs) for Knowledge Distillation (KD), a popular technique for improving an NN model’s generalisation performance using a teacher NN model. We propose a principled framework, Feature Kernel Distillation (FKD), which performs distillation directly in the feature space of NNs and is therefore able to transfer knowledge across different datasets.
Publications(23)
EDiT: Efficient Diffusion Transformers with Linear Compressed Attention
Author: Abhinav Mehrotra, Ruchika Chavhan, Malcolm Chadwick, Luca Morreale, Mehdi Noroozi, Alberto Gil Ramos
Published: International Conference on Computer Vision / European Conference on Computer Vision (ICCV)
Date: 2025-10-21
Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP
Author: Shell Xu Hu
Published: Conference on Empirical Methods in Natural Language Processing (EMNLP)
Date: 2024-11-13
You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation
Author: Mehdi Noroozi, Isma Hadji, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos
Published: European Conference on Computer Vision (ECCV)
Date: 2024-09-30
News(4)
Scene Text Recognition (STR) aims to automatically transcribe text in natural scenes, enabling applications in autonomous driving, augmented reality, language translation, and assistive technologies.
Initializing parameters from a pretrained masked language model (LM) [1] is a knowledge transfer method widely applied to natural language processing tasks. Following its success, pretrained neural machine translation (NMT) models have attracted increasing research interest [2,3,4,5].
Video Object Segmentation (VOS) is an important problem in computer vision, with applications such as video editing, surveillance, autonomous driving, and augmented reality. The task is to locate and track objects across the frames of a video.