Blog(2)
Large language models (LLMs) have exhibited impressive in-context learning abilities [1]. Inspired by these successes, recent studies [2-5] have extended LLM applications to text-to-speech (TTS) systems by representing speech through discrete acoustic codes.
Zero-shot sketch-based image retrieval (ZS-SBIR) is a central problem in sketch understanding [6]. This paper aims to tackle all problems associated with the current status quo for ZS-SBIR, including category-level (standard) [4], fine-grained [1], and cross-dataset [3].
Research Areas(0)
Publications(23)
PhysID: Physics-based Interactive Dynamics from a Single-view Image
Author: Sourabh Vasant Gothe, Ayon Chattopadhyay, Gunturi Venkata Sai Phani Kiran, Pratik, Vibhav Agarwal, Jayesh Rajkumar Vachhani, Sourav Ghosh, Parameswaranath VM, Barath Raj KR
Published: International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Date: 2025-03-07
MoDeGPT: Modular Decomposition for Large Language Model Compression
Author: Chi-Heng Lin, Retiree, Retiree, Retiree, Retiree, Yilin Shen, Hongxia Jin
Published: International Conference on Learning Representations (ICLR)
Date: 2025-01-02
Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP
Author: Shell Xu Hu
Published: Conference on Empirical Methods in Natural Language Processing (EMNLP)
Date: 2024-11-13
News(6)
Voice cloning, especially zero-shot speech synthesis, has become one of the most exciting frontiers in speech technology.
Recently, personalized AI systems have gained significant attention. In the TTS field, zero-shot text-to-speech (ZS-TTS) systems [1-7] enable users to create their own TTS systems that replicate their voices with just one utterance, without further training.
Others(0)