Blog(1)
Zero-shot sketch-based image retrieval (ZS-SBIR) is a central problem in sketch understanding [6]. This paper aims to tackle all problems associated with the current status quo for ZS-SBIR, including the category-level (standard) [4], fine-grained [1], and cross-dataset [3] settings.
Publications(21)
Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP
Author: Shell Xu Hu
Published: Conference on Empirical Methods in Natural Language Processing (EMNLP)
Date: 2024-11-13
CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs
Author: Yassine Ouali, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
Published: European Conference on Computer Vision (ECCV)
Date: 2024-09-30
Modularized Multilingual NMT with Fine-grained Interlingua
Author: Sungjun Lim, Yoonjung Choi, Sangha Kim
Published: North American Chapter of the Association for Computational Linguistics (NAACL)
Date: 2024-06-20
News(4)
Recently, personalized AI systems have gained significant attention. In the TTS field, zero-shot text-to-speech (ZS-TTS) systems [1-7] enable users to create their own TTS systems that replicate their voices with just one utterance, without further training.
Large Language Models (LLMs) have showcased impressive capabilities in text generation, translation, and code synthesis. Recent efforts focus on integrating LLMs, notably ChatGPT, into robotics for tasks like zero-shot system planning [1].
In recent years, text-to-speech (TTS) has improved remarkably with the emergence of various end-to-end TTS models [1, 2, 3]. These advanced models have extended TTS from systems built with a professional voice actor to personalized TTS.