Blog (2)
Large language models (LLMs) have exhibited impressive in-context learning abilities [1]. Inspired by these successes, recent studies [2-5] have extended LLM applications to text-to-speech (TTS) systems by representing speech through discrete acoustic codes.
Zero-shot sketch-based image retrieval (ZS-SBIR) is central to sketch understanding [6]. This paper aims to tackle all of the key settings in the current ZS-SBIR landscape: category-level (standard) [4], fine-grained [1], and cross-dataset [3] retrieval.
Publications (21)
Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP
Author: Shell Xu Hu
Published: Conference on Empirical Methods in Natural Language Processing (EMNLP)
Date: 2024-11-13
CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs
Authors: Yassine Ouali, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos
Published: European Conference on Computer Vision (ECCV)
Date: 2024-09-30
Modularized Multilingual NMT with Fine-grained Interlingua
Authors: Sungjun Lim, Yoonjung Choi, Sangha Kim
Published: North American Chapter of the Association for Computational Linguistics (NAACL)
Date: 2024-06-20
News (6)
Voice cloning, especially zero-shot speech synthesis, has become one of the most exciting frontiers in speech technology.
Recently, personalized AI systems have gained significant attention. In the TTS field, zero-shot text-to-speech (ZS-TTS) systems [1-7] let users build personalized TTS systems that replicate their voice from just one utterance, without any further training.