Large vision-language models (VLMs) have achieved impressive performance across a wide range of multimodal tasks, from visual question answering (VQA) to reasoning over images and text [1, 2]. However, these models often suffer from hallucinations and poor grounding when faced with knowledge-intensive queries.