Blog(2)

T-GEMS: Text-Guided Exit Modules for Decreasing CLIP Image Encoder
In the rapidly evolving landscape of artificial intelligence, multimodal deep neural networks have become a key approach for bridging visual perception and language understanding.
[CVPR 2023 Series #2] StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos
Observing someone perform a task (e.g., cooking, assembling furniture or fixing an electronic device) is a common approach for humans to acquire new skills. Instructional videos provide an excellent resource to learn such procedural activities for both humans and AI agents.