Blog(2)
In the rapidly evolving landscape of artificial intelligence, multimodal deep neural networks have become a key approach for bridging visual perception and language understanding.
Observing someone perform a task (e.g., cooking, assembling furniture or fixing an electronic device) is a common approach for humans to acquire new skills. Instructional videos provide an excellent resource to learn such procedural activities for both humans and AI agents.
Research Areas(0)
Publications(0)
News(0)
Others(0)