The Differentiable Camera
Although today's cameras fuel diverse applications, from personal photography to self-driving vehicles, they are designed in a compartmentalized fashion: the optics, sensor, image processing pipeline, and vision models are often devised in isolation. A camera design is judged by intermediate metrics describing optical performance, signal-to-noise ratio, and image quality, even though, for a given application, only downstream metrics such as object detection scores may actually matter. In this talk, I will present a differentiable camera architecture comprising compound optics, sensing and exposure control, image processing, and downstream vision models. This architecture allows us to learn cameras akin to neural networks, guided entirely by downstream loss functions. Learned cameras move computation into the optics, yielding entirely different optical stacks for different vision tasks (and outperforming existing stacks such as Tesla's Autopilot). The approach allows us to learn entirely new cameras that are ultra-small, a few hundred microns in size, while matching the quality achieved with centimeter-sized compound lenses, and to learn active illumination jointly with the image pipeline, achieving accurate dense depth and vision tasks in heavy fog, snow, and rain (outperforming scanning lidar methods). Finally, I will describe an approach that makes the scene itself differentiable, allowing us to backpropagate gradients through the entire capture and processing chain in an inverse rendering fashion. As such, this novel breed of learned cameras brings unprecedented capabilities in optical design, imaging, and vision.
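To make the idea of "learning a camera akin to a neural network" concrete, here is a minimal sketch, not the speaker's actual system: a toy pipeline where a single optical parameter (a Gaussian blur width standing in for the lens) is composed with a simple gamma-correction "ISP" and a stand-in downstream loss, and the optical parameter is optimized by gradient descent. All function names and the finite-difference gradient are assumptions for illustration; a real end-to-end system would differentiate a full wave-optics and detection model with autodiff.

```python
import numpy as np

def optic(image, sigma):
    # Toy "lens": 1-D Gaussian blur whose width sigma is the learnable optical parameter.
    radius = 5
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    return np.convolve(image, kernel, mode="same")

def isp(raw):
    # Toy image processing stage: simple gamma correction standing in for a full ISP.
    return np.clip(raw, 0.0, 1.0) ** (1.0 / 2.2)

def downstream_loss(processed, target):
    # Stand-in for a task loss (e.g. a detection score); here just MSE to a sharp target.
    return float(np.mean((processed - target) ** 2))

def end_to_end_loss(sigma, scene, target):
    # The full chain: optics -> ISP -> task loss, as a function of the optical design.
    return downstream_loss(isp(optic(scene, sigma)), target)

# A synthetic "scene" with sharp edges, and the ideal (unblurred, processed) target.
rng = np.random.default_rng(0)
scene = (rng.random(256) > 0.5).astype(float)
target = isp(scene)

# Gradient descent on the optical parameter; the gradient is estimated by central
# finite differences here purely for simplicity (a real system uses autodiff).
sigma, lr, eps = 3.0, 20.0, 1e-4
history = [end_to_end_loss(sigma, scene, target)]
for _ in range(100):
    grad = (end_to_end_loss(sigma + eps, scene, target)
            - end_to_end_loss(sigma - eps, scene, target)) / (2 * eps)
    sigma = max(sigma - lr * grad, 0.1)  # keep the blur width physically plausible
    history.append(end_to_end_loss(sigma, scene, target))
```

The optimizer shrinks the blur because only the downstream loss is consulted, mirroring the talk's thesis that camera parameters can be trained end to end rather than chosen from intermediate image-quality metrics.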
Felix Heide is a professor at Princeton, where he has led the Princeton Computational Imaging Lab since 2020. He co-founded Algolux, where he led research and development of the full autonomous driving stack. His group at Princeton explores imaging and computer vision approaches that allow computers to see and understand what seems invisible today, enabling super-human capabilities for the cameras in our vehicles, personal devices, microscopes, telescopes, and the instrumentation we use for fundamental physics. This spans today's capture and vision challenges: harsh environmental conditions, e.g. imaging under ultra-low or high illumination or computer vision through dense fog, rain, and snow; imaging at ultra-fast or slow time scales, freezing light in motion; imaging at extreme scene scales, from super-resolution microscopy to kilometer-scale depth sensing; and imaging via proxies, using nearby object surfaces as sensors instead. Researching vision systems end-to-end, his work lies at the intersection of optics, machine learning, optimization, computer graphics, and computer vision. He received his Ph.D. from the University of British Columbia and was a postdoc at Stanford University. His doctoral dissertation won the Alain Fournier Ph.D. Dissertation Award and the SIGGRAPH Outstanding Doctoral Dissertation Award.