Center for Secure Information Systems

Securing the World's Cyber Infrastructure

CSIS Seminar

Unsupervised Learning of Depth Perception and Beyond

Speaker: Alex Wong, Yale University
When: February 9, 2024, 10:00 am - 11:00 am
Where: ENGR 4801

Abstract

Deep neural networks have achieved empirical success across computer vision tasks, but training them requires tens of thousands to millions of examples. Each example typically consists of one or more images paired with human-annotated ground truth. Curating vision datasets generally takes many person-hours of manual effort, and tasks like depth estimation demand even more. I will introduce an alternative form of supervision that uses multi-sensor validation as an unsupervised (or self-supervised) training objective for depth estimation. I will demonstrate how one can leverage synthetic data and the abundance of publicly available pretrained models, whose training has largely relied on expensive manual labeling, to learn or distill the regularities of our visual world. In doing so, I will show that one can design smaller and faster models that operate in real time with state-of-the-art performance. Moreover, these models can be adapted online to the novel environments in which they are deployed. I will also discuss the current limitations of the data augmentation procedures used in unsupervised training, which uses reconstruction of the inputs as the supervision signal. I will then detail a method that makes it possible to scale up augmentation and introduce previously infeasible augmentations to boost performance. Finally, I will show that unsupervised depth training can serve as a feasible form of large-scale pretraining, producing backbones suitable for semantic tasks.

Speaker Bio

Alex Wong is an Assistant Professor in the Department of Computer Science and the director of the Vision Laboratory at Yale University. He received his Ph.D. in Computer Science from the University of California, Los Angeles (UCLA) in 2019, where he was co-advised by Stefano Soatto and Alan Yuille. He was previously a postdoctoral research scholar at UCLA under the guidance of Stefano Soatto. His research lies at the intersection of machine learning, computer vision, and robotics. It focuses largely on multi-sensor fusion for 3D reconstruction, robust vision under adverse conditions, unsupervised learning, and medical image analysis. His work has received the outstanding student paper award at the Conference on Neural Information Processing Systems (NeurIPS) 2011 and the best paper award in robot vision at the International Conference on Robotics and Automation (ICRA) 2019.