Object-centric 3D Scene Understanding from Videos

Yash Bhalgat, PhD student, Visual Geometry Group, University of Oxford

– 11 January 2024

Talk summary: The growing demand for immersive, interactive experiences has underscored the importance of three-dimensional (3D) data for understanding our surroundings. Traditional methods for capturing 3D data are often complex and equipment-intensive. In contrast, Yash Bhalgat’s research aims to use unconstrained videos, such as those from augmented-reality glasses, to effortlessly capture scenes and objects in their full 3D complexity. As a first step, he described a method that incorporates epipolar geometry priors into multi-view transformer models, enabling objects to be identified across extreme pose variations. Next, he discussed his recent work on 3D object segmentation using pre-trained two-dimensional (2D) foundation models. Finally, he touched on his ongoing work on object-centric dynamic scene representations.
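For readers unfamiliar with the epipolar geometry priors mentioned above, the underlying constraint is standard: a point x in one view and its match x' in another must satisfy x'ᵀ F x = 0, where F is the fundamental matrix relating the two cameras. The sketch below is purely illustrative (the matrix F and the points are toy values, not from the talk) and simply checks this residual:

```python
import numpy as np

def epipolar_error(F, x1, x2):
    """Absolute epipolar residual |x2^T F x1| for homogeneous points (shape (3,))."""
    return abs(x2 @ F @ x1)

# Toy example: two cameras related by a pure horizontal translation t = (1, 0, 0)
# give F = [t]_x, so corresponding points must lie on the same scanline (y1 == y2).
F = np.array([[0.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])
x1 = np.array([0.2, 0.5, 1.0])   # homogeneous pixel in view 1
x2 = np.array([0.7, 0.5, 1.0])   # same scanline in view 2, so the residual is zero
print(epipolar_error(F, x1, x2))
```

A geometry prior of this kind lets a multi-view transformer restrict or bias attention between image tokens to pairs lying near each other's epipolar lines, rather than attending uniformly across views.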

Speaker bio: Yash Bhalgat is a third-year PhD student in the University of Oxford’s Visual Geometry Group, supervised by Andrew Zisserman, Andrea Vedaldi, Joao Henriques and Iro Laina. His research spans 3D computer vision and machine learning, with a particular focus on geometry-aware deep networks (transformers), 3D reconstruction, and neural rendering. He also works at the intersection of 3D and large language models (LLMs). Previously, he was a Senior Researcher at Qualcomm AI Research, working on efficient deep learning. He received his Master’s in Computer Science from the University of Michigan, Ann Arbor, and his Bachelor’s in Electrical Engineering (with a CS minor) from IIT Bombay.

[Talk organised in collaboration with the Department of Computational and Data Sciences]