Three Things Everyone Should Know About View Synthesis

Aayush Bansal, Principal Scientist and Head of Research, SpreeAI

– 18 April 2024

Talk summary: In this talk, Aayush Bansal spoke about three experiments from his work on view synthesis. In the first part of the talk, he demonstrated how simple two-dimensional (2D) pixels can be leveraged, at no additional cost, to obtain detailed 3D view synthesis. In this work, he used simple low-level image statistics to efficiently mine hard examples for better learning: simply biasing ray sampling towards hard examples enables neural fields to learn more accurate high-frequency detail in less time.
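The biased ray-sampling idea could be sketched as follows. This is purely an illustration: the choice of gradient magnitude as the "low-level image statistic", the `bias` parameter, and the mixing with a uniform distribution are assumptions for the sketch, not the published method.

```python
import numpy as np

def sample_hard_rays(image, n_rays, bias=0.9, rng=None):
    """Sample ray (pixel) indices biased towards high-frequency regions.

    Gradient magnitude stands in for the low-level image statistic;
    the statistic actually used in the work may differ.
    """
    if rng is None:
        rng = np.random.default_rng()
    gray = image.mean(axis=-1)                      # (H, W) luminance proxy
    gy, gx = np.gradient(gray)
    hardness = np.hypot(gx, gy).ravel()             # per-pixel "difficulty"
    # Mix a hardness-weighted distribution with a uniform one so that
    # easy pixels are still occasionally sampled.
    p_hard = hardness / (hardness.sum() + 1e-8)
    p_unif = np.full_like(p_hard, 1.0 / p_hard.size)
    p = bias * p_hard + (1.0 - bias) * p_unif
    return rng.choice(p_hard.size, size=n_rays, replace=False, p=p)
```

At each training step, the returned pixel indices would be turned into camera rays and used for the usual photometric loss; only the sampling distribution changes, so the bias adds essentially no cost.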

In the second part of the talk, he moved away from neural radiance fields and used simple insights from image-based rendering methods to obtain detailed 3D–4D view synthesis from multi-view inputs. He demonstrated how to build a representation of a pixel that contains colour and depth information accumulated from multiple views for a particular location and time along a line of sight. This pixel-based representation, alongside a multi-layer perceptron, allows novel views to be synthesized given a discrete set of multi-view observations as input. The proposed formulation reliably operates on sparse, wide-baseline multi-view images and videos, and can be trained efficiently, within a few seconds to 10 minutes, for high-resolution (12 MP) content.
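One way to picture this pixel representation is as a flat vector of (colour, depth) samples, one per source view, fed to a small MLP that predicts the target colour. The feature layout, network sizes, and sigmoid output here are assumptions made for the sketch, not the published architecture.

```python
import numpy as np

def pixel_feature(colors, depths):
    """Build a per-pixel feature from K multi-view observations.

    colors: (K, 3) RGB samples for one target pixel, one per source view
    depths: (K,)   corresponding depths along the target line of sight
    Returns a flat (4K,) vector of [r, g, b, d] per view -- a guessed
    layout for the representation described in the talk.
    """
    return np.concatenate([colors, depths[:, None]], axis=1).ravel()

def mlp_forward(x, weights):
    """Tiny ReLU MLP mapping a pixel feature to an RGB colour."""
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)        # ReLU on hidden layers
    return 1.0 / (1.0 + np.exp(-x))       # sigmoid keeps RGB in [0, 1]
```

Because the MLP only sees a short, local feature per pixel rather than a whole scene volume, training can plausibly converge in the seconds-to-minutes range reported in the talk.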

Finally, he presented a simple yet highly effective approach to modelling high-fidelity volumetric avatars from sparse views. He demonstrated the importance of encoding relative spatial 3D information via sparse 3D keypoints. These insights allow the approach to outperform state-of-the-art methods for head reconstruction. On human body reconstruction for unseen subjects, the approach also achieves performance comparable to prior art that uses a parametric human body model and temporal feature aggregation.
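The core of the keypoint idea is that a query point is described by where it sits *relative* to a handful of detected 3D keypoints, rather than by its absolute coordinates. A minimal sketch, in which the offset-plus-distance encoding is an assumption chosen to illustrate the translation invariance such a relative encoding provides:

```python
import numpy as np

def relative_encoding(query_pts, keypoints):
    """Encode 3D query points relative to sparse 3D keypoints.

    query_pts: (N, 3) points to encode
    keypoints: (K, 3) sparse 3D keypoints (e.g. facial landmarks)
    Returns (N, K, 4): per-keypoint offset vectors plus distances.
    Shifting subject and keypoints together leaves the encoding
    unchanged, so the network need not relearn every global pose.
    """
    offsets = query_pts[:, None, :] - keypoints[None, :, :]    # (N, K, 3)
    dists = np.linalg.norm(offsets, axis=-1, keepdims=True)    # (N, K, 1)
    return np.concatenate([offsets, dists], axis=-1)
```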

Speaker bio: Aayush Bansal is the Principal Scientist and Head of Research at SpreeAI (https://www.spreeai.com/), an upcoming startup working on virtual try-on. He earned a PhD in Robotics from Carnegie Mellon University under the supervision of Prof Deva Ramanan and Prof Yaser Sheikh. He was a Presidential Fellow at CMU and a recipient of the Uber Presidential Fellowship (2016-17), Qualcomm Fellowship (2017-18), and Snap Fellowship (2019-20). His research has been covered by various national and international media such as NBC, CBS, WQED, 90.5 WESA FM, France TV, and Journalist. He has previously worked with Reality Labs Research at Meta Platforms Inc. and Adobe Research. He has also worked with production houses such as BBC Studios and Full Frontal with Samantha Bee (TBS). He serves on the senior program committee of prestigious academic conferences such as IEEE/CVF CVPR, ICCV, NeurIPS, and SIGGRAPH. His work has received multiple awards and citations.

[Talk organised in collaboration with the Department of Computational and Data Sciences]