The Promise of 3D Representation for Controllable Content Generation

Amit Raj, Research Scientist, Google Research

18 October 2023

Talk summary: Neural image synthesis has seen enormous advances in recent years, led by innovations in generative adversarial networks (GANs) that generate high-resolution, photo-realistic images. However, a major limitation of these methods is that they tend to capture the texture statistics of an image with no explicit understanding of geometry. GAN-only pipelines are also notoriously hard to train. In contrast, recent work in neural and volumetric rendering has demonstrated compelling results by incorporating three-dimensional (3D) information into the synthesis pipeline using classical rendering techniques. Diffusion models are another class of generative models that have recently seen great success in high-quality image generation.

Amit Raj leverages ideas from both classical graphics rendering and neural image synthesis to design 3D-guided image generation pipelines that are photo-realistic, controllable, and easy to train.

Speaker bio: Amit Raj is a Research Scientist at Google Research working on neural rendering and generative models. He earned his PhD in Machine Learning at Georgia Tech, working with Professor James Hays in the College of Computing. His research interests lie at the intersection of computer vision, graphics, and machine learning. He is particularly interested in generative models for images conditioned on sparse user guidance, which allow two-dimensional (2D) or 3D content authoring. He is also interested in using 3D information for more controllable and interpretable neural image synthesis.

[Talk organised in collaboration with the Department of Computational and Data Sciences]