The Three P’s of Modern Computer Vision: Pixels, Perception, and Physics

Speaker: Anand Bhattad, Assistant Professor of Computer Science, Johns Hopkins University

Date: 05 January 2026

YouTube link: https://youtu.be/xMC72d2SySQ

The talk on ‘The Three P’s of Modern Computer Vision: Pixels, Perception, and Physics’ by Anand Bhattad was organised jointly with the Department of Computational and Data Sciences (CDS) at IISc. Attendees came from several departments across IISc, including the undergraduate BTech programme. A summary of the talk is provided below.

For decades, computer vision has been guided by what Jitendra Malik and colleagues called the ‘Three R’s’: recognition (what is it?), reconstruction (what is its 3D shape?), and reorganisation (what belongs together?). This framework drove extraordinary progress. Today, recognition and reorganisation are largely solved, and reconstruction has advanced rapidly in recent years: it is largely solved for objects and partially solved for scenes.

However, the rise of generative models opens a new frontier: moving from static description to dynamic understanding. The talk presented a new paradigm the speaker calls the ‘Three P’s’: pixels, perception, and physics. Here, the vision is to perceive, understand, interact with, and recreate the visual world without labelled data. Anand Bhattad’s group pursues three lines of work united by the common theme of generative models: (i) GLOW (Generative Learning of Worlds), (ii) LUMA (Learning, Understanding, and Modelling of Appearances), and (iii) PULSE (Physical Understanding and Learning of Scene Events).

The speaker’s research goal is to teach generative models physical laws. To do this, his group first probes what these models know about the physical world, then analyses and identifies their limitations, and finally uses this knowledge to improve them. This systematic cycle continuously builds better models by deeply understanding their strengths and addressing their core limitations.

The talk focused on what generative models know, what they should know, what they can do, and the path forward. The speaker pointed out that generative models seem to understand intrinsic physics and some intuitive physics, and have only just begun to engage with actual physics. The way forward is to move from intrinsic physics to intuitive physics to actual physics, which would enable interesting applications in robotics and beyond, and may also help us discover hidden visual representations. The time may be right to take on the ‘crazier’ ideas of teaching models intuitive and actual physics; the Three P’s framework provides a vision for doing so.