Learning from Streaming Video Data

Speaker: Sudeep Sarkar, Distinguished University Professor and Launch Dean, Bellini College of AI, Cybersecurity, and Computing, University of South Florida, Tampa

Date: 25 February 2026

YouTube link: https://youtu.be/0u4PYXsAud4

The talk on ‘Learning from Streaming Video Data’ by Sudeep Sarkar was organised jointly with the Department of Electrical Engineering (EE) at IISc. Attendees came from several departments across IISc and included retired faculty. A summary of the talk is provided below.

Today’s dominant AI paradigm, massively supervised deep learning, treats perception as an offline classification problem trained on ever-larger static datasets. In contrast, the human visual system learns online, parsing a never-ending sensory stream into meaningful events, storing only what matters, and continually refining its internal model of the world. This lecture presented an alternative framework for AI that embraces these principles: self-supervised predictive learning on streaming video.

Drawing on Event Segmentation Theory, cortical models of sequence memory, and his group’s systems, STREAMER for hierarchical prediction and Predictive Attractor Models for future-state generation, Sudeep Sarkar showed how a minimalist ‘predict-then-surprise’ mechanism (sketched in code below) can (i) slice long, untrimmed videos into temporally coherent sub-events and (ii) spatially localise the actors driving those events, without a single manual label.

Across canonical benchmarks (Breakfast Actions, 50 Salads, INRIA Instructional Videos) and a ten-day, 23-million-frame wildlife dataset, these models outperform prior unsupervised and weakly supervised methods by up to 24% and approach fully supervised accuracy, while learning in a single pass and storing no frames. By reframing perception as continuous prediction and rapid adaptation, he argued, we edge closer to the brain’s remarkable efficiency and open a path toward truly lifelong, label-free video understanding.
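To make the ‘predict-then-surprise’ idea concrete, the following is a minimal, hypothetical sketch: an online predictor forecasts the next frame embedding, and a spike in prediction error far above its running average is treated as a surprise, i.e. a candidate event boundary. The linear predictor, the threshold rule, and all names here are illustrative assumptions for exposition, not the STREAMER or Predictive Attractor Model architectures described in the talk.

```python
import numpy as np

def segment_stream(embeddings, lr=0.01, surprise_factor=3.0):
    """Return frame indices where prediction-error spikes suggest event boundaries.

    embeddings: (T, D) array of per-frame features arriving as a stream.
    """
    T, D = embeddings.shape
    W = np.zeros((D, D))           # online linear predictor: x_t -> x_{t+1}
    err_mean, err_var = 0.0, 1.0   # running statistics of the prediction error
    boundaries = []

    for t in range(T - 1):
        x, x_next = embeddings[t], embeddings[t + 1]
        pred = W @ x
        err = float(np.linalg.norm(pred - x_next))

        # Surprise: error far above its running mean marks a candidate boundary.
        if t > 10 and err > err_mean + surprise_factor * np.sqrt(err_var):
            boundaries.append(t + 1)

        # Continual adaptation: one gradient step on the squared prediction error,
        # after which the frame can be discarded (no frames are stored).
        W += lr * np.outer(x_next - pred, x)
        err_mean += 0.1 * (err - err_mean)
        err_var += 0.1 * ((err - err_mean) ** 2 - err_var)

    return boundaries

# Toy usage: a synthetic stream whose statistics shift every 100 frames,
# standing in for sub-event changes in a video feature stream.
rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(loc=m, size=(100, 16)) for m in (0.0, 2.0, -1.0)])
print(segment_stream(stream))
```

In this toy setup the detected boundaries should fall near the points where the stream’s statistics change; the actual systems discussed in the talk operate hierarchically on learned video representations rather than on a single linear predictor.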