Upcoming Events

CDS–KIAC TALK: Towards Measuring and Mitigating Hallucinations in Generative Image Super-Resolution

Title: Towards Measuring and Mitigating Hallucinations in Generative Image Super-Resolution
Speaker: Raghav Goyal, Senior Researcher, Samsung Research America (SRA), Toronto
Date and time: 13 August 2025, 04:00 PM
Venue: #102, Seminar Hall, Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru 560012

Talk abstract: Generative super-resolution (GSR) currently sets the state of the art in perceptual image quality, overcoming the ‘regression-to-the-mean’ blur of prior non-generative models. However, from a human perspective, such models do not fully achieve the optimal balance between quality and fidelity. Instead, they introduce a different class of artifacts, in which generated details fail to perceptually match the low-resolution image (LRI) or ground-truth image (GTI); this is a critical but under-studied issue in GSR that limits its practical deployment. In this talk, I will focus on measuring, analysing, and mitigating these artifacts (i.e., ‘hallucinations’). First, we analyse hallucinations and observe that they are not well characterised by existing image metrics or quality models, as they are orthogonal to both exact fidelity and no-reference quality. Second, to measure hallucinations, we propose using a multimodal large language model (MLLM) that assesses hallucinatory visual elements and generates a ‘hallucination score’ (HS) that is closely aligned with human evaluations. Third, to mitigate hallucinations, we find that certain deep feature distances correlate strongly with HS, and we therefore propose to align GSR models by using such features as differentiable reward functions.

Speaker biography: Raghav Goyal is a researcher at the Samsung AI Center Toronto. He obtained his PhD at The University of British Columbia (UBC), supervised by Professor Leonid Sigal, with a focus on data-efficient learning for structured vision tasks. Prior to this, he spent three years at a startup named ‘20bn’ (now at Qualcomm Research), working on video understanding, including the Something–Something dataset. He has published in top venues such as CVPR, ICCV, and NeurIPS, and has held internships at Google, Meta, and Xerox Research. He obtained an integrated MTech (5-year programme) in Mathematics and Computing from the Indian Institute of Technology (IIT) Delhi.

Host faculty: Professor Venkatesh Babu Radhakrishnan, Department of Computational and Data Sciences, Indian Institute of Science