MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World

Diffusion models have revolutionised image generation and editing, producing stunning visuals across various applications. However, despite their impressive capabilities, these models often fall short of accurately replicating intricate physical phenomena such as shadows, occlusions, and, most notably, photorealistic mirror reflections. Current state-of-the-art text-to-image models such as Stable Diffusion 3.5 and FLUX, as well as inpainting methods like HD-Painter, struggle to generate geometrically accurate and consistent reflections, particularly in complex or real-world scenes. This limitation primarily stems from a scarcity of high-quality training data featuring mirrors and their reflections.

A team of researchers, Ankit Dhiman, Manan Shah, and R Venkatesh Babu from the Vision and AI Lab (VAL) at the Indian Institute of Science, has addressed this critical challenge. Their work, presented at CVPR 2025, introduces MirrorVerse, a novel approach that significantly enhances diffusion models’ ability to generate faithful mirror reflections. At its core is SynMirror V2, a large-scale synthetic dataset designed with an unprecedented level of diversity. Recognising the limitations of previous synthetic datasets, the team incorporated crucial augmentations into SynMirror V2’s BlenderProc-based data generation pipeline (sketched in code after the list):

  • Random Object Positioning: Objects are placed at varied x–y positions, ensuring they are visible both in the mirror and to the camera, and are ‘grounded’ to prevent floating artefacts.
  • Randomised Rotations: Objects are randomly rotated around the y-axis to introduce variability in their orientation.
  • Multi-Object Scenes with Semantic Pairing: To handle complex spatial relationships and occlusions, the dataset includes scenes with multiple objects, which are semantically paired (e.g., a chair with a table) to ensure realistic interactions.
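
Conceptually, each scene is the product of three sampling steps. The following is a minimal sketch of that logic in plain Python; the helper names, the pairing table, and the pairing probability are illustrative assumptions, not the authors’ code (the real pipeline is built with BlenderProc):

```python
import random

# Illustrative semantic pairings; the dataset's actual pairing rules
# are richer than this hypothetical table.
SEMANTIC_PAIRS = {"chair": ["table", "desk"], "vase": ["shelf", "table"]}

def sample_position(floor_bounds, rng=random):
    """Sample a grounded x-y position inside the region visible to both
    the camera and the mirror; z = 0 keeps the object on the floor."""
    (x_min, x_max), (y_min, y_max) = floor_bounds
    return (rng.uniform(x_min, x_max), rng.uniform(y_min, y_max), 0.0)

def sample_rotation(rng=random):
    """Random rotation (in degrees) about the vertical axis,
    i.e. the y-axis in the paper's convention."""
    return rng.uniform(0.0, 360.0)

def sample_scene_objects(catalogue, pair_prob=0.5, rng=random):
    """Pick a primary object and, with probability `pair_prob` (an assumed
    value), a semantically paired companion for a multi-object scene."""
    primary = rng.choice(list(catalogue))
    objects = [primary]
    partners = SEMANTIC_PAIRS.get(primary, [])
    if partners and rng.random() < pair_prob:
        objects.append(rng.choice(partners))
    return objects
```

Fixing the z-coordinate at the floor in the position sampler is what implements the ‘grounding’ that prevents the floating artefacts mentioned above.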

SynMirror V2, containing over 207,000 synthetic images, vastly surpasses existing datasets in diversity, varying attributes such as backgrounds, floor textures, camera poses, and mirror orientations.

Beyond the dataset itself, the team implemented a three-stage progressive training curriculum for their MirrorFusion 2.0 model. This curriculum is critical for bridging the synthetic-to-real domain gap and for incrementally building the model’s capability to handle increasingly complex scenes.
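
Such a curriculum can be expressed as a staged training loop in which each stage reuses the weights learned in the previous one. The sketch below is a hypothetical illustration only: the stage names, dataset splits, and epoch counts are placeholders, not the paper’s actual schedule.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str      # human-readable label
    dataset: str   # identifier of the training subset (placeholder)
    epochs: int    # training length before advancing (assumed values)

# Hypothetical three-stage schedule of increasing scene complexity.
CURRICULUM = [
    Stage("simpler scenes", "synmirror_v2_stage1", epochs=10),
    Stage("harder scenes", "synmirror_v2_stage2", epochs=10),
    Stage("full complexity", "synmirror_v2_stage3", epochs=5),
]

def train_with_curriculum(model, load_dataset: Callable, train_epoch: Callable):
    """Train `model` stage by stage; weights carry over between stages,
    so capability is built up incrementally."""
    for stage in CURRICULUM:
        data = load_dataset(stage.dataset)
        for _ in range(stage.epochs):
            train_epoch(model, data)
    return model
```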

While MirrorVerse significantly advances realistic reflection synthesis for plane mirrors, the next challenge for future research lies in robustly tackling highly complex, real-world mirror configurations and in further minimising the synthetic-to-real domain gap for enhanced generalisation.

This work was presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025, Nashville, TN, USA.

Project page: https://mirror-verse.github.io/

Paper: https://openaccess.thecvf.com/content/CVPR2025/papers/Dhiman_MirrorVerse_Pushing_Diffusion_Models_to_Realistically_Reflect_the_World_CVPR_2025_paper.pdf