Opening the Black Box without Gradients: A Learnable Approach to Explainable AI

Modern AI systems are increasingly powerful but often difficult to interpret. This becomes especially challenging when the model is available only through input–output queries, with no access to its gradients, architecture, or internal parameters. Such ‘black box’ settings are common in large, deployed AI systems, yet explanations remain crucial in high-stakes domains such as healthcare, finance, autonomous systems, and audio–visual intelligence.

To address this, researchers at the LEAP Lab, Indian Institute of Science, developed DAX (Distillation Aided Explainability), a gradient-free, post-hoc, model-agnostic framework for explaining deep neural networks.

What problem does DAX solve?

Many existing explanation methods either require access to model gradients or heavily simplify the input before generating explanations. For example, image-based methods may divide an image into coarse segments and estimate which segment mattered most. While useful, this can produce explanations that are imprecise, unreliable, or dependent on the model architecture or on gradient access.

DAX instead asks a more direct question:

Which parts of the input are truly needed to preserve the black box model’s decision?

Core idea

DAX treats explanation as a learnable problem. Given a black box model and an input image or audio spectrogram, DAX learns a saliency mask that highlights the regions most responsible for the model’s prediction.

It uses two jointly trained components:

  • Mask generation network
    Learns where to look by producing an explanation mask over the input.
  • Student distillation network
    Learns to locally mimic the black box model using only query-based outputs.
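To make the two roles concrete, here is a toy, pure-Python sketch of a mask and a student being trained together while the black box is only ever queried. It is an illustration under loudly stated assumptions, not the authors' implementation: the hypothetical `black_box` function (a hidden logistic model over six features), a linear logistic student standing in for the student distillation network, a single per-feature mask vector standing in for the mask generation network, random keep/drop masks as the local perturbations, alternating single SGD steps as the "joint" optimization, and the loss weights are all choices made here for clarity.

```python
import math
import random
from typing import List

# Hypothetical black box (assumption): only black_box() may be called;
# its weights and gradients are never read by the explainer.
_HIDDEN_W = [3.0, 0.0, 0.0, 2.5, 0.0, 0.0]


def sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def black_box(x: List[float]) -> float:
    """Query-only model: returns a class probability for input x."""
    return sigmoid(sum(w * xi for w, xi in zip(_HIDDEN_W, x)) - 1.0)


def student(w: List[float], b: float, z: List[float]) -> float:
    """Differentiable stand-in for the student network (linear logistic model)."""
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


def explain(x: List[float], steps: int = 2000, lr: float = 0.1,
            sparsity: float = 0.05, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    d = len(x)
    w, b = [0.0] * d, 0.0      # student parameters
    theta = [0.0] * d          # mask logits; mask = sigmoid(theta)
    target = black_box(x)      # the decision the explanation must preserve
    for _ in range(steps):
        # 1) Distillation: query the black box on a locally perturbed input
        #    (random keep/drop mask) and fit the student to its output.
        keep = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(d)]
        z = [k * xi for k, xi in zip(keep, x)]
        err = student(w, b, z) - black_box(z)   # d(cross-entropy)/d(logit)
        w = [wi - lr * err * zi for wi, zi in zip(w, z)]
        b -= lr * err
        # 2) Mask update: gradients flow through the student only. The loss is
        #    cross-entropy to the original decision plus an L1 penalty on the mask.
        m = [sigmoid(t) for t in theta]
        z = [mi * xi for mi, xi in zip(m, x)]
        err = student(w, b, z) - target
        theta = [t - lr * (err * wi * xi + sparsity) * mi * (1.0 - mi)
                 for t, wi, xi, mi in zip(theta, w, x, m)]
    return [sigmoid(t) for t in theta]


if __name__ == "__main__":
    mask = explain([1.0] * 6)
    print([round(v, 2) for v in mask])
```

Because `_HIDDEN_W` puts all its weight on features 0 and 3, the learned mask should rank those two features above the others, while the sparsity term pushes the mask values of the unused features down. A real mask generation network would instead output a full-resolution mask for each image or spectrogram rather than one shared vector.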

The final explanation is obtained by multiplying the original input element-wise by the learned mask:

Explanation = Input × Learned Saliency Mask

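Because the mask generation network is trained through the differentiable student network rather than through the black box itself, DAX can generate explanations without requiring gradient access to the black box model.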
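The explanation step itself is just an element-wise (Hadamard) product of the input and the mask. The sketch below assumes the input is a small 2-D grid (for example, a time by frequency spectrogram) stored as nested Python lists and that mask values lie in [0, 1]; the helper name `apply_mask` and the toy values are hypothetical, chosen only for illustration.

```python
from typing import List

Grid = List[List[float]]


def apply_mask(x: Grid, mask: Grid) -> Grid:
    """Element-wise product of an input grid and a same-shaped saliency mask."""
    if len(x) != len(mask) or any(len(r) != len(mr) for r, mr in zip(x, mask)):
        raise ValueError("input and mask must have the same shape")
    if any(not 0.0 <= v <= 1.0 for row in mask for v in row):
        raise ValueError("mask values are assumed to lie in [0, 1]")
    # Keep each input value in proportion to its mask value.
    return [[xi * mi for xi, mi in zip(r, mr)] for r, mr in zip(x, mask)]


if __name__ == "__main__":
    spectrogram = [[0.2, 0.9, 0.4], [0.1, 0.8, 0.3]]  # toy 2 x 3 grid
    mask = [[0.0, 1.0, 0.5], [0.0, 1.0, 0.0]]
    print(apply_mask(spectrogram, mask))  # [[0.0, 0.9, 0.2], [0.0, 0.8, 0.0]]
```

Regions where the mask is 0 are removed entirely, regions where it is 1 are kept unchanged, and intermediate values partially attenuate the input.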

Key findings

  • DAX produced more precise saliency maps than existing methods across both image and audio tasks.
  • It achieved strong performance on object classification, sound event classification, and cough-based biomedical audio analysis.
  • Human evaluation showed that DAX explanations were more interpretable and useful than several competing approaches.

Paper details

Title: Gradient-Free Post-Hoc Explainability Using Distillation Aided Learnable Approach
Authors: Debarpan Bhattacharya, Amir H Poorjam, Deepak Mittal, and Sriram Ganapathy
Published in: IEEE Journal of Selected Topics in Signal Processing, 2025

Paper link: https://arxiv.org/abs/2409.11123