Debiasing Diffusion Models via Score Guidance

Piyush Tiwary, Prabhav Verma, Prathosh A.P.
Indian Institute of Science, Bengaluru, India

Abstract

With the increasing use of Diffusion Models (DMs) in everyday applications, it is important to ensure that these models are fair towards various demographic and societal groups. However, for several reasons, DMs inherit biases towards specific genders, races, and communities, which can perpetuate and amplify societal inequities. It is therefore important to debias DMs. Previous debiasing approaches require additional reference data, model fine-tuning, or auxiliary classifier training, each of which incurs additional cost. In this work, we provide a training-free, inference-time method for debiasing diffusion models. First, we provide a theoretical explanation for the cause of the biases exhibited by DMs. Specifically, we show that the unconditional score predicted by the denoiser can be expressed as a convex combination of conditional scores corresponding to the attributes under consideration. We then argue that the weights allocated to underrepresented attributes are smaller, which leads to domination of the overall score function by the remaining attributes. Building on this, we propose a score-guidance method that adheres to a user-provided reference distribution during generation. Moreover, we show that this score guidance can be achieved via different modalities, such as text and exemplar images. To our knowledge, our method is the first debiasing framework for diffusion models that can utilize different modalities. We demonstrate the effectiveness of our method across various attributes on both unconditional and conditional text-based diffusion models, including Stable Diffusion.


Theoretical Basis of Bias

Figure 1: Illustration of why diffusion models inherit bias. Underrepresented attributes naturally receive a lower weight \(p(a_i)\) locally, leading to generations dominated by overrepresented categories.

Generative diffusion models routinely inherit biases towards specific genders, races, and communities from their training distributions. By leveraging Tweedie's formula, we demonstrate that the (Stein) score function provided by diffusion models can be expressed precisely as a weighted average of conditional scores. Specifically, the unconditional score is given by a convex combination:

$$ \begin{align} \nabla\log p(x_t) = \sum_{a_i} p(a_i)\nabla\log p(x_t|a_i) \end{align} $$

From this equation, we can deduce that an attribute's influence on the score function strictly follows the proportion \(p(a_i)\) learned by the pre-trained diffusion model. Because the weights inherently allocated to underrepresented attributes are significantly smaller, other attributes naturally dominate the overall score function. This disparity in the mixture weights is the intrinsic source of generation bias.
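As a sanity check (not from the paper), the decomposition can be verified numerically on a one-dimensional Gaussian mixture. At any point \(x\), the convex weights evaluate to \(p(a_i)\,p(x|a_i)/p(x)\), so a low-prior attribute receives a small weight and the mixture score is pulled toward the dominant class:

```python
import numpy as np

# Two attribute classes with unequal priors, mimicking a biased training set.
priors = np.array([0.8, 0.2])           # p(a_1), p(a_2)
means, sigma = np.array([-2.0, 2.0]), 1.0

def cond_pdf(x):                         # p(x | a_i) for Gaussian conditionals
    return np.exp(-0.5 * ((x - means) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def cond_score(x):                       # grad_x log p(x | a_i)
    return (means - x) / sigma ** 2

x = 0.5
p_x = priors @ cond_pdf(x)               # p(x) = sum_i p(a_i) p(x | a_i)
weights = priors * cond_pdf(x) / p_x     # convex weights (non-negative, sum to 1)
mix_score = weights @ cond_score(x)      # convex combination of conditional scores

# Finite-difference check that this equals grad_x log p(x).
eps = 1e-6
fd_score = (np.log(priors @ cond_pdf(x + eps))
            - np.log(priors @ cond_pdf(x - eps))) / (2 * eps)
assert np.isclose(mix_score, fd_score, atol=1e-5)
```

The same check fails if the raw priors are used as weights away from regions where one conditional dominates, which is exactly the local imbalance the figure above illustrates.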


Score Guidance Methodology

Figure 2: Overview of the Score Guidance framework. (a) Standard generation where one mode dominates, causing under-representation of others. (b) Our proposed method tags and guides samples toward a user-specified reference distribution via H-space modulation.

Our proposed approach serves as a training-free inference-time method for robust debiasing. Unlike previous works that necessitate auxiliary classifier training, model fine-tuning, or extensive reference data, our framework directly adapts the predicted noise score to align with a user-provided reference distribution.

The method decomposes debiasing into two components: (a) sample tagging to maintain attributes in the desired proportions \(p^a_{\text{ref}}\), and (b) score guidance which modifies the predicted denoised estimate \(\hat{x}_{0|t}\) during a selected time window \(\mathcal{T}\). We operate in the bottleneck H-space of the UNet-based denoiser, which is more semantically meaningful than the image space.
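As an illustration of component (a), a minimal greedy tagging rule (a sketch, not necessarily the paper's exact procedure) assigns each new sample to the attribute whose realized proportion lags its reference proportion \(p^a_{\text{ref}}\) the most:

```python
def assign_targets(n_samples, p_ref):
    """Greedy tagging: each sample is tagged with the attribute having the
    largest deficit between its reference and realized proportions."""
    counts = {a: 0 for a in p_ref}
    targets = []
    for i in range(1, n_samples + 1):
        deficit = {a: p_ref[a] - counts[a] / i for a in p_ref}
        choice = max(deficit, key=deficit.get)  # most under-represented attribute
        counts[choice] += 1
        targets.append(choice)
    return targets

# A 50/50 gender reference over 10 samples yields a balanced tag sequence.
tags = assign_targets(10, {"male": 0.5, "female": 0.5})
```

Any reference distribution works the same way; e.g. `assign_targets(8, {"A": 0.75, "B": 0.25})` tags six samples `A` and two `B`.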

SG-Text: Text-Based Debiasing

Employs off-the-shelf pre-trained CLIP to classify the estimated un-noised samples \(\hat{x}_{0|t}\). Samples are tagged by computing the cosine similarity between CLIP embeddings of attribute text descriptions \(t_i\) and the predicted clean image \(\hat{x}_{0|t_s}\). The H-space vectors are then updated using the gradient of CLIP similarity:

$$ \boxed{\quad h^{(i)} = h^{(i)} - \gamma \, \nabla_{h^{(i)}} \left(1 - \frac{\text{clip}(t^{(i)}) \cdot \text{clip}(\hat{x}^{(i)}_{0|t})}{\|\text{clip}(t^{(i)})\| \, \|\text{clip}(\hat{x}^{(i)}_{0|t})\|} \right)\quad} $$

where \(\gamma\) is the guidance strength, \(t^{(i)}\) is the attribute text assigned during tagging, and the update is applied \(M\) times per timestep within the guidance window \(\mathcal{T}\).
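The update above can be sketched numerically. In this toy version, a random linear map `W` stands in for the decoder-plus-CLIP image encoder and a fixed vector for the attribute text embedding \(\text{clip}(t^{(i)})\); both are illustrative stand-ins, not the real models. Gradient descent on the cosine loss steers the H-space vector toward the tagged attribute:

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, d_e = 8, 16
W = rng.normal(size=(d_e, d_h))   # stand-in for decoder + CLIP image encoder
t = rng.normal(size=d_e)          # stand-in for clip(t), the attribute text embedding
h = rng.normal(size=d_h)          # H-space vector of one tagged sample

def cos_sim(v):
    return (t @ v) / (np.linalg.norm(t) * np.linalg.norm(v))

sim_before = cos_sim(W @ h)
gamma, M = 0.1, 10                # guidance strength, updates per timestep
for _ in range(M):
    v = W @ h
    nt, nv = np.linalg.norm(t), np.linalg.norm(v)
    # Analytic gradient of the loss (1 - cosine similarity) w.r.t. v.
    grad_v = -(t / (nt * nv) - (t @ v) * v / (nt * nv ** 3))
    h = h - gamma * (W.T @ grad_v)  # chain rule through the linear stand-in

sim_after = cos_sim(W @ h)
```

In the actual method the gradient is taken through the (nonlinear) denoiser decoder and CLIP encoder via automatic differentiation, but the mechanics of the inner loop are the same.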

SG-Exemplar: Exemplar-Based Debiasing

Requires only a small set (e.g. 8 samples) of visual exemplars per attribute class, which are first inverted via DDIM to obtain anchor points \(\bar{e}^{(j)}_{0|t}\) — estimates of the conditional expectation \(\mathbb{E}[X_0 | X_t, Y = a_j]\). After tagging samples based on \(\ell_2\)-distance from the anchors, the predicted denoised sample is guided to remain within an \(r\)-ball of the anchor.

Image-space update:

$$ \hat{x}^{(i)}_{0|t} = \hat{x}^{(i)}_{0|t} - \left(\hat{x}^{(i)}_{0|t} - \bar{e}^{(j)}_{0|t}\right)\left(1 - \frac{r}{\|\hat{x}^{(i)}_{0|t} - \bar{e}^{(j)}_{0|t}\|}\right) $$

However, instead of updating \(\hat{x}^{(i)}_{0|t}\) directly, we update the associated H-space vectors:

H-space update:

$$ \boxed{\quad h^{(i)} = h^{(i)} - \gamma \, \nabla_{h^{(i)}} \frac{1}{2}\left( \left\|\hat{x}^{(i)}_{0|t} - \bar{e}^{(j)}_{0|t}\right\| - r \right)^2 \quad} $$

where \(\gamma\) is the guidance strength; note that the gradient of this objective with respect to \(\hat{x}^{(i)}_{0|t}\) recovers exactly the image-space update above. The update is applied \(M\) times per timestep within the guidance window \(\mathcal{T}\). The parameter \(r\) controls the trade-off between diversity and accuracy. A formal guarantee (Theorem 3.1) ensures that the generated samples satisfy \(\|\mathbb{E}[X_0] - \mathbb{E}[X_0 \mid Y=a]\| \leq r\).
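Geometrically, a full image-space step is a projection onto the \(r\)-ball around the anchor: an estimate outside the ball lands exactly on its boundary, while one already inside is left unchanged. A minimal NumPy sketch, with random vectors standing in for \(\hat{x}_{0|t}\) and the DDIM-inverted anchor:

```python
import numpy as np

def project_to_ball(x_hat, anchor, r):
    """One full image-space guidance step: pull x_hat toward the anchor so
    that it lies within (or on the boundary of) the r-ball around it."""
    diff = x_hat - anchor
    dist = np.linalg.norm(diff)
    if dist <= r:                       # already inside the ball: no update
        return x_hat
    return x_hat - diff * (1 - r / dist)

rng = np.random.default_rng(1)
anchor = rng.normal(size=4)                 # stand-in for the anchor e_bar
x_hat = anchor + 5.0 * rng.normal(size=4)   # denoised estimate, outside the ball
r = 2.0
x_new = project_to_ball(x_hat, anchor, r)
```

A larger \(r\) preserves more diversity around each anchor at the cost of attribute accuracy, matching the trade-off described above.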


Results

We conduct comprehensive quantitative and qualitative evaluations, measuring Fairness Discrepancy (FD) for bias and Fréchet Inception Distance (FID) for image quality across diverse attributes (gender, race, eyeglasses, age, etc.).

Our SG-Exemplar pipeline consistently achieves state-of-the-art fairness. In unconditional setups (e.g. the P2 model on CelebA-HQ), SG-Exemplar drives FD close to zero (e.g. 0.001 on gender) while attaining the best FID among competing baselines (FID = 34.61).

Evaluations on conditional models, including Stable Diffusion v1.5, v2.0, and SDXL, with prompts that elicit strong biases (such as professions like doctors and firefighters) further show that image fidelity is preserved and demographic parity is achieved without any retraining. Both single-attribute and multi-attribute joint distributions benefit from this localized gradient guidance.

CelebA Results

Visualization of balanced generation on 'eyeglasses' and 'gender' using different baselines and our method. Samples with eyeglasses and male gender are shown with an orange border.
Visual results for balanced generation on multi-class attributes from an unconditional diffusion model using SG(T) and SG(E).
Visual results for balanced generation on multiple attributes from an unconditional diffusion model using SG(T) and SG(E).

Stable Diffusion 1.5 Results

Visualizations of gender-balanced samples for different professions from Stable Diffusion using SG(T) and SG(E).
Visualizations of multi-attribute ('race' and 'gender') debiasing for different professions from Stable Diffusion using SG(T) and SG(E).

Stable Diffusion 2.0 & SDXL Results

Visualizations of debiasing for 'doctor' and 'firefighter' from Stable Diffusion v2.0 using SG(T) and SG(E).
Visualizations of debiasing for 'doctor' and 'firefighter' from Stable Diffusion XL using SG(T) and SG(E).

BibTeX

@article{tiwary2025debiasing,
  title={Debiasing Diffusion Models via Score Guidance},
  author={Tiwary, Piyush and Verma, Prabhav and AP, Prathosh},
  journal={Transactions on Machine Learning Research},
  year={2026},
  url={https://openreview.net/forum?id=vAz8xUHyTe}
}