AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data
Zengqun Zhao, Ziquan Liu, Yu Cao, Shaogang Gong, Ioannis Patras
Centre for Multimodal AI, Queen Mary University of London
CVPR 2025
Motivations
Recent advancements in text-to-image generative models showcase impressive data fidelity,
yet their potential for improving fairness through data expansion has
not been fully explored by the fairness community.
This raises the question: can AI-generated synthetic data play a crucial role in mitigating biases within machine learning models?
This work presents a comprehensive empirical investigation into whether fine-tuning on high-quality,
balanced generative data from a contemporary text-to-image model can counteract model biases caused by training on imbalanced real data.
We identify two key challenges in bias-countering fine-tuning with synthetic data:
(1) A data-related challenge arising from linguistic ambiguity of the textual prompt and/or model misrepresentation,
which results in low-quality and low-diversity generated data.
(2) A model learning challenge caused by both a domain shift (synthetic vs. real) and a bias shift (unbiased vs. biased) between the real and the synthetic data.
Fine-tuning blindly on the synthetic data will result in a model with decreased utility.
Methods

The selective fine-tuning model consists of three parts: (1) Contextual Synthetic Data Generation (CSDG), which generates diverse images from GPT-4-generated prompts; (2) Selective Mask Generation (SMG), which creates a selection mask that determines which parameters are updated during fine-tuning; and (3) Selective Fine-Tuning (SFT), which enhances the model fairness learned from synthetic data while preserving the model utility obtained from real data in pre-training.
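The idea of updating only the parameters picked out by a selection mask can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `top_k_mask` and `masked_sgd_step`, the per-parameter `scores`, the top-k selection rule, and the plain SGD update are all assumptions made for illustration, since this page does not specify how the mask is computed or which optimizer is used.

```python
from typing import List


def top_k_mask(scores: List[float], k: int) -> List[int]:
    """Return a 0/1 mask that selects the k highest-scoring parameters.

    The scores are hypothetical: they stand in for whatever criterion
    decides which parameters should be updated during fine-tuning.
    """
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and len(scores)")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:k])
    return [1 if i in chosen else 0 for i in range(len(scores))]


def masked_sgd_step(
    params: List[float], grads: List[float], mask: List[int], lr: float
) -> List[float]:
    """Apply one SGD step only where mask == 1; other parameters stay frozen."""
    if not (len(params) == len(grads) == len(mask)):
        raise ValueError("params, grads and mask must have the same length")
    return [p - lr * g * m for p, g, m in zip(params, grads, mask)]


# Toy example with four scalar "parameters" (all values are made up).
params = [0.5, -1.0, 2.0, 0.0]   # weights from pre-training on real data
grads = [1.0, 1.0, -2.0, 4.0]    # gradients computed on synthetic data
scores = [0.9, 0.1, 0.7, 0.2]    # hypothetical selection scores
mask = top_k_mask(scores, k=2)
updated = masked_sgd_step(params, grads, mask, lr=0.5)
print(mask)     # [1, 0, 1, 0]
print(updated)  # [0.0, -1.0, 3.0, 0.0]
```

In this toy run, only the first and third parameters move; the other two keep their pre-trained values, which is the mechanism by which a mask can limit how much fine-tuning on synthetic data disturbs what was learned from real data.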
Results
Comparisons to other methods on the CelebA dataset under settings of varied target and protected attributes.

Comparisons to other methods on the CelebA dataset (T=Smiling, P=Male) under settings of training set sizes.

Comparisons of varied training strategies on CelebA and UTKFace datasets.

Results on CelebA dataset (T=Smiling, P=Male) under settings of different prompt types and numbers.

Visualizations
t-SNE visualizations for the learned representations on CelebA (T=Smiling, P=Male).

Generated contextual images for CelebA with target attribute Smiling and protected attribute Young.

Generated contextual images for UTKFace with target attribute Female and protected attribute White.

BibTeX
If you find our work useful, please consider citing our paper:
@misc{zhao2025aimfair,
title={AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data},
author={Zengqun Zhao and Ziquan Liu and Yu Cao and Shaogang Gong and Ioannis Patras},
year={2025},
eprint={},
archivePrefix={arXiv},
primaryClass={cs.CV}
}