4.1.1. Pre-processing

Pre-processors are fairness processors that modify the model inputs.

The supported methods are:

4.1.1.1. Counterfactual Data Augmentation (CDA)

Data augmentation is the process of curating or upsampling the dataset to obtain a more representative distribution to train the model on. In particular, Counterfactual Data Augmentation (CDA) (Webster et al., 2020) consists of swapping words that carry demographic information (for example, "he" and "she") while preserving semantic correctness. The procedure can be one-sided, in which case the original sentence is discarded, or two-sided, in which case both the original sentence and its augmented version are kept.

FairLangProc.algorithms.preprocessors.augmentation.CDA(batch: dict, pairs: dict[str, str], columns: list[str] | None = None, bidirectional: bool = True) → dict

Perform CDA on a batch of training instances.

Parameters:
  • batch (dict) – Batch of training instances

  • pairs (dict) – Dictionary of counterfactual pairs

  • columns (list[str] | None) – List of columns on which CDA should be performed. If None, CDA is applied to all columns.

  • bidirectional (bool) – If True, applies bidirectional CDA (preserves the original training instance). If False, deletes the original training instance.

Returns:

  • output (dict) – Augmented training instance.

  • modified (dict) – Whether or not the training instance was augmented.

Example

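>>> from datasets import Dataset, load_dataset
>>> from transformers import AutoTokenizer
>>> from FairLangProc.algorithms.preprocessors import CDA
>>> # Assumed setup (not part of the original example): IMDB data and a BERT tokenizer.
>>> imdb = load_dataset("imdb")
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> def tokenize_function(batch):
...     return tokenizer(batch["text"], padding="max_length", truncation=True)
>>> gendered_pairs = [('he', 'she'), ('him', 'her'), ('his', 'hers'), ('actor', 'actress'), ('priest', 'nun'),
... ('father', 'mother'), ('dad', 'mom'), ('daddy', 'mommy'), ('waiter', 'waitress'), ('James', 'Jane')]
>>> # Augment the raw training split, then tokenize the augmented dataset.
>>> cda_train = Dataset.from_dict(CDA(imdb['train'][:], pairs = dict(gendered_pairs)))
>>> train_CDA = cda_train.map(tokenize_function, batched=True)
>>> train_CDA.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])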
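To make the one-sided and two-sided variants concrete, the following is a minimal, library-independent sketch of word swapping on a list of sentences. It is not the FairLangProc implementation: the names `flip_words` and `cda_sketch`, the `two_sided` flag, the symmetric treatment of each pair, and the choice to keep sentences that contain no listed word are all assumptions made for illustration.

```python
import re


def flip_words(text, pairs):
    """Swap every word listed in `pairs` with its counterpart (assumed symmetric)."""
    # Build a two-way mapping so that 'he' -> 'she' and 'she' -> 'he'.
    mapping = {}
    for a, b in pairs.items():
        mapping.setdefault(a.lower(), b)
        mapping.setdefault(b.lower(), a)

    def swap(match):
        word = match.group(0)
        target = mapping.get(word.lower())
        if target is None:
            return word  # not a demographic word: leave untouched
        # Carry over a leading capital letter from the original word.
        return target[0].upper() + target[1:] if word[0].isupper() else target

    return re.sub(r"[A-Za-z]+", swap, text)


def cda_sketch(sentences, pairs, two_sided=True):
    """Return the augmented list of sentences (illustrative sketch only)."""
    augmented = []
    for sentence in sentences:
        flipped = flip_words(sentence, pairs)
        if flipped == sentence:
            augmented.append(sentence)             # nothing to swap (assumed: keep as is)
        elif two_sided:
            augmented.extend([sentence, flipped])  # keep original and counterfactual
        else:
            augmented.append(flipped)              # one-sided: replace the original
    return augmented


pairs = {"he": "she", "his": "her", "father": "mother"}
sentences = ["He thanked his father.", "The film was long."]
print(cda_sketch(sentences, pairs, two_sided=True))
print(cda_sketch(sentences, pairs, two_sided=False))
```

Word-level swapping of this kind ignores grammatical ambiguity (for instance, "her" can correspond to either "him" or "his"), so a real pair list needs to be curated with that in mind.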

4.1.1.2. Projection-based debiasing

Projection-based debiasing methods (Bolukbasi et al., 2016) operate on the latent space and seek to identify a bias subspace spanned by an orthonormal basis, \(\{v_i\}_{i=1}^{n_{bias}}\). The hidden representation \(h\) of any input can then be debiased by removing its projection onto this subspace, formally

\[h_{proj} = h - \sum_{i = 1}^{n_{bias} } \langle h, v_i \rangle \, v_i.\]

This can be done at either the word or the sentence level. In both cases the bias subspace is generally identified through PCA, and its dimension is usually one, in which case the subspace reduces to a single bias direction.
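The projection step can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the library's implementation: the function names `bias_subspace` and `project_out` are hypothetical, and the bias subspace is estimated from the difference vectors of paired embeddings (for example, the embeddings of "he" and "she") through an SVD without mean-centering, which matches PCA on pairs centered at their midpoints.

```python
import numpy as np


def bias_subspace(emb_a, emb_b, n_bias=1):
    """Estimate an orthonormal basis of the bias subspace from paired embeddings.

    emb_a, emb_b: arrays of shape (n_pairs, d) holding the two sides of each pair.
    Returns an array of shape (n_bias, d) whose rows are orthonormal.
    """
    diffs = np.asarray(emb_a, dtype=float) - np.asarray(emb_b, dtype=float)
    # Rows of vt are orthonormal directions sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:n_bias]


def project_out(h, basis):
    """Compute h - sum_i <h, v_i> v_i for a vector (d,) or a batch (m, d)."""
    h = np.asarray(h, dtype=float)
    coefficients = h @ basis.T        # inner products <h, v_i>
    return h - coefficients @ basis   # remove the component inside the subspace


# Toy data: every pair differs only along the first coordinate.
direction = np.array([1.0, 0.0, 0.0, 0.0])
base = np.arange(12, dtype=float).reshape(3, 4)
basis = bias_subspace(base + direction, base - direction, n_bias=1)

h = np.array([3.0, 1.0, -2.0, 0.5])
h_proj = project_out(h, basis)
print(h_proj @ basis.T)  # component along the bias direction after debiasing
```

Because the rows of `basis` are orthonormal, applying `project_out` twice gives the same result as applying it once, which is a quick sanity check for any implementation of the formula above.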

4.1.1.3. BLIND debiasing

BLIND (Orgad et al., 2023) is a debiasing procedure based on a complementary classifier \(g_{B} : \mathbb{R}^{d_L} \longrightarrow \mathbb{R}\) with parameters \(\theta_{B}\) that takes the hidden representation vector \(h\) as input and outputs a score whose sigmoid, \(\sigma(g_{B}(h; \theta_{B}))\), estimates the probability that the model head succeeds on the downstream task. This probability is then used to weight the loss of each observation, with the strength of the reweighting controlled by a hyper-parameter \(\gamma \geq 0\):

\[\mathcal{L}_{BLIND} = \left(1 - \sigma \left( g_{B}(h; \theta_{B} ) \right) \right)^{\gamma} \mathcal{L}^{task}(\hat{y}, y).\]

The term \(\sigma(g_{B}(h;\theta_B))\) represents the model's success probability for the downstream task: the larger it is, the less weight the observation receives, and the smaller it is, the more weight it carries. This forces the model to pay special attention during training to observations with a low probability of success. Note that when \(\gamma = 0\) the original loss function is recovered, while \(\gamma \gg 1\) exacerbates the effect of the reweighting.
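The reweighting can be illustrated numerically with a short NumPy sketch. It only covers the loss computation for given per-example task losses and classifier scores; the function name `blind_loss` and the averaging over the batch are assumptions, and the training of \(g_{B}\) itself is not shown.

```python
import numpy as np


def blind_loss(task_losses, success_scores, gamma):
    """Average of (1 - sigmoid(g_B(h)))**gamma * L_task over a batch.

    task_losses:    per-example task losses, shape (n,)
    success_scores: raw outputs g_B(h; theta_B) before the sigmoid, shape (n,)
    gamma:          non-negative strength of the reweighting
    """
    task_losses = np.asarray(task_losses, dtype=float)
    scores = np.asarray(success_scores, dtype=float)
    p_success = 1.0 / (1.0 + np.exp(-scores))   # sigma(g_B(h; theta_B))
    weights = (1.0 - p_success) ** gamma        # likely successes are down-weighted
    return float(np.mean(weights * task_losses))


losses = [1.0, 3.0]
scores = [0.0, 0.0]  # sigmoid(0) = 0.5 for both observations
print(blind_loss(losses, scores, gamma=0.0))  # gamma = 0 gives the plain average loss
print(blind_loss(losses, scores, gamma=1.0))  # each loss is scaled by 0.5
```

With \(\gamma = 0\) every weight equals one, so the sketch reduces to the ordinary mean task loss, in line with the remark above.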