4.1.3. Intra-processing
Intra-processors are fairness processors that modify the model’s behavior without retraining its original weights.
The supported methods are:
Modular Debiasing with Diff Subnetworks (Hauzenberger et al., 2023).
Entropy Attention Temperature (EAT) scaling (Zayed et al., 2023).
4.1.3.1. MoDDiffy
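MoDDiffy (Hauzenberger et al., 2023) creates separate sparse subnetworks to address bias with respect to different attributes (gender, religion, …) using diff pruning. The pretrained parameters, \(\theta\), are frozen, and a sparse difference vector, \(\delta\), is learned on top of them (so the adapted model uses \(\theta + \delta\)) with a loss function that promotes accuracy, sparsity and debiasing:
\[\mathcal{L}_{\rho} = \mathcal{L}_{\rho}^{task} + \lambda_{sparse}\,\mathcal{L}_{\rho}^{0} + \lambda_{bias}\,\mathcal{L}_{\rho}^{debias}\]
where: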
\(\mathcal{L}_{\rho}^{task}\) is the loss of the downstream task, evaluated with the updated parameters \(\theta + \delta\).
\(\mathcal{L}_{\rho}^{0}\) promotes sparsity of \(\delta\) through a smooth approximation of its \(L_0\) norm, obtained by gating the entries of \(\delta\) with masks drawn from a hard-concrete distribution with parameters \((\log \alpha_{\rho}, 1)\) and hyper-parameters \(\gamma < 0\), \(\zeta > 1\).
\(\mathcal{L}_{\rho}^{debias}\) debiases the outputs by minimizing an approximation of the mutual information between the embeddings and the demographic groups they belong to.
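In particular, the sparsity loss is the expected number of non-zero entries of \(\delta\):
\[\mathcal{L}_{\rho}^{0} = \sum_{i} \sigma\left(\log \alpha_{\rho, i} - \log\left(-\frac{\gamma}{\zeta}\right)\right)\]
where \(\sigma\) is the logistic sigmoid and \(i\) runs over the entries of \(\delta\).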
And the debiasing loss:
where \(\phi\) is a transformation kernel.
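To make the sparsity term concrete, here is a minimal pure-Python sketch of the standard hard-concrete formulas behind \(\mathcal{L}_{\rho}^{0}\), with the temperature fixed to 1 as in the \((\log \alpha_{\rho}, 1)\) parameterization, plus the weighted sum of the three losses. It is not the library’s implementation: the helper names are hypothetical, the debiasing term is passed in as a precomputed number, and we assume that the `upper` and `lower` arguments of `DiffPrunDebiasing` play the roles of \(\zeta\) and \(\gamma\) (their defaults, 1.1 and -0.1, fit the constraints \(\zeta > 1\), \(\gamma < 0\)).

```python
from __future__ import annotations

import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def expected_l0(log_alphas: list[float], gamma: float = -0.1, zeta: float = 1.1) -> float:
    """Expected number of non-zero gates: sum_i sigmoid(log_alpha_i - log(-gamma / zeta)).

    The temperature is fixed to 1, matching the (log alpha, 1) hard-concrete parameters.
    """
    shift = math.log(-gamma / zeta)
    return sum(sigmoid(la - shift) for la in log_alphas)


def sample_gate(log_alpha: float, gamma: float = -0.1, zeta: float = 1.1,
                rng: random.Random | None = None) -> float:
    """Draw one hard-concrete gate in [0, 1] (reparameterized sample, temperature 1)."""
    rng = rng or random.Random()
    u = rng.uniform(1e-6, 1.0 - 1e-6)
    s = sigmoid(math.log(u) - math.log(1.0 - u) + log_alpha)  # concrete sample in (0, 1)
    stretched = s * (zeta - gamma) + gamma                    # stretch to (gamma, zeta)
    return min(1.0, max(0.0, stretched))                      # clip, so exact 0s and 1s occur


def total_loss(task_loss: float, log_alphas: list[float], debias_loss: float,
               lambda_sparse: float = 1.0, lambda_bias: float = 1.0,
               gamma: float = -0.1, zeta: float = 1.1) -> float:
    """L = L_task + lambda_sparse * L_0 + lambda_bias * L_debias."""
    return (task_loss
            + lambda_sparse * expected_l0(log_alphas, gamma, zeta)
            + lambda_bias * debias_loss)
```

With the default \(\gamma = -0.1\) and \(\zeta = 1.1\), a gate with \(\log \alpha = 0\) contributes \(\sigma(\log 11) = 11/12\) to the penalty, so it is by driving \(\log \alpha\) strongly negative that training switches entries of \(\delta\) off.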
- class FairLangProc.algorithms.intraprocessors.modular.DiffPrunDebiasing(head: Module, encoder: Module, loss_fn: Callable, input_ids_A: Tensor, input_ids_B: Tensor, lambda_sparse: float = 1.0, lambda_bias: float = 1.0, bias_kernel: Callable | None = None, fixmask_init: bool = False, alpha_init: int | float | None = 5, structured_diff_prunning: bool | None = False, upper: float = 1.1, lower: float = -0.1)
Implements diff pruning for bias mitigation in pretrained models.
Subclasses must implement the ‘_forward’ method which, like ‘_get_embedding’ in other classes, computes the embedding of the given inputs.
Example
>>> import torch
>>> from torch.optim import AdamW
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer
>>> from FairLangProc.algorithms.intraprocessors.modular import DiffPrunDebiasing
>>>
>>> class DiffPrunBERT(DiffPrunDebiasing):
...     def _forward(self, input_ids, attention_mask=None, token_type_ids=None):
...         outputs = self.encoder(
...             input_ids=input_ids,
...             attention_mask=attention_mask,
...             token_type_ids=token_type_ids
...         )
...         # Use the [CLS] embedding as the sentence representation
...         return outputs.last_hidden_state[:, 0, :]
>>>
>>> tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
>>> model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')
>>>
>>> gendered_pairs = [("manager", "manageress"), ("nephew", "niece"), ("prince", "princess"), ("baron", "baroness")]
>>> tokens_male = [words[0] for words in gendered_pairs]
>>> tokens_female = [words[1] for words in gendered_pairs]
>>> inputs_male = tokenizer(tokens_male, padding=True, return_tensors="pt")
>>> inputs_female = tokenizer(tokens_female, padding=True, return_tensors="pt")
>>>
>>> def normalize_by_column(x: torch.Tensor, eps: float = 1e-8):
...     mean = x.mean(dim=0, keepdim=True)
...     std = x.std(dim=0, keepdim=True)
...     return (x - mean) / (std + eps)
>>>
>>> ModularDebiasingBERT = DiffPrunBERT(
...     head=model.classifier,
...     encoder=model.bert,
...     loss_fn=torch.nn.CrossEntropyLoss(),
...     input_ids_A=inputs_male,
...     input_ids_B=inputs_female,
...     bias_kernel=normalize_by_column,
...     upper=10,
...     lower=-0.001,
...     lambda_bias=0.5,
...     lambda_sparse=0.00001
... )
>>>
>>> # training_args, train_dataset and val_dataset are assumed to be defined beforehand
>>> trainer = Trainer(
...     model=ModularDebiasingBERT,
...     args=training_args,
...     train_dataset=train_dataset,
...     eval_dataset=val_dataset,
...     optimizers=(
...         AdamW(ModularDebiasingBERT.parameters(), lr=1e-5, weight_decay=0.1),
...         None
...     )
... )
>>> trainer.train()
>>> results = trainer.evaluate()
>>> print(results)
- __init__(head: Module, encoder: Module, loss_fn: Callable, input_ids_A: Tensor, input_ids_B: Tensor, lambda_sparse: float = 1.0, lambda_bias: float = 1.0, bias_kernel: Callable | None = None, fixmask_init: bool = False, alpha_init: int | float | None = 5, structured_diff_prunning: bool | None = False, upper: float = 1.1, lower: float = -0.1)
Constructor of the DiffPrunDebiasing class.
- Parameters:
head (nn.Module) – Head used for the task at hand (classification, question answering,…).
encoder (nn.Module) – Pretrained model (e.g., BERT, GPT-2).
loss_fn (Callable) – Loss function of the downstream task (e.g., cross-entropy for classification).
input_ids_A (torch.Tensor) – Token ids of texts carrying demographic information of group A.
input_ids_B (torch.Tensor) – Token ids of texts carrying demographic information of group B.
lambda_sparse (float) – Weight for sparsity loss.
lambda_bias (float) – Weight for bias mitigation loss.
bias_kernel (Callable) – Kernel for the embeddings of the bias loss. If None, defaults to the identity.
fixmask_init (bool) – If True, uses DiffWeightFixmask (i.e. only masks) instead of DiffWeightFinetune (i.e. smooth pruning).
alpha_init (Optional[Union[int, float]]) – Initialization value for the \(\log \alpha\) parameters of the hard-concrete distribution.
structured_diff_prunning (Optional[bool]) – If True, adds a group structure to the diff pruning process (see DiffWeightFinetune).
upper (float) – Upper stretching parameter \(\zeta\) of the hard-concrete relaxation (must be > 1).
lower (float) – Lower stretching parameter \(\gamma\) of the hard-concrete relaxation (must be < 0).
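The `bias_kernel` argument is the transformation \(\phi\) applied to the embeddings before the bias loss is computed; the example above passes a column-wise standardization. The pure-Python sketch below shows the identity default and that standardization, together with a hypothetical `group_gap` penalty (the squared distance between the mean transformed embeddings of groups A and B). `group_gap` is only a simple stand-in to show where \(\phi\) enters; it is not the mutual-information approximation used by MoDDiffy or by this class.

```python
from statistics import mean, stdev
from typing import Callable, List

Matrix = List[List[float]]  # rows = texts, columns = embedding dimensions


def identity_kernel(x: Matrix) -> Matrix:
    """Behaviour described for bias_kernel=None."""
    return x


def normalize_by_column(x: Matrix, eps: float = 1e-8) -> Matrix:
    """Standardize each embedding dimension to zero mean and unit sample std."""
    cols = list(zip(*x))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]  # sample std, like torch.Tensor.std's default
    return [[(v - m) / (s + eps) for v, m, s in zip(row, mus, sds)] for row in x]


def group_gap(emb_a: Matrix, emb_b: Matrix,
              kernel: Callable[[Matrix], Matrix] = identity_kernel) -> float:
    """Hypothetical bias penalty: ||mean(phi(A)) - mean(phi(B))||^2."""
    # Apply phi to both groups jointly: standardizing each group on its own
    # would set both group means to zero and hide any difference between them.
    phi = kernel(emb_a + emb_b)
    phi_a, phi_b = phi[:len(emb_a)], phi[len(emb_a):]
    mean_a = [mean(c) for c in zip(*phi_a)]
    mean_b = [mean(c) for c in zip(*phi_b)]
    return sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
```

The joint application is a deliberate design choice in this sketch: a per-dimension standardization rescales the gap between groups into units of the pooled spread, so dimensions with large raw variance do not dominate the penalty.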
4.1.3.2. EAT
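EAT scaling (Zayed et al., 2023) modifies the distribution of the attention scores with an inverse-temperature parameter, \(\beta \in [0, \infty)\), that rescales the scores before the softmax:
\[\text{Attention}(Q, K, V) = \operatorname{softmax}\left(\beta \, \frac{Q K^{\top}}{\sqrt{d_k}}\right) V\]
where \(Q\), \(K\) and \(V\) are the query, key and value matrices and \(d_k\) is the key dimension.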
The idea is that when \(\beta \gg 1\) the head attends almost exclusively to the tokens with the largest scores, while \(\beta \approx 0\) forces the head to attend equally to all tokens. When \(\beta = 1\) the attention head remains unmodified.
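A minimal pure-Python sketch of the effect of \(\beta\) on a single row of attention scores (one query, scores already divided by \(\sqrt{d_k}\)). It assumes, consistently with the limiting cases just described, that \(\beta\) multiplies the scores before the softmax; the function names are hypothetical and this is not how `add_EAT_hook` is implemented internally. The entropy helper shows the link to attention entropy: lowering \(\beta\) raises the entropy of a head's distribution, raising \(\beta\) lowers it.

```python
import math
from typing import List


def eat_softmax(scores: List[float], beta: float = 1.0) -> List[float]:
    """Softmax of beta-scaled attention scores for one query."""
    scaled = [beta * s for s in scores]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(weights: List[float]) -> float:
    """Shannon entropy (in nats) of an attention distribution."""
    return -sum(w * math.log(w) for w in weights if w > 0)


if __name__ == "__main__":
    scores = [1.0, 2.0, 3.0]
    for beta in (0.0, 1.0, 1.5, 10.0):
        weights = eat_softmax(scores, beta)
        print(beta, [round(w, 3) for w in weights], round(attention_entropy(weights), 3))
```

For instance, `eat_softmax([1.0, 2.0, 3.0], beta=0.0)` returns uniform weights, while `beta=1.0` reproduces the ordinary softmax.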
- FairLangProc.algorithms.intraprocessors.redistribution.add_EAT_hook(model: Module, beta: float = 1.1)
Inserts a hook into the model that rescales its attention scores by \(\beta\).
- Parameters:
model (nn.Module) – Model whose attention scores we want to modify.
beta (float) – Inverse-temperature parameter \(\beta\) applied to the attention scores; \(\beta = 1\) leaves the attention unchanged.
Example
>>> from torch.optim import AdamW
>>> from transformers import AutoModelForSequenceClassification, Trainer
>>> from FairLangProc.algorithms.intraprocessors import add_EAT_hook
>>>
>>> EATBert = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased')
>>> beta = 1.5
>>> add_EAT_hook(model=EATBert, beta=beta)
>>>
>>> # training_args, train_dataset and val_dataset are assumed to be defined beforehand
>>> trainer = Trainer(
...     model=EATBert,
...     args=training_args,
...     train_dataset=train_dataset,
...     eval_dataset=val_dataset,
...     optimizers=(
...         AdamW(EATBert.parameters(), lr=1e-5, weight_decay=0.1),
...         None
...     )
... )
>>> results = trainer.evaluate()
>>> print(results)