===============
Getting Started
===============

This chapter introduces FairLangProc: how to install it, quick start examples, and a comparison with other fairness tools.

.. toctree::
   :maxdepth: 2

Why FairLangProc?
-----------------

.. rst-class:: lead

A unified framework for fairness in NLP with comprehensive coverage of datasets, metrics, and algorithms.

Several tools for bias mitigation already exist in the fair ML landscape:

* **AI Fairness 360 (aif360)** - The main inspiration for this project. Provides bias mitigation methods for tabular data but lacks NLP-specific features.
* **Fairlearn** - Offers fairness tools but is not focused on NLP.
* **AllenNLP** - Provides embedding metrics and debiasing methods but development has been discontinued.
* **Dbias** - Offers a pipeline for bias evaluation but uses a custom methodology.
* **LangFair** - Allows generating custom datasets for fairness evaluation but lacks debiasing algorithms.
* **langtest** - Comprehensive evaluation library but with limited bias mitigation capabilities.

FairLangProc addresses these limitations by providing:

* **NLP-First Design** - Specifically designed for text data with HuggingFace transformers integration
* **Comprehensive Coverage** - 13+ benchmark datasets, 8+ fairness metrics, and 9+ debiasing algorithms
* **Pre/In/Intra-Processing** - Support for all stages of the ML pipeline
* **Unified Interface** - Consistent API across all modules
* **Active Development**

Overview
--------

.. list-table:: FairLangProc Main Functionalities
   :header-rows: 1
   :widths: 33 33 34

   * - **DATASETS**
     - **METRICS**
     - **ALGORITHMS**
   * - BBQ
     - WEAT
     - Pre-processors:
   * - BEC-Pro
     - LPBS
     - - CDA
   * - BOLD
     - CBS
     - - Projection
   * - BUG
     - CPS
     - - BLIND
   * - CrowS-Pairs
     - AUL
     - In-processors:
   * - GAP
     - HONEST
     - - ADELE
   * - HolisticBias
     - DR
     - - EAR
   * - HONEST
     - SA
     - - Regularizers
   * - StereoSet
     -
     - Intra-processors:
   * - UnQover
     -
     - - EAT
   * - WinoBias
     -
     - - MoDDiffy
   * - WinoBias+
     -
     -
   * - WinoGender
     -
     -

.. raw:: pdf

   \newpage

Installation
------------

Prerequisites
~~~~~~~~~~~~~

FairLangProc requires:

* Python >= 3.10
* numpy >= 2.2.4
* pandas >= 2.2.3
* scikit-learn >= 1.6.1
* torch >= 2.6.0
* transformers >= 4.47.1
* datasets >= 3.4.1
* adapter-transformers >= 1.1.0
* pytest >= 8.4.1

Install from PyPI
~~~~~~~~~~~~~~~~~

.. code-block:: bash

   pip install FairLangProc

Install from source
~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   git clone https://github.com/arturo-perez-peralta/FairLangProc
   cd FairLangProc
   pip install -e .

Quick Start
-----------

Load a dataset for bias evaluation:

.. code-block:: python

   from FairLangProc.datasets import BiasDataLoader

   ageBBQ = BiasDataLoader(dataset = 'BBQ', config = 'Age')
   print(ageBBQ['data'][0])

Measure bias using common metrics:

.. code-block:: python

   from transformers import AutoModelForMaskedLM, AutoTokenizer
   from FairLangProc.metrics import CBS

   # Load a masked language model and its tokenizer.
   model = AutoModelForMaskedLM.from_pretrained('bert-base-uncased')
   tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

   target_words = [
       ("John", "Mamadouk", "Liu"),
       ("white", "black", "asian"),
       ("white", "black", "asian"),
   ]
   sentences = [
       "[MASK] is a [MASK]",
       "The [MASK] kid got [MASK] results",
       "The [MASK] kid wanted to be a [MASK]",
   ]
   fill_words = ["engineer", "outstanding", "doctor"]
   mask_indices = [0, 1, 1]

   CBSscore = CBS(
       model = model,
       tokenizer = tokenizer,
       sentences = sentences,
       target_words = target_words,
       fill_words = fill_words,
       mask_indices = mask_indices
   )

Mitigate bias with an algorithm:

.. code-block:: python

   from transformers import BertForSequenceClassification
   from FairLangProc.algorithms.inprocessors import EARModel

   model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

   EARRegularizer = EARModel(
       model = model,
       ear_reg_strength = 0.01
   )

Further Reading
---------------

For a more in-depth treatment of the methodology and implementation, consult:

- `GitHub Repository <https://github.com/arturo-perez-peralta/FairLangProc>`_ - Source code and issues
- arXiv Paper - Academic publication
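
To confirm that an environment meets the minimum versions listed under Prerequisites, a small standard-library check can help. The sketch below is not part of FairLangProc: ``REQUIRED``, ``parse_version``, and ``check_requirements`` are hypothetical names introduced here for illustration, and the comparison is deliberately simplified (it reads only the leading numeric release components, so pre-release tags are ignored).

.. code-block:: python

   import re
   import sys
   from importlib.metadata import PackageNotFoundError, version

   # Minimum versions copied from the Prerequisites list (illustrative helper,
   # not shipped with FairLangProc).
   REQUIRED = {
       "numpy": "2.2.4",
       "pandas": "2.2.3",
       "scikit-learn": "1.6.1",
       "torch": "2.6.0",
       "transformers": "4.47.1",
       "datasets": "3.4.1",
       "adapter-transformers": "1.1.0",
       "pytest": "8.4.1",
   }

   def parse_version(text):
       """Return the leading numeric release components as a tuple of ints."""
       parts = []
       # Drop any local version label such as "+cu124" before splitting.
       for piece in text.split("+")[0].split("."):
           match = re.match(r"\d+", piece)
           if match is None:
               break  # stop at the first non-numeric component, e.g. "post1"
           parts.append(int(match.group()))
       return tuple(parts)

   def check_requirements(requirements, get_version=version):
       """Map each distribution name to 'ok', 'missing', or a 'too old' note."""
       report = {}
       for name, minimum in requirements.items():
           try:
               installed = get_version(name)
           except PackageNotFoundError:
               report[name] = "missing"
               continue
           if parse_version(installed) >= parse_version(minimum):
               report[name] = "ok"
           else:
               report[name] = f"too old ({installed} < {minimum})"
       return report

   if __name__ == "__main__":
       python_ok = sys.version_info >= (3, 10)
       print("python:", "ok" if python_ok else "too old (need >= 3.10)")
       for name, status in check_requirements(REQUIRED).items():
           print(f"{name}: {status}")

Running the script prints one status line per dependency. For stricter comparisons that respect pre-release and post-release tags, a dedicated version parser would be a better fit than this simplified tuple comparison.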