Fair Machine Learning

2022, Feb 25    

By this point, it is well known that machine learning algorithms reflect, and even amplify, existing societal biases. For example:

  • Job search ads for highly paid positions are less likely to be presented to women [1].
  • Searches for distinctively Black-sounding names are more likely to trigger ads for arrest records [2].
  • Image searches for professions such as CEO produce fewer images of women [3].
  • Facial recognition systems (increasingly used in law enforcement and border control) perform worse on recognizing faces of women and Black individuals [4].
  • Natural language processing algorithms encode language in gendered ways [5].
  • Health care prediction algorithms suggest spending more resources on White patients than on Black patients [6].

Machine learning offers real benefits in supporting decision making that allocates resources and opportunities critical to people’s life chances (like the early detection of cancer). However, machine learning relies heavily on data, which makes it far less objective than it is often made out to be. The data provided to an algorithm can be highly biased, especially when the records come from domains like health care, finance, and policing, whose institutions have historically been oppressive to marginalized groups (e.g., women and Black, Asian, and Hispanic people). The best-known example in the fair machine learning literature is the COMPAS recidivism prediction tool, which is used in courtrooms across the United States to predict whether a criminal defendant will commit another offense within a two-year period. In 2016 it was found that this algorithm produces much higher false positive rates for Black defendants than for White defendants. This, unfortunately, creates a feedback loop: more Black people are flagged as high risk and arrested, which drives more policing of majority-Black neighborhoods, which in turn leads to more Black people being arrested for petty crimes (or no crime at all).

Luckily, the machine learning community has opened its eyes to the fact that its models can harm already marginalized communities, and there is now an active fair machine learning research community. There are three main approaches to mitigating the oppression caused by machine learning models: pre-processing, in-processing, and post-processing.

Pre-processing: Adjusting the training data

Pre-processing techniques for bias mitigation are concerned with changing the training data to make it more fair before it is fed to the model for learning. Particular characteristics of the training data may directly cause the problematic performance of learned models, and for this reason many pre-processing techniques focus on modifying the training set to overcome dataset imbalance.

Multiple approaches for overcoming dataset imbalance exist, such as resampling, reweighting, flipping class labels across groups, and omitting sensitive variables or their proxies. Other techniques learn direct modification and transformation functions that achieve a desired fairness constraint (several such constraints exist; see my posts on causal and statistical fairness methods). By modifying the training data, the outputs of the learned model can be made less oppressive to marginalized groups.
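For concreteness, here is a minimal sketch of the reweighting idea, in the spirit of Kamiran and Calders' reweighing scheme: each training example gets a weight so that, in the weighted data, the sensitive attribute and the label look statistically independent. The column names and the pandas/scikit-learn usage below are illustrative assumptions, not a prescription.

```python
import numpy as np
import pandas as pd

def reweighing_weights(df: pd.DataFrame, group_col: str, label_col: str) -> np.ndarray:
    """Weight each example by P(group) * P(label) / P(group, label) so that
    group membership and the label are independent in the weighted data."""
    weights = np.ones(len(df))
    for g in df[group_col].unique():
        for y in df[label_col].unique():
            p_g = (df[group_col] == g).mean()               # P(group = g)
            p_y = (df[label_col] == y).mean()               # P(label = y)
            mask = ((df[group_col] == g) & (df[label_col] == y)).to_numpy()
            p_gy = mask.mean()                              # P(group = g, label = y)
            if p_gy > 0:
                weights[mask] = p_g * p_y / p_gy            # expected / observed
    return weights

# Illustrative usage with any learner that accepts per-sample weights,
# e.g. scikit-learn's LogisticRegression:
#   w = reweighing_weights(train_df, group_col="sex", label_col="hired")
#   model.fit(X_train, y_train, sample_weight=w)
```

Over-represented (group, label) combinations get down-weighted and under-represented ones get up-weighted, which is exactly the dataset-imbalance correction described above.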

In-processing: Adjusting the training algorithm

With in-processing techniques, we want to create a classifier that is explicitly aware of our fairness goals. That is, in training the classifier, it is not enough to simply optimize for accuracy on the training data. Instead, we modify the loss function to account simultaneously for our two goals: our model should be both accurate and fair.

This modification can be achieved in many ways, such as using adversarial techniques, ensuring the underlying representations are fair, or enforcing constraints and regularizations. In each case, the goal is that the underlying classifier directly takes fairness into consideration. As a result, the outcomes of the trained classifier will be less oppressive compared to a classifier that knows nothing about fairness (with some caveats; see my post on recommendations for fair machine learning practitioners here).
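As one simplified illustration of the regularization route, the sketch below adds a demographic-parity penalty to an ordinary binary cross-entropy loss in PyTorch: the penalty is the gap between the average predicted score for the two groups, scaled by a hyperparameter lam. The binary sensitive attribute, the choice of penalty, and the function name are all assumptions of this sketch, not a standard API.

```python
import torch
import torch.nn.functional as F

def fair_loss(logits: torch.Tensor, labels: torch.Tensor,
              group: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """Binary cross-entropy plus a penalty on the demographic-parity gap
    (difference in mean predicted score between group == 1 and group == 0).
    Assumes each batch contains examples from both groups."""
    bce = F.binary_cross_entropy_with_logits(logits, labels.float())
    probs = torch.sigmoid(logits)
    gap = (probs[group == 1].mean() - probs[group == 0].mean()).abs()
    return bce + lam * gap

# During training, this replaces the usual criterion:
#   loss = fair_loss(model(x_batch), y_batch, a_batch, lam=0.5)
#   loss.backward()
```

Larger values of lam trade some accuracy for a smaller gap between groups, which is the accuracy/fairness balance described above.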

Post-processing: Adjusting the model output

Post-processing techniques aim only to adjust the outputs of a model, leaving the underlying classifier and data untouched. The appeal is that post-processing lets model development teams use any modeling algorithm they wish: they don’t need to modify the algorithm or retrain a new model to make it more fair. Instead, post-processing methods center on adjusting the outputs of an unfair model so that the final outputs become fair. For example, early work in this area focused on modifying outcomes and decision thresholds in a group-specific manner.
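Below is a minimal sketch of that group-specific-threshold idea: given held-out scores, labels, and group membership, it picks a separate threshold for each group so that every group ends up with roughly the same true positive rate, one simple form of equal-opportunity-style post-processing. The target rate, binary labels, and function names are illustrative assumptions.

```python
import numpy as np

def group_thresholds(scores: np.ndarray, labels: np.ndarray,
                     group: np.ndarray, target_tpr: float = 0.8) -> dict:
    """Choose one threshold per group so each group's true positive rate
    is roughly `target_tpr`. Assumes every group has positive examples."""
    thresholds = {}
    for g in np.unique(group):
        pos_scores = np.sort(scores[(group == g) & (labels == 1)])
        # index such that ~target_tpr of this group's positives score above it
        k = int(np.floor((1 - target_tpr) * len(pos_scores)))
        thresholds[g] = pos_scores[min(k, len(pos_scores) - 1)]
    return thresholds

def apply_thresholds(scores: np.ndarray, group: np.ndarray, thresholds: dict) -> np.ndarray:
    """Final decisions: the model and its scores are untouched;
    only the group-specific cutoffs change."""
    return np.array([s >= thresholds[g] for s, g in zip(scores, group)])
```

Because only the cutoffs change, the underlying model and training data stay exactly as they were, which is the appeal of post-processing noted above.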

Resources