With more and more important decision-making processes being automated by machine learning systems, there have been dozens of news stories about the ways in which the underlying AI models were racist, sexist, transphobic, or exhibited any number of other undesirable biases. Since this has been known for years, it raises the question: is it still true? Once we know the ways in which an AI's decisions are unfair, surely we can simply fix them, right? Unfortunately, it turns out that even agreeing on what we mean by fairness is an incredibly hard problem.
To kick off our quest for fairness, it will be useful to think about a very simple task: making a binary decision about a specific person. For example, deciding whether to hire a job applicant, whether to grant somebody credit, or whether a defendant should be let out on parole.
One obvious thing that most people will agree on right away is that this classifier should not discriminate based on attributes that we think are unimportant for the decision, like skin color or gender. But if these things don't matter, why would a coldly calculating machine learning system learn to discriminate unfairly in the first place? First of all, there really could be informative correlations that the system picks up on. For example, if some racial group has, on average, a lower level of education, then the system may learn this correlation and use race as a factor in its decision making. But even if this weren't the case, the training dataset may simply contain unnoticed biases.
For example, the AI that Amazon developed for judging job applicants was trained on past successful applications. Since those happened to be dominated by males, the machine learning algorithm learned to discriminate against women.
The solution to this seems quite easy and obvious: just don't show the protected attributes to the AI!
Definition 1: A fair algorithm simply doesn't look at which group you belong to.
Unfortunately, this doesn't work at all. The problem is that, even if we don't tell the AI somebody's gender or racial identity, it can often reliably infer such things from other attributes, which it then uses as proxies for discriminating. For example, in many areas of the US, someone's race can be inferred quite reliably from their zip code, because neighborhoods are, in practice, still somewhat segregated.
So now we basically have to treat the decision-making process as a black box, because the AI might be picking up on features that we cannot completely understand. This means we can only assess fairness, or a lack thereof, based on the outcomes of our algorithm's decision making. Okay, so how about we define fairness like this:
Definition 2: A fair algorithm, on average, makes the same predictions across different groups.
For example, maybe it should hire male and female applicants at the same rate, or grant loans at the same rate across racial groups. However, if the underlying statistics of these groups differ in any metric relevant to the classification task, this again leads to some puzzling conclusions. For example, in the US, Black borrowers are, on average, more likely to default on loans than other racial groups. Now imagine you had a perfect AI system that could predict with 100% accuracy who is going to default on a loan. Using this perfect AI would be considered unfair according to this definition, and the only way to make it fair would be to grant loans to people who are guaranteed to default and/or deny loans to people who would be certain to pay them back. In the real world, where such perfect systems don't exist, this translates to a system making errors at different rates across groups. Many people would consider this quite unfair. So maybe the AI shouldn't necessarily make the same predictions across groups but instead simply have the same performance across groups?
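The check behind Definition 2 amounts to comparing selection rates across groups. Here is a minimal sketch on made-up hiring decisions; the data (and what counts as "the same rate") are assumptions for illustration, not part of any real system.

```python
# Minimal sketch of Definition 2 (demographic parity) on invented data:
# 1 = positive decision (e.g. "hire"), 0 = negative, split by group.

def selection_rate(decisions):
    """Fraction of positive decisions in a group."""
    return sum(decisions) / len(decisions)

group_a = [1, 0, 1, 1, 0, 1, 0, 1]
group_b = [1, 0, 0, 1, 0, 0, 0, 0]

print(selection_rate(group_a))  # 0.625
print(selection_rate(group_b))  # 0.25: Definition 2 is violated here
```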
Definition 3: A fair algorithm, on average, has the same error rate across different groups.
However, this formulation also immediately fails, because the system might make very different kinds of errors on different groups. In one of the most famous examples of algorithmic (un)fairness, precisely this problem occurred. The machine learning system COMPAS used a number of attributes, such as employment status and history of drug use, to predict whether a jailed defendant awaiting trial was likely to reoffend. Investigative journalists at ProPublica found that it discriminated against black people.
If we look into the original dataset that ProPublica used, we can clearly see that black defendants consistently received higher risk scores from the system.
But recidivism rates among black defendants really are higher than among white defendants, so ProPublica dug deeper. To them, the smoking gun was the following:
"In forecasting who would re-offend, the algorithm made mistakes with black and white defendants at roughly the same rate but in very different ways. The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants. White defendants were mislabeled as low risk more often than black defendants."
In other words, the model actually did satisfy Definition 3, i.e. it had the same error rate for black and white defendants, but the types of errors differed. ProPublica was implicitly assuming the following definitions of fairness:
Definition 4: A fair algorithm, on average, has the same false positive rate across different groups.
Definition 5: A fair algorithm, on average, has the same false negative rate across different groups.
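Definitions 3 through 5 can all be read off a group's confusion matrix. The toy data below is invented precisely so that both groups make the same number of errors overall (Definition 3 holds) while the false positive and false negative rates swap between groups (Definitions 4 and 5 fail), mirroring the COMPAS pattern described above.

```python
# Error rates per group on invented data: y_true = 1 means the person
# re-offended, y_pred = 1 means the model labeled them high risk.

def fpr_fnr(y_true, y_pred):
    """False positive rate and false negative rate (Definitions 4 and 5)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp / y_true.count(0), fn / y_true.count(1)

y_true_a, y_pred_a = [0, 0, 0, 0, 1, 1, 1, 1], [1, 1, 0, 0, 1, 1, 1, 0]
y_true_b, y_pred_b = [0, 0, 0, 0, 1, 1, 1, 1], [1, 0, 0, 0, 1, 1, 0, 0]

# Both groups make 3 errors out of 8 (same error rate, Definition 3)...
print(fpr_fnr(y_true_a, y_pred_a))  # (0.5, 0.25)
print(fpr_fnr(y_true_b, y_pred_b))  # (0.25, 0.5) ...but of different kinds
```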
The following table sums up their result. It says that if you are black and wouldn't have reoffended (within two years), your probability of being labeled high risk is much higher than the same probability for a white person who wouldn't reoffend. Similarly, a black person who would reoffend was less likely to be labeled low risk than a white person who would also reoffend.
| | White | Black |
|---|---|---|
| Labeled Higher Risk, But Didn't Re-Offend | 23% | 45% |
| Labeled Lower Risk, But Did Re-Offend | 48% | 28% |
Northpointe, the company that had developed COMPAS, came back with the following argument. They showed that if a black person was labeled higher risk, their actual rate of recidivism was, in fact, higher than that of a white person who was deemed risky. And the same was true for black and white people who were deemed lower risk, as you can see in the table below:
| | White | Black |
|---|---|---|
| Labeled Higher Risk, But Didn't Re-Offend | 41% | 37% |
| Labeled Lower Risk, But Did Re-Offend | 29% | 35% |
If anything, the system could be said to be discriminating against white defendants. The two tables seem completely contradictory, but both are technically correct. The technical way to phrase Northpointe's implicit definition of fairness would be the following:
Definition 6: A fair algorithm, on average, has the same positive predictive value across different groups.
Definition 7: A fair algorithm, on average, has the same negative predictive value across different groups.
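Northpointe's quantities are likewise simple ratios over the confusion matrix, just conditioned on the model's prediction instead of the true outcome. A sketch on invented data:

```python
# PPV and NPV per group (Definitions 6 and 7) on invented data.

def ppv_npv(y_true, y_pred):
    """Of those labeled high risk, the fraction who re-offended (PPV);
    of those labeled low risk, the fraction who did not (NPV)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pred_pos = sum(y_pred)
    pred_neg = len(y_pred) - pred_pos
    return tp / pred_pos, tn / pred_neg

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]
print(ppv_npv(y_true, y_pred))  # (0.75, 0.75)
```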
It might already be mind-boggling how both analyses can be simultaneously true but the story gets even more confusing. Another analysis found that given a risk score by COMPAS (from 1 to 10), both whites and blacks had approximately the same rate of recidivism as shown in the plot below.
One could argue that the model was fair because it was well-calibrated.
Definition 8: A fair algorithm, on average, has the same calibration across different groups.
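Calibration can be checked by grouping people by their risk score and comparing the observed outcome rate per score across groups. A minimal sketch, with made-up scores and outcomes:

```python
# Within-group calibration check (Definition 8): for each score value,
# what fraction of people with that score actually re-offended?

from collections import defaultdict

def rate_by_score(scores, outcomes):
    """Observed outcome rate for each distinct score in one group."""
    totals, hits = defaultdict(int), defaultdict(int)
    for s, y in zip(scores, outcomes):
        totals[s] += 1
        hits[s] += y
    return {s: hits[s] / totals[s] for s in totals}

# Made-up scores (1-10 scale, as in COMPAS) and outcomes for one group;
# a calibrated model yields similar dictionaries for every group.
print(rate_by_score([1, 1, 2, 2], [0, 1, 1, 1]))  # {1: 0.5, 2: 1.0}
```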
The fact that both Northpointe and ProPublica could be right at the same time nicely illustrates why fairness is so difficult to achieve both with and without machine learning. In fact, the COMPAS case is so famous that it has become a joke among fairness researchers that every single talk on fairness has to mention it.
Given that we cannot seem to agree on a single definition of fairness, couldn't we simply ask that our models satisfy all of them simultaneously? Actually, we can't! And I don't mean it's difficult. I mean it is literally, mathematically, provably impossible! As soon as the distributions across different groups differ, we simply cannot satisfy all of our fairness definitions (or even just more than a couple of them) simultaneously.
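One way to make this concrete is an algebraic identity (due to Chouldechova) that holds for any classifier: FPR = p/(1-p) × (1-PPV)/PPV × (1-FNR), where p is the group's base rate. So if base rates differ between groups but PPV and FNR are equalized, the false positive rates are forced apart. The numbers below are invented purely to evaluate the identity:

```python
# False positive rate implied by base rate p, positive predictive value,
# and false negative rate, via Chouldechova's identity.

def implied_fpr(p, ppv, fnr):
    return p / (1 - p) * (1 - ppv) / ppv * (1 - fnr)

# Same PPV and FNR in both groups, but different base rates:
print(implied_fpr(p=0.3, ppv=0.6, fnr=0.2))  # ~0.229
print(implied_fpr(p=0.5, ppv=0.6, fnr=0.2))  # ~0.533, so Definition 4 must fail
```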
Of course, this has only scratched the surface. We really only looked at so-called group fairness, which focuses on observing group statistics across protected attributes like race and gender. But there are yet more definitions that we could write down. For example, individual fairness:
Definition 9: A fair algorithm treats similar people similarly, e.g. simply swapping a person's race should not change the result.
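A naive way to test Definition 9 is a counterfactual flip: change only the protected attribute and check whether the decision changes. Note that, as discussed earlier, this only catches models that use the attribute directly, not ones that rely on proxies. The model, attribute names, and values below are all made up for illustration:

```python
# Counterfactual flip test for individual fairness (Definition 9).

def flip_test(model, person, attr, values):
    """True if the decision is identical under every value of `attr`."""
    decisions = {model(dict(person, **{attr: v})) for v in values}
    return len(decisions) == 1

# Toy model that (unfairly) looks at the protected attribute directly:
unfair = lambda person: 1 if person["race"] == "A" and person["score"] > 5 else 0
print(flip_test(unfair, {"race": "A", "score": 7}, "race", ("A", "B")))  # False
```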
Once again, this seemingly reasonable definition can be mathematically shown to be in conflict with other notions of fairness. To make matters worse, in practice even striving to meet a single fairness definition can conflict with achieving high accuracy on the original task.
And all of these complications were already present in our simple example of binary classification. But there exist far more complex tasks. What about the fairness of generative models? Should an image generator show 50% men and 50% women when prompted to draw a group of physicists? How should translation models deal with occupations that are stereotypically male or female? And what about situations where the ground truth is not actually fully knowable? For example, in the COMPAS case, black defendants had higher measured rates of recidivism, but how much of that is because of higher rates of crime, and how much because they are more likely to live in more heavily policed neighborhoods? Ultimately, the issue of fairness is far more complicated than any one (or even all) of our definitions can capture.
Clearly, saying algorithms should be fair is easy, but making them so that no one can call them unfair is not only hard, it's impossible. So should we just give up? Well, no. While it is not possible to make algorithms that are completely fair, it is certainly possible to make algorithms that are completely unfair, and this needs to be avoided. My hope is that especially the people who ostensibly care most about ethics and fairness in AI do their homework and track down models that really do discriminate, rather than cherry-picking exactly the fairness metric that supports the flashiest headline. Ultimately, what makes fairness in AI so hard is not that AI researchers are malicious or incompetent, nor even necessarily data bias. The hardest thing is that we cannot agree on what we mean by fairness. And if thousands of years of moral philosophy haven't accomplished this, I doubt that we can solve it now.
(If you are interested in exploring this topic further, I really recommend watching this excellent talk on the many faces of fairness in machine learning.)