Simpson’s paradox is a phenomenon in statistics that illustrates how easy it is to misinterpret data: a trend that appears within each of several groups can disappear, or even reverse, when the groups are combined. It arises mainly in descriptive and diagnostic analytics (see our blog on the different types of analytics), where an analyst may jump to a conclusion driven by motivated reasoning rather than by an objective assessment of the evidence.
This blog is part of a series on how to avoid logical fallacies and cognitive biases in data science.
Today we look at a famous example of Simpson’s paradox: a study of graduate admissions at the University of California, Berkeley, where the admission records appeared to show that men were favoured over women. When the figures were broken down by department, however, there seemed to be no noticeable bias in favour of male applicants.
Let’s have a look at the numbers. The table shows that male applicants had a 47% success rate compared with 36% for female applicants. A rash conclusion would be that male applicants were being favoured over female applicants. Admissions, however, were decided at the departmental level, so the aggregate figures alone cannot show whether any department discriminated.
Assessing the statistics at the department level paints a different picture. Not only did women enjoy a higher success rate in four of the six departments, but the biggest differences between the genders favour women (departments I and VI).
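To see how such a reversal can happen, here is a minimal Python sketch with two hypothetical departments. The counts are invented purely for illustration and are not the Berkeley figures; the names `ADMISSIONS`, `rate`, `department_rates` and `overall_rates` are our own.

```python
# Hypothetical admissions counts (NOT the Berkeley data), chosen only to
# reproduce the pattern above: women do better in every department,
# yet worse once the departments are pooled.
# department -> gender -> (applicants, admitted)
ADMISSIONS = {
    "Easy": {"men": (800, 480), "women": (100, 70)},
    "Hard": {"men": (200, 20), "women": (900, 108)},
}


def rate(applicants: int, admitted: int) -> float:
    """Admission rate as a fraction of applicants."""
    return admitted / applicants


def department_rates(data):
    """Admission rate for each gender within each department."""
    return {
        dept: {g: rate(*counts) for g, counts in by_gender.items()}
        for dept, by_gender in data.items()
    }


def overall_rates(data):
    """Pool all departments and compute a single rate per gender."""
    totals = {}
    for by_gender in data.values():
        for g, (applied, admitted) in by_gender.items():
            prev_applied, prev_admitted = totals.get(g, (0, 0))
            totals[g] = (prev_applied + applied, prev_admitted + admitted)
    return {g: rate(*counts) for g, counts in totals.items()}


for dept, rates in department_rates(ADMISSIONS).items():
    print(dept, {g: f"{r:.0%}" for g, r in rates.items()})
print("Overall", {g: f"{r:.0%}" for g, r in overall_rates(ADMISSIONS).items()})
```

With these made-up counts, women are admitted at 70% versus 60% in the easy department and 12% versus 10% in the hard one, yet only about 18% of women are admitted overall against 50% of men, because 900 of the 1,000 women applied to the hard department.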
So what’s going on here?
The first thing to do is to look past the percentages to the underlying numbers of applicants. Department III, which had a relatively low success rate, received a much larger share of the women’s applications than of the men’s. Conversely, Department I received a large share of the men’s applications and admitted them at a relatively good rate, but very few women applied there, even though their success rate was very high.
The overall conclusion was that women applied in larger proportions to the departments that were difficult to get into, and in smaller proportions to the departments that were easy to get into. There seemed to be no bias at the departmental level, only a difference in where men and women chose to apply.
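The mechanism is that a group’s overall admission rate is a weighted average of the department rates, weighted by where that group applied. The sketch below, again using hypothetical figures rather than the Berkeley data, re-weights the women’s department rates by the men’s application mix, a simple form of direct standardisation. `MEN`, `WOMEN`, `weighted_rate`, `shares` and `rates` are illustrative names of our own.

```python
# Hypothetical figures (NOT the Berkeley data).
# department -> (share of this gender's applications, admission rate)
MEN = {"Easy": (0.8, 0.60), "Hard": (0.2, 0.10)}
WOMEN = {"Easy": (0.1, 0.70), "Hard": (0.9, 0.12)}


def shares(group):
    """Where the group applied: department -> share of its applications."""
    return {d: share for d, (share, _) in group.items()}


def rates(group):
    """How the group fared: department -> admission rate."""
    return {d: r for d, (_, r) in group.items()}


def weighted_rate(mix, dept_rates):
    """Overall rate = sum over departments of (application share * department rate)."""
    return sum(mix[d] * dept_rates[d] for d in mix)


men_overall = weighted_rate(shares(MEN), rates(MEN))        # 0.8*0.60 + 0.2*0.10
women_overall = weighted_rate(shares(WOMEN), rates(WOMEN))  # 0.1*0.70 + 0.9*0.12
# Counterfactual: the women's department rates, but the men's application mix.
women_with_mens_mix = weighted_rate(shares(MEN), rates(WOMEN))  # 0.8*0.70 + 0.2*0.12

print(f"men: {men_overall:.1%}, women: {women_overall:.1%}, "
      f"women with men's mix: {women_with_mens_mix:.1%}")
```

This prints `men: 50.0%, women: 17.8%, women with men's mix: 58.4%`. Given the same application mix, the women’s department rates would produce a higher overall rate than the men’s, so in this toy example the gap in the pooled figures comes entirely from where people applied, not from how each department treated them.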
Essential tips to avoid Simpson’s paradox
- Try to understand the base data (the raw numbers) – i.e. avoid relying solely on percentages.
- Do not be swayed into concluding what you (or your boss) want to see in the numbers (motivated reasoning). Instead, carry out the full analytical exercise, and blind or double-blind your analysis if you can.
- Read up on as many statistical paradoxes as you can. Being aware of the statistical pitfalls will better prepare you to avoid them in your analysis.
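In the spirit of the first tip, one way to work from the raw counts is a small check that flags a clean reversal: one group does better in every stratum but worse once the strata are pooled. The helper below is a hypothetical sketch of our own, not a standard library function, and the counts in the usage lines are invented.

```python
from typing import Dict, Tuple

# Hypothetical layout: stratum -> (admitted_a, applied_a, admitted_b, applied_b)
Counts = Dict[str, Tuple[int, int, int, int]]


def _sign(x: float) -> int:
    """Return 1, -1 or 0 depending on the sign of x."""
    return (x > 0) - (x < 0)


def simpson_reversal(strata: Counts) -> bool:
    """True if group A beats group B in every stratum but loses once the
    strata are pooled (or the mirror image). Uses raw counts throughout."""
    within = set()
    adm_a = app_a = adm_b = app_b = 0
    for a_adm, a_app, b_adm, b_app in strata.values():
        # Direction of the A-versus-B difference inside this stratum.
        within.add(_sign(a_adm / a_app - b_adm / b_app))
        adm_a, app_a = adm_a + a_adm, app_a + a_app
        adm_b, app_b = adm_b + b_adm, app_b + b_app
    pooled = _sign(adm_a / app_a - adm_b / app_b)
    # A reversal needs one consistent non-zero direction within the strata
    # and the opposite direction in the pooled data.
    return pooled != 0 and within == {-pooled}


# Group A = women, group B = men, using invented counts.
print(simpson_reversal({"Easy": (70, 100, 480, 800), "Hard": (108, 900, 20, 200)}))
print(simpson_reversal({"X": (50, 100, 40, 100), "Y": (30, 100, 20, 100)}))
```

The first call prints `True` and the second `False`. The check is deliberately strict: it only fires when every stratum points the same way, so a mixed picture like the Berkeley departments (women ahead in four of the six) would not be flagged and still calls for reading the counts department by department.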