Since machine learning models are statistical models, they are naturally prone to errors. For example, the Apple Card’s fair lending controversy raised questions about discrimination built into loan approval algorithms, and a UK government-funded project that used AI to predict gun and knife crime turned out to be wildly inaccurate.

For people to trust machine learning models, we need explanations. It makes sense for a loan to be rejected due to low income, but if a loan is rejected based on the applicant's zip code, this might indicate bias in the model, e.g., that it favours wealthier areas.

When choosing a machine learning algorithm, there’s usually a tradeoff between the algorithm’s interpretability and its accuracy. Traditional methods like decision trees and linear regression can be directly explained, but their ability to provide accurate predictions is limited. More modern methods such as Random Forests and Neural Networks give better predictions but are more difficult to interpret.

In the last few years, we've seen great advances in the interpretation of machine learning models with methods like LIME and SHAP. While these methods require some background, analyzing how predictions respond to changes in the underlying data can offer a simpler and more intuitive interpretation. To do this, we first need to understand how humans reason.

Let’s think about the common example of the rooster’s crow: if you grew up in the countryside, you might know that roosters always crow before the sun rises. Can we infer that the rooster’s crow makes the sun rise? The answer is clearly no. But why?

Humans have a mental model of reality. We know that if the rooster doesn't crow, the sun rises anyway. This type of reasoning, about what would have happened had things been different, is called counterfactual reasoning.

Counterfactual Reasoning

Counterfactual reasoning is how people commonly make sense of reality, yet it cannot be scientifically proven. Descartes’ demon, or the idea of methodological skepticism, illustrates this: if Event B happens right after Event A, you can never be sure that there isn’t some demon causing B to happen right after A. Science has historically refrained from formalizing any discussion of causality, but more recently, efforts have been made to create a scientific language that helps us better understand cause and effect. For more on this, see “The Book of Why” by Judea Pearl, a prominent computer scientist and philosopher.

Using counterfactuals

At my company, we have predictive models that assess customers' risk when they apply for a loan. The model uses historical data in tabular format, in which each customer has a list of meaningful features such as payment history, income and incorporation date. Using this data, we predict each customer's level of risk and assign it to one of six risk groups (or buckets). We interpret the model's predictions using both local and global explanations, then use counterfactual analysis to explain our predictions to business stakeholders.

Local explanations aim to explain a single prediction. We replace each feature’s value with its median in the representative population and report, in text, the feature whose replacement caused the largest change in score. In the following example, the third feature is “successful repayments,” and its median is 0. We recalculate the prediction with the feature’s original value replaced by the new value (the median).

For Customer_1, replacing this feature with the median changed the predicted risk level, which lets us devise a short explanation: a higher number of successful repayments improved the customer’s risk level. Or, in its more detailed version: the customer had 3 successful repayments compared to a median of 0 in the population. This caused the risk level to improve from level D to E.
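The median-replacement step and the text template can be sketched in Python as follows. This is a minimal illustration, not the author's code, and it rests on several assumptions: the model is exposed as a hypothetical callable returning a probability of default per row, the features are numeric columns of a pandas DataFrame, and the six risk levels come from made-up cut-offs (the article does not publish its thresholds). The letter order, in which E is a lower-risk level than D, is inferred from the example above; `local_explanation` and `explain_text` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-offs on the predicted probability of default (assumption:
# the article does not publish its thresholds). Five cut-offs give six levels.
# Assumption: letters run from lowest risk (F) to highest risk (A), so moving
# from D to E is an improvement, as in the article's example.
CUTOFFS = np.array([0.05, 0.10, 0.20, 0.35, 0.50])
LEVELS = np.array(["F", "E", "D", "C", "B", "A"])


def to_risk_level(prob_default):
    """Map a probability of default to one of the six risk levels."""
    return LEVELS[np.digitize(prob_default, CUTOFFS)]


def local_explanation(predict_proba, customer, population):
    """Counterfactual local explanation for one customer (one-row DataFrame).

    Each feature is replaced, one at a time, by its median in `population`,
    and the feature whose replacement moves the score the most is returned.
    `predict_proba` is a hypothetical callable: DataFrame -> array of P(default).
    """
    score = predict_proba(customer)[0]
    medians = population.median()
    cf_scores = {}
    for feature in customer.columns:
        counterfactual = customer.copy()
        counterfactual[feature] = medians[feature]  # "what if this were typical?"
        cf_scores[feature] = predict_proba(counterfactual)[0]
    top = max(cf_scores, key=lambda f: abs(cf_scores[f] - score))
    return {
        "feature": top,
        "value": customer[top].iloc[0],
        "median": medians[top],
        "score": score,
        "median_score": cf_scores[top],
        "level": to_risk_level(score),
        "median_level": to_risk_level(cf_scores[top]),
    }


def explain_text(r):
    """Render the detailed, human-readable version of the explanation."""
    # Lower predicted risk with the actual value than with the median = improvement
    verb = "improve" if r["score"] < r["median_score"] else "worsen"
    return (
        f"The customer had {float(r['value']):g} {r['feature']} compared to a "
        f"median of {float(r['median']):g} in the population. This caused the "
        f"risk level to {verb} from level {r['median_level']} to {r['level']}."
    )
```

Replacing one feature at a time keeps each counterfactual easy to phrase for stakeholders, but it only reflects that feature's individual effect, not interactions between features.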

Global explanations

Global explanations aim to explain the direction of each feature’s effect in the model as a whole. For every customer, an individual feature’s value is replaced with a single extreme value. For example, this value can be the 95th percentile, i.e., almost the largest value in the sample (95% of the values are smaller than it).

The resulting changes in the distribution of scores are calculated and visualized in the chart below, which shows the change in customers’ risk levels when each feature’s value is increased to the 95th percentile.
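A sketch of this global step, under the same kind of assumptions (hypothetical cut-offs, a hypothetical callable returning probabilities of default, numeric features in a pandas DataFrame): each feature in turn is set to its 95th percentile for every customer, and the per-customer bucket changes are tallied into shares. `global_bucket_change` and `summarize` are illustrative names, not the author's implementation, and plotting the shares as a chart is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-offs on the predicted probability of default (assumption).
# Bucket index 0 is the lowest-risk level and 5 the highest-risk level.
CUTOFFS = np.array([0.05, 0.10, 0.20, 0.35, 0.50])


def bucket_index(prob_default):
    """Map probabilities of default to bucket indices 0..5."""
    return np.digitize(prob_default, CUTOFFS)


def global_bucket_change(predict_proba, data, feature, q=95):
    """Set `feature` to its q-th percentile for every customer and return the
    share of customers per bucket change (positive = risk deteriorated)."""
    extreme = np.percentile(data[feature], q)  # linear interpolation by default
    counterfactual = data.copy()
    counterfactual[feature] = extreme
    change = bucket_index(predict_proba(counterfactual)) - bucket_index(predict_proba(data))
    return pd.Series(change).value_counts(normalize=True).sort_index()


def summarize(predict_proba, data, q=95):
    """One row per feature: share of customers whose level got worse, stayed, or improved."""
    rows = {}
    for feature in data.columns:
        shares = global_bucket_change(predict_proba, data, feature, q)
        rows[feature] = {
            "deteriorated": shares[shares.index > 0].sum(),
            "unchanged": shares.get(0, 0.0),
            "improved": shares[shares.index < 0].sum(),
        }
    return pd.DataFrame(rows).T
```

A feature with both a sizeable "deteriorated" and a sizeable "improved" share is the kind of mixed signal that may point to interactions, which this one-feature-at-a-time view cannot explain on its own.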

[Figure: Bucket change - increased feature value]

When increasing the first listed feature (length of delay in payments) to the 95th percentile, a large portion of the customers have their risk level deteriorate one or more levels. A person who reviews this behaviour can easily accept that a delay in payments is expected to cause a worse risk level.

The second feature, monthly balance increase, has a mixed effect: a small percentage of customers have their risk level deteriorate, while a larger percentage have their risk level improve. This mixed effect might indicate an interaction between features, although this method cannot directly explain it.

The third feature, years since incorporation, has a positive effect on customers’ risk levels when increased to the 95th percentile. Here too, it is easy to accept that businesses that have been around for longer are likely to be more stable and therefore present less risk.

Unlike many other reasoning methods, the counterfactual approach allows for simple and intuitive data explanations that anyone can understand, which can increase the trust we have in machine learning models.

Written by Nathalie Hauser, Manager, Data Science at Bluevine