Aug 27, 2021

How To Detect Silent Failure in Machine Learning Models

Let’s say you have a machine learning model that you trust to make the right prediction almost every time. Using this model’s predictions in your business, you no longer have to deal with so much guesswork, and your company thrives. It’s easy to assume the model will work well forever, but that’s a misconception.

Models can and will deteriorate over time because the world around them will always change. If you’re not regularly feeding your model relevant and updated data, it will make a lot of wrong predictions without you even realizing it. These errors can easily add up over time and you’ll find your company losing money – and fast.

To protect yourself and your business from these problems, it’s important to start and continue monitoring your models. Here are a few things to consider:


Why machine learning models fail

Most models learn patterns from a training dataset and use them to make predictions. But as time passes, the model’s environment changes. If the data you used to train your model becomes outdated, the predictions it makes won’t be as reliable.

These issues come on top of the things you would normally expect from any software system, like bugs, feedback loops, and general noise.

As hard as silent model failure can be to detect, there are ways to uncover its most common causes. To find out if your model is failing, you can check for things like:


Data Drift

As we mentioned in a previous article, data drift is a change in the data that is input into the model: it happens when the distribution of the model’s inputs significantly changes over time. While it doesn’t necessarily imply model failure, it can sometimes be an indicator of it, so watch out!

External factors usually cause this change. For example, your model may fail if you’re using it to make predictions on data that comes from a different population than the data it was trained on – like a different country.

The first and simplest way to detect data drift is to use a statistical test, like the Kolmogorov-Smirnov test. You can use it to compare the most recent production data to the training set, or to earlier production data. If the distributions are statistically different, your input data has drifted, which may indicate failure.
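As a minimal sketch of this idea, assuming SciPy is available, the two-sample Kolmogorov-Smirnov test can compare a single feature’s values between the training set and recent production data (the synthetic arrays below stand in for real data):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Hypothetical feature values: training data vs. recent production data.
train_values = rng.normal(loc=0.0, scale=1.0, size=1000)
prod_values = rng.normal(loc=0.5, scale=1.0, size=1000)  # shifted distribution

# The two-sample KS test compares the two empirical distributions.
statistic, p_value = ks_2samp(train_values, prod_values)

# A small p-value suggests the distributions differ, i.e. possible data drift.
drifted = p_value < 0.05
```

In practice you would run a test like this per feature, on a schedule, and alert when a feature starts failing it consistently.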

If you want to look at all the variables at once, you can instead use an anomaly detection algorithm. This can help detect and compare the number of anomalies between two datasets. If your current dataset has more anomalies than the previous one, then it is likely that data drift is happening in your model.
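One way to sketch this multivariate approach, assuming scikit-learn and synthetic stand-in data, is to fit an anomaly detector such as an Isolation Forest on a reference dataset and count how many points it flags in each period:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical datasets: reference (training-era) data and current production data.
reference = rng.normal(0.0, 1.0, size=(1000, 3))
current = rng.normal(2.0, 1.0, size=(1000, 3))  # shifted: should look anomalous

# Fit the detector on the reference data, then count flagged anomalies (-1).
detector = IsolationForest(random_state=0).fit(reference)
ref_anomalies = int((detector.predict(reference) == -1).sum())
cur_anomalies = int((detector.predict(current) == -1).sum())

# Many more anomalies in the current data hints at data drift.
drift_suspected = cur_anomalies > 2 * ref_anomalies
```

The `2 *` threshold here is an arbitrary illustration; a real setup would pick a threshold from historical anomaly counts.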

The last method you can use to detect data drift is a generative adversarial network (GAN). The discriminator in a GAN learns to tell the training data apart from other samples, so once trained, it can flag production data that no longer looks like the data the model was trained on.
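A full GAN is beyond the scope of this post, but the discriminator idea can be sketched more simply as a domain classifier: train a model to distinguish training-era samples from current samples, and treat high separability as evidence of drift. This is a simplified stand-in for the GAN approach, using scikit-learn and synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Hypothetical data: a reference set and a drifted production set.
reference = rng.normal(0.0, 1.0, size=(500, 4))
production = rng.normal(1.0, 1.0, size=(500, 4))

# Label each sample by its origin and try to tell the two sets apart.
X = np.vstack([reference, production])
y = np.array([0] * len(reference) + [1] * len(production))

# If even a simple classifier separates the sets well (AUC well above 0.5),
# their distributions differ, which suggests drift.
auc = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring="roc_auc"
).mean()
```

An AUC near 0.5 means the two datasets are indistinguishable; the closer it gets to 1.0, the stronger the drift signal.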


Concept Drift

Concept drift is a change in the decision boundary: the relationship between the model’s inputs and outputs. If the decision boundary learned from the training dataset no longer matches the current dataset, the model’s predictions will be wrong. It’s especially important to watch out for this because it almost always affects the business impact of your model.

There are a few things you can do to check for concept drift. First, if you have access to labels, you can run a correlation analysis between the labels and the inputs. If you find that the relationship between inputs and outputs has changed over time, this may mean concept drift is happening.
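A minimal sketch of this correlation check, on synthetic stand-in data where the input/label relationship flips between two time windows:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical input feature and labels from two time windows.
x_old = rng.normal(size=1000)
y_old = 2.0 * x_old + rng.normal(scale=0.5, size=1000)   # strong positive relationship

x_new = rng.normal(size=1000)
y_new = -2.0 * x_new + rng.normal(scale=0.5, size=1000)  # relationship has flipped

corr_old = np.corrcoef(x_old, y_old)[0, 1]
corr_new = np.corrcoef(x_new, y_new)[0, 1]

# A large change in input/label correlation hints at concept drift.
concept_drift = abs(corr_old - corr_new) > 0.5
```

With real data you would compute this per feature and track how the correlations move between reporting periods.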

Another option is to fit the same type of model separately to the training dataset and the current dataset. If the two fitted models are significantly different, then concept drift is likely.
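As a rough sketch of this comparison, assuming scikit-learn and synthetic data, you can fit a simple model to each period and compare the parameters it learns:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

# Hypothetical training-era and current data; the input/output relationship changed.
X_train = rng.normal(size=(1000, 1))
y_train = 3.0 * X_train[:, 0] + rng.normal(scale=0.1, size=1000)

X_curr = rng.normal(size=(1000, 1))
y_curr = -1.0 * X_curr[:, 0] + rng.normal(scale=0.1, size=1000)

# Fit the same simple model to each dataset and compare what it learns.
coef_train = LinearRegression().fit(X_train, y_train).coef_[0]
coef_curr = LinearRegression().fit(X_curr, y_curr).coef_[0]

# Very different fitted parameters suggest the decision boundary has moved.
drifted = abs(coef_train - coef_curr) > 1.0
```

For more complex model classes, you can compare predictions on a shared holdout set instead of raw parameters.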


Differences in Performance

The simplest way to detect model failure is to directly measure performance and compare it across time periods. By tracking metrics like precision or accuracy over time, you can estimate the change in model performance. If there is a significant drop, the model is failing. However, this only works if you have access to labels.
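A minimal sketch of this check, using scikit-learn metrics on small made-up label sets from two periods:

```python
from sklearn.metrics import accuracy_score

# Hypothetical labeled predictions from an earlier and a more recent period.
y_true_old = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred_old = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # the model used to be right every time

y_true_new = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred_new = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]  # predictions have degraded

acc_old = accuracy_score(y_true_old, y_pred_old)
acc_new = accuracy_score(y_true_new, y_pred_new)

# A significant drop in a metric like accuracy signals model failure.
failing = (acc_old - acc_new) > 0.2
```

The 0.2 drop threshold is illustrative; in practice you would set it from the natural variance of your metric over time.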

If you don’t have access to labels, you can instead try to predict the current aggregate performance. This is a complex topic, so we won’t go into any details here.


While model failure can be scary, it is not an unsolvable problem. And just like with any ML problem, we’d be happy to help you out.

