Detect silent model failure with our open-source Python library


Is not knowing model performance causing you sleepless nights?

NannyML empowers data scientists to detect and understand silent model failure, so you can end these worries in minutes!

Performance estimation and monitoring

NannyML estimates model performance using Confidence-Based Performance Estimation (CBPE), an algorithm researched by NannyML's core contributors, so you can detect real-world performance drops before you otherwise would.

It can also track the realised performance of your model once targets are available.
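The intuition behind confidence-based estimation can be illustrated in plain NumPy: for a well-calibrated binary classifier, the score of the predicted class is the probability that the prediction is correct, so expected accuracy can be computed without any labels. This is a deliberate simplification of CBPE, not NannyML's implementation:

```python
import numpy as np

def estimate_accuracy(proba, threshold=0.5):
    """Expected accuracy of a *calibrated* binary classifier, no labels needed.

    For each prediction, the probability of being correct is the score of
    the predicted class: p if we predict 1, (1 - p) if we predict 0.
    """
    proba = np.asarray(proba)
    p_correct = np.where(proba >= threshold, proba, 1.0 - proba)
    return p_correct.mean()

# Synthetic check: draw calibrated scores, sample labels consistent with
# them, and compare the label-free estimate with the realised accuracy.
rng = np.random.default_rng(42)
proba = rng.uniform(0.0, 1.0, size=100_000)
y_true = rng.binomial(1, proba)            # labels consistent with the scores
y_pred = (proba >= 0.5).astype(int)

estimated = estimate_accuracy(proba)
realised = (y_pred == y_true).mean()
print(f"estimated={estimated:.3f} realised={realised:.3f}")
```

The estimate tracks the realised accuracy closely on calibrated scores; in practice the hard part, which CBPE addresses, is keeping the scores calibrated in production.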

Data drift detection

NannyML uses Data Reconstruction with PCA to detect multivariate data drift. For univariate data drift, it uses statistical tests that measure the observed drift, along with a p-value indicating how likely the observed sample would be if there were no drift.
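The idea behind the multivariate method can be sketched in plain NumPy: fit PCA on reference data, then measure how well new data survives a round trip through the reduced space. Data whose correlation structure has drifted reconstructs worse. This is an illustrative simplification, not NannyML's implementation:

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the mean and top principal components of reference data X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]            # components as rows

def reconstruction_error(X, mean, components):
    """Mean Euclidean error after compressing X to the PCA subspace and back."""
    Xc = X - mean
    X_hat = Xc @ components.T @ components    # project down, then back up
    return np.linalg.norm(Xc - X_hat, axis=1).mean()

rng = np.random.default_rng(0)
# Reference data: two correlated features, so one component captures
# most of the structure.
ref = rng.normal(size=(5_000, 2)) @ np.array([[1.0, 0.9], [0.0, 0.4]])
mean, comps = fit_pca(ref, n_components=1)
baseline = reconstruction_error(ref, mean, comps)

# Drifted data: the correlation between the features disappears, so
# reconstruction through the reference subspace degrades noticeably.
drifted = rng.normal(size=(5_000, 2))
print(baseline, reconstruction_error(drifted, mean, comps))
```

In monitoring, the reconstruction error per chunk is compared against the range observed on the reference period, and an out-of-range error signals multivariate drift.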

Model output drift is detected with the same univariate methodology used for a continuous feature. Together, these checks help you identify what is changing in your data and in your model.
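For a continuous feature or model output, the univariate check amounts to a two-sample statistical test. Here is a minimal sketch using SciPy's two-sample Kolmogorov-Smirnov test; this is one of several applicable tests, not necessarily NannyML's exact choice:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
reference = rng.normal(loc=0.0, scale=1.0, size=2_000)   # training-period values
stable    = rng.normal(loc=0.0, scale=1.0, size=2_000)   # same distribution
shifted   = rng.normal(loc=0.5, scale=1.0, size=2_000)   # mean has drifted

# The KS statistic measures the observed drift; the p-value is the chance of
# seeing a gap at least this large if both samples came from one distribution.
for name, sample in [("stable", stable), ("shifted", shifted)]:
    stat, p = stats.ks_2samp(reference, sample)
    print(f"{name}: statistic={stat:.3f} p-value={p:.3g}")
```

The stable sample yields a large p-value and no alert; the shifted sample yields a vanishingly small one.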

Target drift is monitored by calculating the mean occurrence of positive events in each chunk, as well as the chi-squared statistic from a two-sample chi-squared test of the target values for each chunk.
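That calculation can be sketched with SciPy: record each chunk's positive rate, then run a chi-squared test of the chunk's target counts against the reference counts. An illustrative sketch, not NannyML's exact implementation:

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(7)
reference = rng.binomial(1, 0.20, size=5_000)    # ~20% positive events
chunk     = rng.binomial(1, 0.30, size=1_000)    # positive rate has drifted

print("mean occurrence:", reference.mean(), chunk.mean())

# 2x2 contingency table of (sample, target value) counts.
table = np.array([
    [np.sum(reference == 0), np.sum(reference == 1)],
    [np.sum(chunk == 0),     np.sum(chunk == 1)],
])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi-squared={chi2:.1f} p-value={p:.3g}")
```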

import nannyml as nml

# Rank drifting features by number of alerts, associated performance drops,
# and feature importance. drift_results and model_metadata come from
# earlier NannyML calculations.
ranker = nml.Ranker.by(['alert_count', 'performance_drops', 'feature_importance'])
ranked_features = ranker.rank(drift_results, model_metadata, only_drifting=True)
ranked_features

         feature  number_of_alerts  with_performance_drops  feature_importance
0      car_value                 2                       2                   1
1   salary_range                 2                       1                   3
2  driver_tenure                 3                       0                  15
3    loan_length                 2                       0                   9

Intelligent alerting

Because NannyML can estimate performance, it can alert you to data drift that impacts performance. These alerts are tailored to draw your attention to statistically significant events, helping avoid alert fatigue.

You can use our ranker to list changes according to their significance and likely impact, allowing you to prioritise problems. This means you can link drops in performance to the data drift that causes them.

Simple and secure

NannyML can be set up in seconds on your own local or cloud environments, ensuring your data stays in your control and model monitoring fully complies with your security policies. You can install it with pip or Conda, and run it as a library or a CLI.

$ pip install nannyml

NannyML integrates with any classification or regression model, regardless of language or format. More problem types will be supported in the future.

Screenshot of a Jupyter notebook running NannyML

Example usage

1. Training

Train a model that predicts daily whether your customers will churn in the next three months.

2. Validation

Make sure the model works the way you need it to with the data you have. You can use NannyML here to check for drift in your validation data, and to calculate the performance of your trained model on that data.

3. Deployment

Get your model into your production environment, collecting new data and making new predictions. You can deploy this anywhere, knowing that NannyML doesn't need access to your model to monitor it, only the data.

4. Monitoring

Run your model's outputs through NannyML weekly to estimate performance in production. You can also detect drift in your outputs and features, individually and as a whole.
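Splitting production data into the weekly chunks this cadence implies is plain pandas; the column names below are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 280
df = pd.DataFrame({
    "timestamp": pd.date_range("2022-01-03", periods=n, freq="6h"),  # 10 weeks
    "y_pred_proba": rng.uniform(size=n),
})

# One chunk per calendar week; each chunk is what you would hand
# to the monitoring step.
weekly_chunks = [chunk for _, chunk in df.groupby(pd.Grouper(key="timestamp", freq="W"))]
print(len(weekly_chunks), [len(c) for c in weekly_chunks[:3]])  # 10 chunks, 28 rows each
```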

5. Post-deployment data science

If NannyML estimates a performance problem, you can use the tools within NannyML itself to compare different metrics, including drift, and help identify what could be causing it. Similarly, if you detect data drift, you can use NannyML to see whether the drift is likely to impact performance.

6. Evaluation

After you get your targets, you can evaluate NannyML's estimations, and your model's realised performance, by calculating the metrics most important to you.

How it fits into your workflow

NannyML turns the machine learning flow into a cycle, empowering data scientists to do meaningful and informed post-deployment data science to monitor and improve models in production through iterative deployments.

A diagram of where NannyML fits into the MLOps process, bridging prediction services and data analysis by providing performance monitoring and post-deployment data science

Based on work by Google

What do people think of NannyML?

"A very clear vision for the product. Can't wait to fully use it in measuring our NGO disinformation AI model's performance. Love the documentation and the very simple setup. Must have for all operating in an ecosystem where data is a rare asset." - Jakub Sliz

"Love the product. Saved us a lot of money from failed predictions." - Bart Vandekerckhove

"NannyML is just amazing; super easy to use, giving me all the insights I need to sleep well knowing my models still work in production ;)" - Johannes Hotter

Comprehensive documentation

Our documentation includes walkthroughs and explanations of how the library works, and how you can use it:

Read The Docs

Need more info?

If you have any questions about using NannyML, we are happy to help you start monitoring as soon as possible!

The team will go through your use cases, and ensure you can cover your basic ML monitoring needs using our library.

Feature comparison

Choosing the right tool for you is hard. Here's a starting point for comparing some different ML monitoring solutions.

This is accurate to the best of our knowledge as of September 2022. If you notice anything incorrect, please get in touch!

| Feature | NannyML | Evidently | AWS SageMaker Model Monitor |
| --- | --- | --- | --- |
| Open source | Yes | Yes | No |
| License type | Apache-2.0 license | Apache-2.0 license | Enterprise license |
| Ease of integration | Medium complexity | Medium complexity | n/a |
| Supported frameworks | Model agnostic | Model agnostic | scikit-learn, PyTorch, TensorFlow, MXNet |
| Binary classification support | Yes | Yes | Yes |
| Multiclass classification support | Yes | Yes | Yes |
| Regression support | Yes | Yes | Yes |
| Tabular data support | Yes | Yes | Yes |
| Text data support | Planned for the future | No | With some work |
| Image data support | Planned for the future | No | No |
| Performance monitoring with targets | Yes | Yes | Yes |
| Estimated performance monitoring without targets | Yes | No | No |
| Post-deployment data science | Yes | No | No |
| Data quality check | Planned for the future | Yes | Yes |
| Categorical and numerical target drift | Yes | Yes | No |
| Covariate drift detection | Yes | Yes | Yes |
| Multivariate drift detection | Yes, using PCA algorithm | No | No |
| Model output drift detection | Yes | Yes | Yes |
| Target drift detection | Yes | Yes | Yes |
| Concept drift detection | Planned for the future | No | No |
| Feature distributions | Yes | Yes | Yes |
| Bias identification | No | Yes | Yes |
| Integrations | Grafana and Prometheus planned for the future | Grafana, Airflow and MLflow | Amazon SageMaker Clarify, Amazon CloudWatch |
| Deployable as | Library and CLI | Library and CLI | n/a |
| Interactive dashboards | Planned for the future | Yes | Via other AWS services |
| HTML reports | Planned for the future | Yes | Via other AWS services |
| Visualisations | Customisable outputs based on Plotly | Via Amazon QuickSight, TensorBoard, and Tableau | Via Amazon SageMaker Studio |
GitHub

Check out the open-source code on our GitHub, along with our detailed README, code examples, and other guides.

Slack

Get involved with conversations as part of our community of users, contributors and friends, in our Slack.

Newsletter

Sign up to our newsletter to be kept up to date with future developments of NannyML, and other related data science news we find interesting.

Documentation

Read our documentation to find extensive tutorials on how to use NannyML, and fully-detailed deep dives into how it all works.