Estimate Your Model Performance in Production

Without Ground Truth. Open Source. For Data Scientists.

Get started today

Use it the way you like:

In a Jupyter Notebook. In a CLI. In Docker.

$ pip install nannyml

$ conda install -c conda-forge nannyml

What NannyML does for you

  • Business Value Estimation and Calculation
  • Performance Estimation and Calculation
  • Multivariate Covariate Drift Detection
  • Univariate Covariate Drift Detection
  • Target Drift
  • Drift Detection
  • System alerting
  • Distribution Monitoring
  • Data Quality (coming soon)

Your Monitoring Flow

[Figure: workflow diagram of the machine learning monitoring flow]

How it works

NannyML lets you estimate the performance of your deployed machine learning models without access to ground truth.
It is completely model-agnostic and currently supports all tabular use cases, both classification and regression. (NLP and CV work with a bit of hacking ;) )

Know the business impact of your models

  • Define a cost/benefit matrix
  • Set a custom threshold
  • Get alerted when expected business value drops
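
As a rough illustration of the three steps above, the sketch below multiplies confusion-matrix counts by a cost/benefit matrix and flags a chunk whose value falls below a custom threshold. It is not NannyML's API: the names `ConfusionCounts`, `business_value`, `should_alert`, and every number in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """One number per confusion-matrix cell (counts, or value per prediction)."""
    tp: float
    fp: float
    fn: float
    tn: float


def business_value(counts: ConfusionCounts, values: ConfusionCounts) -> float:
    """Sum each cell count multiplied by the value assigned to that cell."""
    return (counts.tp * values.tp + counts.fp * values.fp
            + counts.fn * values.fn + counts.tn * values.tn)


def should_alert(value: float, lower_threshold: float) -> bool:
    """Alert when the business value drops below the custom threshold."""
    return value < lower_threshold


# Hypothetical cost/benefit matrix: value gained or lost per prediction.
VALUE_MATRIX = ConfusionCounts(tp=100.0, fp=-20.0, fn=-200.0, tn=0.0)

# Hypothetical confusion-matrix counts for one chunk of production data.
chunk = ConfusionCounts(tp=40, fp=10, fn=5, tn=945)

value = business_value(chunk, VALUE_MATRIX)
print(value, should_alert(value, lower_threshold=3000.0))
```

When ground truth is not available yet, the same calculation can be run on an estimated confusion matrix instead of realized counts.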

Focus on a single performance metric

Estimate model performance of classification models with CBPE (Confidence-Based Performance Estimation)

  • roc_auc, f1, precision, recall, specificity, accuracy or any of the confusion matrix metrics
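
The core idea behind confidence-based estimation can be sketched in a few lines, assuming a binary classifier whose predicted probabilities are well calibrated. Each prediction then contributes its probability to the expected confusion-matrix cells, and metrics follow from those expected counts. This is a simplified illustration with hypothetical function names, not the library's implementation.

```python
def estimate_confusion_matrix(calibrated_probas, threshold=0.5):
    """Expected confusion-matrix cells from calibrated P(y=1) scores alone."""
    tp = fp = fn = tn = 0.0
    for p in calibrated_probas:
        if p >= threshold:      # predicted positive
            tp += p             # probability the prediction is correct
            fp += 1.0 - p       # probability the prediction is wrong
        else:                   # predicted negative
            fn += p
            tn += 1.0 - p
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def estimate_metrics(calibrated_probas, threshold=0.5):
    """Estimate accuracy, precision and recall without any labels."""
    cm = estimate_confusion_matrix(calibrated_probas, threshold)
    n = len(calibrated_probas)
    predicted_pos = cm["tp"] + cm["fp"]
    expected_pos = cm["tp"] + cm["fn"]
    nan = float("nan")
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / n,
        "precision": cm["tp"] / predicted_pos if predicted_pos else nan,
        "recall": cm["tp"] / expected_pos if expected_pos else nan,
    }


# Hypothetical calibrated scores for four production predictions.
estimates = estimate_metrics([0.9, 0.8, 0.3, 0.1])
print(estimates)
```

The estimate is only as good as the calibration: if the scores stop reflecting true probabilities, the estimated metrics drift away from the realized ones.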

Estimate model performance of regression models with DLE (Direct Loss Estimation)
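
The idea of direct loss estimation can be sketched as follows: on reference data where targets are known, fit a second model that predicts the monitored model's absolute error, then average its predictions on production data to estimate MAE. The sketch assumes a single numeric feature and uses a one-variable least-squares line as a stand-in for the loss model; all names and numbers are hypothetical, and this is not the library's implementation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_loss_model(features, predictions, targets):
    """Learn to predict the monitored model's absolute error from a feature."""
    abs_errors = [abs(t - p) for p, t in zip(predictions, targets)]
    return fit_line(features, abs_errors)


def estimate_mae(loss_model, features):
    """Average the predicted absolute errors; no targets are needed."""
    slope, intercept = loss_model
    losses = [max(0.0, slope * x + intercept) for x in features]
    return sum(losses) / len(losses)


# Hypothetical reference data where the targets are already known.
ref_features = [1.0, 2.0, 3.0, 4.0]
ref_predictions = [10.0, 10.0, 10.0, 10.0]
ref_targets = [11.0, 12.0, 13.0, 14.0]

loss_model = fit_loss_model(ref_features, ref_predictions, ref_targets)

# Hypothetical production data without targets.
estimated_mae = estimate_mae(loss_model, [5.0, 7.0])
print(estimated_mae)
```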


Faster root cause analysis

Detect changes in your data as a whole through multivariate feature drift

  • PCA-based data reconstruction
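
A minimal sketch of reconstruction-based drift detection, assuming just two numeric features so the principal axis has a closed form: fit the axis on reference data, compress each point to one component, reconstruct it, and track the average reconstruction error. A rise in that error suggests the relationship between features has changed. Function names and data are hypothetical, and real use would involve more features and feature scaling.

```python
import math


def fit_principal_axis(points):
    """Fit a one-component PCA on 2-D points: returns (mean, unit direction)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Orientation of the major axis of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    return (mean_x, mean_y), (math.cos(angle), math.sin(angle))


def reconstruction_error(points, model):
    """Mean distance between each point and its one-component reconstruction."""
    (mean_x, mean_y), (dx, dy) = model
    total = 0.0
    for x, y in points:
        cx, cy = x - mean_x, y - mean_y
        score = cx * dx + cy * dy          # compress to one component
        rx, ry = score * dx, score * dy    # reconstruct in centered space
        total += math.hypot(cx - rx, cy - ry)
    return total / len(points)


# Hypothetical reference data: the two features move together.
reference = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
model = fit_principal_axis(reference)

# Hypothetical production data: same value ranges, relationship reversed.
drifted = [(0.0, 3.0), (3.0, 0.0), (1.0, 2.0), (2.0, 1.0)]

print(reconstruction_error(reference, model))
print(reconstruction_error(drifted, model))
```

Note that each feature in the drifted sample keeps the same marginal values as in the reference sample, so a per-feature check would not flag it; only the joint structure has changed.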

Detect changes in individual features

Detect changes in your target distribution and model output

  • Kolmogorov-Smirnov Test, Jensen-Shannon Distance, Wasserstein Distance, Hellinger Distance, Chi-squared Test, L-Infinity Distance
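
Two of the listed methods are simple enough to sketch with the standard library: the two-sample Kolmogorov-Smirnov statistic for a continuous feature and the Jensen-Shannon distance for a categorical one. This is an illustrative sketch with hypothetical data, not the library's implementation; it computes only the statistics, not p-values or alert thresholds.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, analysis):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    ref, ana = sorted(reference), sorted(analysis)
    gap = 0.0
    for value in ref + ana:
        cdf_ref = bisect_right(ref, value) / len(ref)
        cdf_ana = bisect_right(ana, value) / len(ana)
        gap = max(gap, abs(cdf_ref - cdf_ana))
    return gap


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions."""
    mixture = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(dist, other):
        # Terms with zero probability contribute nothing to the divergence.
        return sum(x * math.log2(x / y) for x, y in zip(dist, other) if x > 0)

    divergence = 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)
    return math.sqrt(max(0.0, divergence))


# Hypothetical continuous feature: the production sample is shifted upward.
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))

# Hypothetical category frequencies in reference vs. production data.
print(jensen_shannon_distance([0.5, 0.5], [0.9, 0.1]))
```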

You're in good company

"When the team from NannyML explained to me what they were working on, I had very strong doubts about whether estimating performance is even possible from a theoretical point of view. Yet I was curious: I am in credit scoring, and new ground-truth events are very hard to come by, so this is a real problem for us. In the unlikely event that this would actually work, we set up an experiment with multiple holdout datasets (of which only we know the realized performance). Back then, they were still estimating accuracy, but the results were mind-blowing. Very confident that they will crack this problem. Congrats on the launch!"
"A very clear vision for the product. Can't wait to fully use it in measuring our NGO disinformation AI model's performance. Love the documentation and the very simple setup. Must have for all operating in an ecosystem where data is a rare asset." (Jakub Sliz)
"Love the product. Saved us a lot of money from failed predictions." (Bart Vandekerckhove)
"NannyML is just amazing; super easy to use, giving me all the insights I need to sleep well knowing my models still work in production ;)" (Johannes Hotter)