Tim

14 Jun, 23 tweets, 8 min read

1/ You're a data scientist and you've built an anomaly detection model. How do you visualize the results?

In this thread we visualize the results of a variational autoencoder (VAE) and see how well it can detect tool failures in manufacturing.

#DataScience + #Manufacturing = 💰💰

🧵👇

2/ This is a continuation of a series of threads (first thread quoted below) where I went over the business case for using data science and machine learning in industrial environments.

https://twitter.com/timothyvh/status/1402407359150911488

3/ I then ran a random search, training 1000 VAEs to detect tool wear in metal machining.

The best model was selected based on the precision-recall area-under-curve (PR-AUC). The picture below explains this process.
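The selection loop described above can be sketched roughly as follows. This is a minimal illustration, not the code from the thread's repository: the hyperparameter names and ranges in `sample_config` are assumptions, and `toy_score` is a made-up stand-in for "train a VAE and return its PR-AUC".

```python
import random


def sample_config(rng):
    """Draw one hypothetical VAE configuration at random (ranges are assumptions)."""
    return {
        "n_layers": rng.choice([1, 2, 3, 4]),
        "coding_size": rng.randint(2, 64),
        "learning_rate": 10 ** rng.uniform(-4, -2),
    }


def random_search(train_and_score, n_trials, seed=0):
    """Try n_trials random configs and keep the one with the highest score."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = train_and_score(config)  # in practice: validation PR-AUC
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_score(config):
    """Toy stand-in for training a model; the formula is arbitrary."""
    return (
        1.0
        - 0.1 * abs(config["n_layers"] - 2)
        - 0.01 * abs(config["coding_size"] - 21)
    )


best_config, best_score = random_search(toy_score, n_trials=1000)
print(best_config, round(best_score, 3))
```

In a real run, `train_and_score` would train one VAE per configuration and score it on a validation split, which is why random search over 1000 models is expensive but easy to parallelize.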

4/ Why use precision-recall instead of ROC? Simple: it's more informative on imbalanced data. And the milling data sets we used for training the VAEs are highly imbalanced.

See this journal article: journals.plos.org/plosone/articl…
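To make the imbalance point concrete, here is a small sketch on invented data (2 anomalies among 100 samples). It computes ROC-AUC with the pairwise-ranking formula and uses average precision as one common estimate of PR-AUC. The scores are made up for illustration and are not from the milling data set.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


# 98 normal samples with scores 0.00 .. 0.97, plus 2 anomalies that score high
# but still sit below a handful of normal samples.
labels = [0] * 98 + [1, 1]
scores = [i / 100 for i in range(98)] + [0.905, 0.955]

print(f"ROC-AUC:           {roc_auc(labels, scores):.3f}")
print(f"Average precision: {average_precision(labels, scores):.3f}")
```

Only 9 of the 98 normal samples outrank an anomaly, so ROC-AUC stays above 0.95, yet those few false alarms are enough to pull average precision below 0.3. That gap is why PR is the stricter yardstick when anomalies are rare.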

5/ FYI: The best performing model looks something like this, except it has 2 layers instead of 3, and a coding size of 21 instead of 18.

6/ We can feed all the data back into the best VAE model and see how well it identifies anomalies (aka failed/worn tools). Here is a table of results across all data splits:

7/ Notice something? The anomaly detection performed in the latent space (using KL-divergence) gives the best results.
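For readers new to latent-space scoring, here is a minimal sketch of a KL-divergence anomaly score. It assumes the usual VAE setup where the encoder outputs a mean and a log-variance per latent dimension and the prior is a standard normal; the exact scoring used in the thread may differ, and the input values below are invented.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


# An encoding that matches the prior exactly scores zero...
typical = kl_to_standard_normal([0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
# ...while an encoding pushed away from the prior scores higher.
unusual = kl_to_standard_normal([2.0, -1.0, 0.5], [0.0, 0.0, 0.0])

print(typical, unusual)
```

Under these assumptions, a sample whose encoding drifts away from the prior gets a larger score, and a threshold on that score flags it as anomalous.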

8/ Here are the precision-recall and ROC curves.

You can see that the area under the curve is smaller for the PR plot. This is expected, and again shows why you should use PR instead of ROC for imbalanced data.

@matplotlib

9/ A violin plot is a nice way to visualize the performance of the best model. Here's the plot, generated with @matplotlib.

You can generate this plot in Google Colab here: colab.research.google.com/github/tvhahn/…
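If you just want the general shape of such a plot without opening the notebook, here is a rough sketch using synthetic scores. The data, labels, and output file name are all invented for illustration; this is not the plotting code from the Colab notebook.

```python
import random

import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

rng = random.Random(0)
# Invented anomaly scores: worn tools are assumed to score higher on average.
healthy_scores = [rng.gauss(1.0, 0.3) for _ in range(300)]
worn_scores = [rng.gauss(2.0, 0.6) for _ in range(30)]

fig, ax = plt.subplots(figsize=(5, 4))
parts = ax.violinplot([healthy_scores, worn_scores], showmedians=True)
ax.set_xticks([1, 2])
ax.set_xticklabels(["healthy", "failed/worn"])
ax.set_ylabel("anomaly score (KL-divergence)")
ax.set_title("Score distribution by tool state (synthetic data)")
fig.tight_layout()
fig.savefig("violin_sketch.png")
```

The overlap between the two violins is the region where any single threshold will misclassify some samples.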

10/ There are some data points that the model has trouble classifying as normal or abnormal. That's anomaly detection for you! It's challenging to separate the noise from the anomalies, especially in real-world messy data.

11/ We can also see how the model performs on different cutting parameters. How does the model perform on steel vs. cast iron? Fast or slow feed rate? Etc.

Here are the results:

12/ The model prefers some parameters over others. This would be interesting to explore further.

I suspect that during training, certain models develop preferences for some parameters over others, based on the selected hyperparameters.
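A per-parameter breakdown like this can be sketched by grouping the scored cuts by a cutting parameter and computing the metric inside each group. The records below are invented, and average precision is used as a stand-in for PR-AUC; the thread's actual breakdown covers more parameters than material alone.

```python
from collections import defaultdict


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


# (material, label, anomaly score); label 1 = failed/worn. Values are invented.
cuts = [
    ("steel", 0, 0.10), ("steel", 0, 0.20), ("steel", 1, 0.90), ("steel", 1, 0.80),
    ("cast iron", 0, 0.30), ("cast iron", 0, 0.70),
    ("cast iron", 1, 0.60), ("cast iron", 1, 0.20),
]

# Collect labels and scores separately for each material.
groups = defaultdict(lambda: ([], []))
for material, label, score in cuts:
    groups[material][0].append(label)
    groups[material][1].append(score)

per_material = {m: average_precision(y, s) for m, (y, s) in groups.items()}
print(per_material)
```

In this toy data the model separates the steel cuts perfectly and does much worse on cast iron, which is the kind of uneven preference the breakdown is meant to expose.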

13/ You might be able to make an effective "ensemble of models" based on a combination of models that are tuned for different cutting parameters.

Would be fun to do in some future work (or you can do it!).

@matplotlib

14/ Another great way to visualize the results is to plot the latent space KL-divergence scores, sequentially, over time.

Look at this chart. Isn't it beautiful? All made in @matplotlib ❤️

(and I know, I'm a nerd, lol) #DataVisualization
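A bare-bones version of such a trend plot might look like the sketch below. The scores and labels are invented and the styling is deliberately simple, so treat it as a starting point and not as the chart from the thread.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

# Invented sequence of anomaly scores for one tool, in cutting order.
scores = [0.9, 1.0, 1.1, 1.0, 1.2, 1.3, 1.5, 1.8, 2.4, 2.9, 3.1, 2.2]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 1 = failed/worn
cut_index = list(range(len(scores)))
colors = ["tab:red" if y == 1 else "tab:blue" for y in labels]

fig, ax = plt.subplots(figsize=(8, 3))
# A light line shows the trend; colored markers show the true tool state.
ax.plot(cut_index, scores, color="lightgray", linewidth=1, zorder=1)
points = ax.scatter(cut_index, scores, c=colors, s=20, zorder=2)
ax.set_xlabel("cut number (time order)")
ax.set_ylabel("latent-space KL-divergence")
ax.set_title("Anomaly score over time (synthetic data)")
fig.tight_layout()
fig.savefig("kl_trend_sketch.png")
```

Plotting in time order makes it easy to spot where the score rises with wear and where it dips even though the tool is labeled as failed.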

15/ But of course, even the "best" model has problems detecting some of the "failed" or anomalous tools. Look at this trend. See the rapid spike downward near the end?

16/ There is so much more that can be explored (and I'm sure I've made some mistakes along the way too).

All the code is available on GitHub.

github.com/tvhahn/ml-tool…

17/ See more details in my blog post:

towardsdatascience.com/anomaly-detect…

18/ In this thread series we've explored the use of anomaly detection in metal machining.

This is but one use case for #DataScience and ML in #Manufacturing.

The business case for using DS and ML in manufacturing is STRONG.

https://twitter.com/timothyvh/status/1402408465687924736

19/ The one thing I want to emphasize is this:

If you are in manufacturing, at minimum, you should be exploring the tools of ML and DS and building up competency in your organization.

Play in the sandbox.

20/ There will be many sales-people pitching their ideas. Lots of noise.

https://twitter.com/matvelloso/status/1065778379612282885

21/ Build organic competency in your org: nurture individuals who can speak the language of DS and ML and have domain knowledge in manufacturing.

They will be essential in guiding you. And they can help you build new tools.

It's never been easier to build.

22/ Software is eating the world.

Traditional manufacturing domains, like condition monitoring, are being enhanced by new tools (ML, data science). As such, more individuals in your organization need to be cross-disciplinary.

a16z.com/2011/08/20/why…

23/ That's all I got! If you made it to the end, congrats. If you want to talk further, my DMs are always open.

If you're an academic researcher, please cite my work. 😀

Preprint here: researchgate.net/publication/35…
