Simply put, if it isn’t visualized with real data, you do not understand it, and you cannot trust it.
I generate 1,000s of visualizations per ML product.
- your math will not save you
- neither will your test set
- you cannot rely on canned visualizations
- you cannot just “look” at the data
In reality, your math is an abstraction that breaks down as soon as it encounters real data.
You must understand the model (viscerally) to improve it, your test set is not representative of reality, and your error function does not match your real objective.
Just because mdl.plot() renders an image or you open TensorBoard does not mean you have accomplished anything.
You need to bring other dimensions to bear, change the level of aggregation, or alter the presentation to find truth.
Examining the data is crucial in the beginning, but no amount of peering at predictions or weights will help you build a global understanding of what is happening.
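To make that concrete, here is a minimal sketch of changing the level of aggregation: instead of peering at individual predictions, roll the residuals up by a slicing column and compare the two views side by side. Everything here (the synthetic df, the segment column, the injected noise) is a hypothetical stand-in, not a recipe.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data: per-example predictions plus one slicing column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "y_true": rng.normal(size=1_000),
    "segment": rng.choice(list("ABCD"), size=1_000),
})
# Inject extra error into segment C so there is structure to find.
noise_scale = (df["segment"] == "C").map({True: 1.5, False: 0.5})
df["y_pred"] = df["y_true"] + rng.normal(size=1_000) * noise_scale
df["residual"] = df["y_pred"] - df["y_true"]

# Same data, two levels of aggregation:
# per-example scatter vs. per-segment boxplot.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(df["y_true"], df["residual"], s=5, alpha=0.3)
ax1.set(xlabel="y_true", ylabel="residual", title="per-example: looks like noise")
df.boxplot(column="residual", by="segment", ax=ax2)
ax2.set(xlabel="segment", title="per-segment: segment C pops out")
fig.suptitle("")  # drop pandas' automatic "Boxplot grouped by ..." title
plt.tight_layout()
plt.savefig("residuals_by_segment.png", dpi=150)
```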
- study the classics
- choose the right tool, and master it
- find inspiration
- iterate again, and again
- listen - but not too closely
- put them in production
My favorite classic to absorb is Tufte’s The Visual Display of Quantitative Information: edwardtufte.com/tufte/books_vd…
Use a *grammar* for producing visualizations; once mastered, it will make you 10-100x more efficient and effective.
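As one illustration (not the only grammar-of-graphics tool), here is a sketch in plotnine, a Python port of ggplot2; the dataset and column names are made up. The payoff of the grammar is that bringing another dimension to bear is a single added term.

```python
import numpy as np
import pandas as pd
from plotnine import aes, facet_wrap, geom_point, ggplot, labs

# Hypothetical stand-in data.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=400),
    "error": rng.normal(size=400),
    "split": rng.choice(["train", "valid"], size=400),
    "class_": rng.choice(["cat", "dog"], size=400),
})

# Base plot: error vs. x, colored by split.
p = ggplot(df, aes("x", "error", color="split")) + geom_point(alpha=0.4)
# Adding a facet on a new dimension is one more term in the grammar.
p = p + facet_wrap("~class_") + labs(title="Error by split, faceted by class")
p.save("error_facets.png", dpi=150)
```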
distill.pub is a fantastic resource, as are many of the kernels at kaggle.com/kernels.
For a given dataset, I may have 100 ideas. For a given idea, I may generate 50 drafts. And for every 10th final plot, I have learned something fundamentally new about visualizing complex data.
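Mechanically, that draft loop can be as dumb as this sketch: render one idea across several slicing columns and transforms, dump every variant to disk, and flip through them afterward. The columns, transforms, and draft_dir here are all hypothetical.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data with two candidate slicing columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "residual": rng.normal(size=500),
    "region": rng.choice(["us", "eu", "apac"], size=500),
    "device": rng.choice(["mobile", "desktop"], size=500),
})

draft_dir = Path("drafts")
draft_dir.mkdir(exist_ok=True)

# Every (slicing column, transform) pair is one cheap draft.
for by in ["region", "device"]:
    for name, fn in [("raw", lambda s: s), ("abs", lambda s: s.abs())]:
        ax = df.assign(v=fn(df["residual"])).boxplot(column="v", by=by)
        ax.set_title(f"residual ({name}) by {by}")
        plt.suptitle("")
        plt.savefig(draft_dir / f"residual_{name}_by_{by}.png", dpi=120)
        plt.close("all")
```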
Remember that most people can tell you if a visualization is great, but very few can tell you how to improve a mediocre one.
Visualizations are a gift that keeps on giving.
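One way to put them in production, as a sketch: re-render the same diagnostic on a schedule and log it where the team already looks. This example pushes a matplotlib figure into TensorBoard via torch's SummaryWriter.add_figure; the tag names and the simulated drift are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/prod_monitoring")
rng = np.random.default_rng(0)

for step in range(3):  # stands in for a recurring batch job
    # Hypothetical drift: residuals widen over time.
    residuals = rng.normal(scale=1 + 0.2 * step, size=1_000)
    fig, ax = plt.subplots()
    ax.hist(residuals, bins=50)
    ax.set(title=f"Residual distribution, step {step}", xlabel="residual")
    writer.add_figure("monitoring/residual_hist", fig, global_step=step)
    plt.close(fig)

writer.close()
```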