Simply put, if it isn’t visualized with real data, you do not understand it, and you cannot trust it.
I generate 1,000s of visualizations per ML product.
- your math will not save you
- neither will your test set
- you cannot rely on canned visualizations
- you cannot just “look” at the data
In reality, your math is an abstraction that breaks down as soon as it encounters real data.
You must understand the model (viscerally) to improve it, your test set is not representative of reality, and your error function does not match your real objective.
Just because mdl.plot() renders an image or you open TensorBoard does not mean you have accomplished anything.
You need to bring other dimensions to bear, change the level of aggregation, or alter the presentation to find truth.
Examining the data is crucial in the beginning, but no amount of peering at predictions or weights will help you build a global understanding of what is happening.
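To make that concrete, here is a minimal sketch of changing the level of aggregation: instead of peering at individual predictions, roll the residuals up by a slicing column and compare the two views side by side. Everything here (the synthetic df, the segment column, the injected noise) is a hypothetical stand-in, not a recipe.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data: per-example predictions plus one slicing column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "y_true": rng.normal(size=1_000),
    "segment": rng.choice(list("ABCD"), size=1_000),
})
# Inject extra error into segment C so there is structure to find.
noise_scale = (df["segment"] == "C").map({True: 1.5, False: 0.5})
df["y_pred"] = df["y_true"] + rng.normal(size=1_000) * noise_scale
df["residual"] = df["y_pred"] - df["y_true"]

# Same data, two levels of aggregation:
# per-example scatter vs. per-segment boxplot.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(df["y_true"], df["residual"], s=5, alpha=0.3)
ax1.set(xlabel="y_true", ylabel="residual", title="per-example: looks like noise")
df.boxplot(column="residual", by="segment", ax=ax2)
ax2.set(xlabel="segment", title="per-segment: segment C pops out")
fig.suptitle("")  # drop pandas' automatic "Boxplot grouped by ..." title
plt.tight_layout()
plt.savefig("residuals_by_segment.png", dpi=150)
```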
- study the classics
- choose the right tool, and master it
- find inspiration
- iterate again, and again
- listen - but not too closely
- put them in production
My favorite classic to absorb is Tufte’s The Visual Display of Quantitative Information: edwardtufte.com/tufte/books_vd…
Use a *grammar* for producing visualizations; once mastered, it will make you 10-100x more efficient and effective.
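As one illustration (not the only grammar-of-graphics tool), here is a sketch in plotnine, a Python port of ggplot2; the dataset and column names are made up. The payoff of the grammar is that bringing another dimension to bear is a single added term.

```python
import numpy as np
import pandas as pd
from plotnine import aes, facet_wrap, geom_point, ggplot, labs

# Hypothetical stand-in data.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=400),
    "error": rng.normal(size=400),
    "split": rng.choice(["train", "valid"], size=400),
    "class_": rng.choice(["cat", "dog"], size=400),
})

# Base plot: error vs. x, colored by split.
p = ggplot(df, aes("x", "error", color="split")) + geom_point(alpha=0.4)
# Adding a facet on a new dimension is one more term in the grammar.
p = p + facet_wrap("~class_") + labs(title="Error by split, faceted by class")
p.save("error_facets.png", dpi=150)
```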
distill.pub is a fantastic resource, as are many of the kernels at kaggle.com/kernels.
For a given dataset, I may have 100 ideas. For a given idea, I may generate 50 drafts. And for every 10th final plot, I have learned something fundamentally new about visualizing complex data.
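Mechanically, that draft loop can be as dumb as this sketch: render one idea across several slicing columns and transforms, dump every variant to disk, and flip through them afterward. The columns, transforms, and draft_dir here are all hypothetical.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data with two candidate slicing columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "residual": rng.normal(size=500),
    "region": rng.choice(["us", "eu", "apac"], size=500),
    "device": rng.choice(["mobile", "desktop"], size=500),
})

draft_dir = Path("drafts")
draft_dir.mkdir(exist_ok=True)

# Every (slicing column, transform) pair is one cheap draft.
for by in ["region", "device"]:
    for name, fn in [("raw", lambda s: s), ("abs", lambda s: s.abs())]:
        ax = df.assign(v=fn(df["residual"])).boxplot(column="v", by=by)
        ax.set_title(f"residual ({name}) by {by}")
        plt.suptitle("")
        plt.savefig(draft_dir / f"residual_{name}_by_{by}.png", dpi=120)
        plt.close("all")
```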
Remember that most people can tell you if a visualization is great, but very few can tell you how to improve a mediocre one.
Visualizations are a gift that keeps on giving.
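One way to put them in production, as a sketch: re-render the same diagnostic on a schedule and log it where the team already looks. This example pushes a matplotlib figure into TensorBoard via torch's SummaryWriter.add_figure; the tag names and the simulated drift are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/prod_monitoring")
rng = np.random.default_rng(0)

for step in range(3):  # stands in for a recurring batch job
    # Hypothetical drift: residuals widen over time.
    residuals = rng.normal(scale=1 + 0.2 * step, size=1_000)
    fig, ax = plt.subplots()
    ax.hist(residuals, bins=50)
    ax.set(title=f"Residual distribution, step {step}", xlabel="residual")
    writer.add_figure("monitoring/residual_hist", fig, global_step=step)
    plt.close(fig)

writer.close()
```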