People often ask me how to build better intuitions about different machine learning and deep learning methods. This is a thread about my experience (as an NLP Researcher) building better intuitions of ML/deep learning methods, including resources and tips.
🧵
Overview -- Building intuitions about concepts related to a field requires investing a lot of time and effort. For ML, it is no different. In this thread, I will share a bit of my journey and personal experience building intuitions about DL/ML algorithms & new research ideas.
I don't claim that the tips I share here will work for everyone. Doing a Ph.D. gave me enough time to explore ways to dig deeper into topics, so the context matters. I also had access to great advisors who gave me a learning path that kept me productive in both learning and building things.
High-level overview -- Before jumping deep into ML, I took courses like data mining & text mining to build a high-level understanding of methods for building predictive systems. Having this background allowed me to spend time on the problems/methods that I found interesting.
Here are books that I used in my studies to get that **high-level overview**:
📘 Artificial Intelligence: A Modern Approach
📘 Data Mining: Concepts and Techniques
📘 Text Mining: Predictive Methods for Analyzing Unstructured Information
Hands-on experience -- Using that high-level knowledge, I built and trained a lot of models from scratch using tools like R and Python. In order to better understand these models, I attempted to adapt them to different problems, including working on different datasets/tasks.
These books helped with getting that initial hands-on experience:
📘 Data Science from Scratch
📘 Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
* I also dabbled with Kaggle during my studies.
Scope -- As I focused on a family of approaches that helped with the problems I was interested in (mostly NLP-related), I began to get interested in the inner workings of these models. I used a combination of visualizations, coding, and math to help me build deeper intuition.
Math -- Understanding the math behind ML models really helped me build enough intuition to get comfortable experimenting with different ML models. I had a strong math background going into my graduate studies, so I mostly had to brush up on statistics & advanced calculus...
These two books helped with improving my mathematical understanding of predictive models:
📘 Pattern Recognition and Machine Learning (by Christopher M. Bishop)
📘 The Elements of Statistical Learning (by Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman)
More recently, I have found the following book to be an excellent resource for better understanding the mathematics behind machine learning:
📘 Mathematics for Machine Learning (by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong)
Visualization -- Understanding the mathematics behind machine learning models is difficult, especially if you lack a background in math. The good news is that we have excellent tools to help us with this. Graphing and visualization tools really come in handy here.
In terms of visualization, the following skills help:
- Visualizing/understanding different probability distributions
- Plotting 2D/3D charts (line charts, scatter plots, bar charts, heat maps, etc.)
Visualization is a powerful tool to build intuition... get enough practice here.
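As one way to practice the first skill, here is a minimal, dependency-free sketch that samples from three common distributions and prints a rough text histogram for each. The helper name `text_histogram`, the sample sizes, and the distribution parameters are my own assumptions for illustration; in practice you would likely reach for a real plotting library such as Plotly instead.

```python
import random

def text_histogram(samples, bins=10, width=40):
    """Print a rough text histogram and return the per-bin counts."""
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / bins or 1.0  # fall back to 1.0 if all samples are equal
    counts = [0] * bins
    for s in samples:
        # Clamp so the maximum value lands in the last bin
        idx = min(int((s - lo) / step), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        left_edge = lo + i * step
        print(f"{left_edge:8.2f} | {'#' * round(width * c / peak)}")
    return counts

rng = random.Random(0)  # fixed seed so the shapes are reproducible

print("normal(0, 1)")
normal_counts = text_histogram([rng.gauss(0.0, 1.0) for _ in range(5000)])

print("uniform(-3, 3)")
uniform_counts = text_histogram([rng.uniform(-3.0, 3.0) for _ in range(5000)])

print("exponential(rate=1)")
expo_counts = text_histogram([rng.expovariate(1.0) for _ in range(5000)])
```

Comparing the three shapes side by side (a bell, a flat block, and a sharp decay) is the kind of quick exercise that makes the formulas in the books above feel much less abstract.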
Deep learning involves a lot of different types of data transformations. It's important to understand what these transformations do (e.g., dot product, softmax, ReLU, etc.) to get better intuitions about what these models are attempting to do. It's all about exploring here.
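To make those transformations concrete, here is a small sketch of the dot product, ReLU, and softmax written in plain Python. The function names and the example vectors are assumptions chosen for illustration; frameworks like PyTorch ship their own optimized versions.

```python
import math

def dot(u, v):
    """Dot product: sum of element-wise products of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def relu(xs):
    """ReLU: keep positive values and replace negatives with zero."""
    return [max(0.0, x) for x in xs]

def softmax(xs):
    """Softmax: map arbitrary scores to positive values that sum to 1."""
    m = max(xs)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, -1.0, 0.5]

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
print(relu(scores))                           # [2.0, 0.0, 0.5]

probs = softmax(scores)
print([round(p, 3) for p in probs])  # largest score gets the largest probability
print(sum(probs))                    # approximately 1.0
```

Try feeding in your own vectors, for example scaling `scores` by 10, and watch how softmax becomes much more peaked. That kind of poking around is what I mean by exploring.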
These days we have a variety of interactive tools that make it so much easier to plot charts, debug models, visualize weights and predictions, produce loss curves, etc. Check out a few interactive tools put together by @__MLT__: github.com/Machine-Learni…
Interpretability -- Even though this is not discussed as much, being able to understand/evaluate an ML model's features/predictions is not only a great way to get better intuitions but also an important skill as you aim to push ML models into the real world for decision making.
There is a whole area of research around evaluating model explanations and interpreting machine learning models. Here is a book I found really useful to get more familiar with how to make black box models explainable:
📘 Interpretable Machine Learning (by Christoph Molnar)
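One model-agnostic technique covered in that book is permutation feature importance: shuffle one feature at a time and measure how much the model's error grows. Below is a minimal sketch on synthetic data. The `black_box` function, its coefficients, and the data are all made up for illustration and stand in for whatever trained model you actually want to inspect.

```python
import random

rng = random.Random(0)  # fixed seed for reproducibility

def black_box(row):
    """Hypothetical model: relies heavily on feature 0, a little on 1, never on 2."""
    return 5.0 * row[0] + 0.5 * row[1]

# Synthetic dataset: 500 rows, 3 features, targets are model output plus noise
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(500)]
y = [black_box(row) + rng.gauss(0.0, 0.1) for row in X]

def mse(rows, targets):
    """Mean squared error of the black box on the given rows."""
    return sum((black_box(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(rows, targets, feature):
    """Increase in error after shuffling a single feature column."""
    baseline = mse(rows, targets)
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return mse(shuffled, targets) - baseline

importances = [permutation_importance(X, y, f) for f in range(3)]
for f, imp in enumerate(importances):
    print(f"feature {f}: importance {imp:.3f}")
```

The feature the model leans on most should show the largest jump in error, while the unused feature should show none. Seeing that ranking match what you know about the model is a nice sanity check before trusting the technique on a real black box.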
Tools -- The tools you use really shouldn't matter, but the more tools you know the better. Just use what works for you or your team. I use Plotly, Pandas, scikit-learn, TensorFlow, and PyTorch. I explore a lot of deep learning models, so I tend to use PyTorch more often.
Overall, I build intuitions around ML methods by:
- reading key literature/establishing background/understanding theory
- running code (if available) & additional experiments
- analyzing/visualizing loss curves, weights, etc.
- analyzing/interpreting/explaining predictions, etc.
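As a tiny illustration of the "run code, then analyze the loss curve" part of that loop, here is a sketch that fits a line with gradient descent and records the loss at every step. The synthetic data (`y = 3x + 2` plus noise), the learning rate, and the step count are assumptions picked for the example, not a recipe.

```python
import random

rng = random.Random(0)  # fixed seed for reproducibility

# Synthetic data: y = 3x + 2 with a little Gaussian noise
xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
ys = [3.0 * x + 2.0 + rng.gauss(0.0, 0.1) for x in xs]

def mse(w, b):
    """Mean squared error of the line y = w*x + b on the dataset."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

w, b, lr = 0.0, 0.0, 0.1
losses = []
for step in range(200):
    # Gradients of the MSE with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b
    losses.append(mse(w, b))  # this list is the loss curve

# Inspect the curve at a few points; plot `losses` with your favorite tool
for step in range(0, 200, 40):
    print(f"step {step:3d}  loss {losses[step]:.4f}")
print(f"learned w={w:.2f}, b={b:.2f}")
```

Changing `lr` and re-running is a cheap way to see what a healthy, a slow, and a diverging loss curve look like before you meet them in a real deep learning experiment.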
If this thread is helpful, I will improve it and publish it as an article on my blog, which you can find here: elvissaravia.substack.com
Before you jump into deep learning, I would strongly advise you to take a few introductory machine learning courses to get up to speed with fundamental concepts like clustering, regression, evaluation metrics, etc.
Here is a thread including a few recent courses you can explore:
Note: this is probably the place you want to start. Start slowly and work on some examples. Pay close attention to the notation and get comfortable with it.
📘 Pattern Recognition and Machine Learning
by Christopher Bishop
Note: Prior to the book above, this is the book that I used to recommend to get familiar with math-related concepts used in machine learning. A very solid book in my view and it's heavily referenced in academia.
It's really concerning to see so much false advertisement on this idea that applying machine learning is easy.
I speak from both a research and an application perspective. The process is rigorous. It's highly iterative, and that should give you a hint of why it's hard.
There is rarely a straightforward answer on how to properly apply ML algorithms to dynamic real world datasets. It's a lot of experimentation. First, you need to organize and understand your data very well.
A good experimentation framework helps but that's just the beginning.
A lot of the toy datasets and problems used to teach ML today are clean and binary.
Things in the real world are rarely binary.
You need to spend time cleaning and understanding your data and in some cases dealing with other aspects of it like access, control, privacy,...