I see lots of coders trying to get into #deeplearning (DL) without having math pre-reqs.
Good idea? Bad idea? It can be either ⚖️. Let's talk about that 🧵👇
0. What do we mean by "math skills"? I'm talking: calculus, linear algebra, probability, and statistics.
Let's get something else out of the way: people might accuse you of trying to take a shortcut. But it's perfectly normal (and arguably optimal) to see how far you can get with what you already have! The impulse isn't wrong, but it may still be the wrong choice for you.
Without solid math skills, you WILL be limited in how far you can go in DL. This isn't necessarily a problem. People learn how to drive cars all the time without ever intending to build a car of their own. Sometimes driving is enough.
If all you want to do is drive DL models (a.k.a. deploy pretrained models on your data), then the math barely matters. You'll need to know whether (and how) it's okay to feed your data to the model, and how to interpret metrics and outputs. That can be learned on the fly ✅
If you want to:
- 🏗️build a new model
- 🚂train existing models
- 🔎analyze models
- 🧑‍🎨 visualize models
then the math MATTERS. Not just a little bit, A LOT.
The math matters because failures in DL are inherently ambiguous. If your #neuralnetworks aren't learning during training, it can be for either of the following reasons:
A) Your model and training code are right, but you need to find better hyperparameters.
B) You are doing something mathematically (and provably) impossible, nonsensical, or incorrect. This could mean using an inappropriate loss function, or setting hyperparameters that make your task impossible or numerically unstable.
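A concrete sketch of one B-type failure, in plain Python (the function names here are my own, purely illustrative): a naive softmax overflows on large logits, so every loss computed from it is garbage, and no amount of hyperparameter search will save you. A little math (softmax is invariant to shifting the inputs) fixes it outright.

```python
import math

def softmax_naive(logits):
    # math.exp overflows for inputs above ~709, so large logits explode here.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(logits):
    # Shifting by the max doesn't change the result (softmax(x) == softmax(x - m)),
    # but it keeps every exponent <= 0, so nothing can overflow.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1000.0, 1000.0, 999.0]
# softmax_naive(logits) raises OverflowError — a provable failure, not bad luck
print(softmax_stable(logits))  # valid probabilities summing to 1
```

This is exactly the kind of bug that looks like "my model won't learn" from the outside, but is a mathematical certainty from the inside.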
Without math, you'll NEVER catch B-type failures. You'll just meander into them without really knowing your task is Sisyphean.
Even if you find yourself with A-type issues, the math gives you clues on how to choose your next set of hyperparams.
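A toy sketch of what "clues from the math" looks like (my own example, not a recipe): for a quadratic loss f(w) = ½·c·w², gradient descent provably converges only when lr < 2/c. Knowing the curvature tells you which learning rates are even worth trying, instead of guessing blindly.

```python
# Toy 1-D model: loss f(w) = 0.5 * c * w**2 with curvature c.
# The update w <- w - lr * c * w = w * (1 - lr * c) shrinks |w|
# iff |1 - lr * c| < 1, i.e. iff 0 < lr < 2 / c.
def max_stable_lr(curvature):
    return 2.0 / curvature

def train(lr, curvature, steps=50, w=1.0):
    for _ in range(steps):
        w -= lr * curvature * w  # gradient of 0.5*c*w**2 is c*w
    return w

c = 10.0
safe = 0.5 * max_stable_lr(c)  # inside the stable range
bad = 1.5 * max_stable_lr(c)   # provably divergent
print(abs(train(safe, c)))     # tiny: training converged
print(abs(train(bad, c)))      # huge: no amount of patience helps
```

Real networks aren't quadratics, but the same style of reasoning (stability bounds, conditioning, loss landscapes) is what turns hyperparameter search from a lottery into an informed search.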
The worst part is that in most cases, training takes a LONG time (and can be expensive, too). So to succeed with DL, you want to minimize AHEAD OF TIME your chances of bumping into A-type and B-type failures. Knowing the math behind your models goes a long way towards that.
Ultimately, it's up to you to decide how your learning journey will go. But I hope this thread makes the trade-offs more explicit so you know what you're choosing.
If you're unsure and just wanna dip your toes in, you can watch me build/train models at twitch.tv/encode_this
If anyone has any other takes on this or personal experiences with this choice, let's hear it. More sharing = more XP = more levelling ⬆️ for everyone.