Santiago
5 Feb, 11 tweets, 2 min read
The million-dollar question:

"Is it reasonable for someone to dive into machine learning with a shallow knowledge of math?"

▫️ The short answer is "yes."
▫️ The more nuanced answer is "it depends."

Let me try to unpack this question for you.

🧵👇
You can think about machine learning as a spectrum that goes all the way from pure research to engineering.

The more you move towards a research position, the more you can benefit from your math knowledge. If you move in the other direction, you'll get away with less of it.

👇
I have friends who got a Ph.D. and became college professors.

For them, math is an absolute requirement!

Not only are they working on research projects, but they are teaching the next generation of scientists and engineers.

👇
Other friends went the other direction: they took positions at companies that focus on exploiting existing high-level frameworks to produce value.

A lot of the required math is already abstracted away by these frameworks, so they can get away with much less of it.

👇
Notice that I didn't say that you don't need math *at all*. I specifically framed the initial question as having a "shallow knowledge" of it.

It's tough to escape from math, regardless of what you do in life. Machine learning is no exception.

👇
I'm an engineer. This is my experience:

▫️ Basic knowledge of statistics and probability is essential for what I do.

▫️ Understanding derivatives much less so.

▫️ Linear algebra comes up all the time, but it's easy to refresh concepts when I need them.

👇
I want you to keep this in mind:

▫️ You can get started with a much shallower knowledge of math than what would be required for a job.

Part of the process is to learn what you need!

This is probably the most relevant piece of advice that always gets lost in translation.

👇
If today is your first day on the machine learning train, I promise you can get away with basic high-school-level math.

In a year, you'll need more than that. But that will happen gradually.

👇
Remember when you were learning to ride a bicycle?

If you look back and think logically about it, there are so many things you probably *wish* you had known before starting.

But you didn't.
And you learned.
And everything turned out okay.

👇
Let's wrap this up with a TL;DR:

▫️ You can get started with very little math.
▫️ As you progress, you'll incorporate more of it.
▫️ Your specialization will determine how much.

More important than anything:

GET. STARTED. Don't overthink it.

I finish work every day around 5 pm. Then I come here to share some of the things I've learned.

Every day ☀️🌙

If you don't mind a blunt and practical point of view about machine learning and software engineering, stay tuned!

• • •


More from @svpino

7 Feb
Everything you need to know about the batch size when training a neural network.

(Because it really matters, and understanding it makes a huge difference.)

A thread.
Gradient descent is an optimization algorithm used to train neural networks.

On every iteration, the algorithm computes how much we need to adjust the model to get closer to the results we want.

2/
We take samples from the training dataset, run them through the model, and determine how far away our results are from the ones we expect.

We call this "error," and using it, we compute how much we need to update the model weights to improve the results.

3/
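The rest of this thread is cut off here, so the following is a hypothetical sketch, not the author's code: mini-batch gradient descent fitting the line y = 2x + 1. The `batch_size` parameter controls how many samples feed each weight update: 1 gives stochastic gradient descent, the whole dataset gives batch gradient descent, and anything in between is mini-batch. The toy data, learning rate, and epoch count are assumptions chosen so the example converges.

```python
import random


def train_linear(xs, ys, batch_size, learning_rate=0.1, epochs=2000, seed=0):
    """Fit y ≈ w * x + b with mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # "Error": how far each prediction is from the expected value.
            errors = [(w * xs[i] + b) - ys[i] for i in batch]
            # Gradients of the batch's mean squared error with respect to w and b.
            grad_w = 2 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2 * sum(errors) / len(batch)
            # Move the weights a small step against the gradient.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Toy dataset (assumed): 64 points on the line y = 2x + 1.
xs = [i / 64 for i in range(64)]
ys = [2 * x + 1 for x in xs]

for batch_size in (1, 8, 64):  # stochastic, mini-batch, full batch
    w, b = train_linear(xs, ys, batch_size)
    print(f"batch_size={batch_size:>2}: w={w:.4f}, b={b:.4f}")
```

All three settings recover w ≈ 2 and b ≈ 1 on this noiseless data. What changes is the number of updates per epoch (64, 8, and 1 here) and how noisy each individual update is.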
4 Feb
I built a model to predict whether you'll be involved in a crash the next time you get in a car.

And it's 99% accurate!

Allow me to show you...👇
Here is the model:

👇
The National Safety Council reports that the odds of being in a car crash in the United States are 1 in 102.

That's a probability of about 0.98% (1 / 102 ≈ 0.0098) of being involved in a crash.

Therefore, my silly model is accurate 99% of the time!

See? I wasn't joking before.

👇
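The model itself only appeared as an image. Assuming it simply answers "no crash" for every trip (which is what the 99% figure implies), this hypothetical sketch reproduces the arithmetic and adds recall, a metric this excerpt doesn't mention, to show what accuracy hides on imbalanced data. The 102-trip dataset is illustrative, built from the 1-in-102 odds above.

```python
def predict_crash(trip) -> bool:
    # Hypothetical "silly model": ignores its input and always predicts no crash.
    return False


def accuracy(predictions, labels):
    # Fraction of predictions that match the true labels.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def recall(predictions, labels):
    # Fraction of actual crashes that the model caught.
    true_positives = sum(p and y for p, y in zip(predictions, labels))
    actual_positives = sum(labels)
    return true_positives / actual_positives if actual_positives else 0.0


# 102 trips, exactly one of which ends in a crash (the 1-in-102 odds).
labels = [True] + [False] * 101
predictions = [predict_crash(trip) for trip in range(len(labels))]

print(f"accuracy: {accuracy(predictions, labels):.2%}")
print(f"recall:   {recall(predictions, labels):.2%}")
```

Accuracy comes out at 101 / 102, about 99%, while recall is zero: the model never catches the one crash it exists to predict.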
2 Feb
For the past few months, I've been trying to improve the quality of the content I publish.

There are a couple of ways I'm measuring this:

▫️ Efficiency
▫️ Engagement

Efficiency is about how many impressions and followers I get for every tweet I post.

👇
I've gone from posting 3,126 tweets back in August down to 949 tweets last month.

I've cut a lot of the noise!

During the same period, I've doubled my impressions (up to 14.4M last month), and I'm now converting 5.38 followers for every tweet (up from 2.52).

👇
The second way I'm watching the quality of the content I'm posting is through the engagement rate.

This has gone down quite a bit since August (almost cut in half!)

The more impressions I get, the more pressure I have to put engaging content out there.

👇
2 Feb
Here is a full Python 🐍 implementation of a neural network from scratch in fewer than 20 lines of code!

It shows how it can learn 5 logic functions. (But it's powerful enough to learn much more.)

An excellent exercise in learning how feedforward and backpropagation work!
A quick rundown of the code:

▫️ X → input
▫️ layer → hidden layer
▫️ output → output layer
▫️ W1 → set of weights between X and layer
▫️ W2 → set of weights between layer and output
▫️ error → how far off our predictions are after every epoch

I'm using a sigmoid as the activation function. You'll recognize it by this formula:

sigmoid(x) = 1 / (1 + exp(-x))

It would have been nicer to extract it as a separate function, but then the code wouldn't be as compact 😉
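The code itself appeared only as an image, so this is a hypothetical reconstruction rather than the author's version: a NumPy network with one hidden layer, sigmoid activations, feedforward, and backpropagation, trained on five logic functions at once. The variable names follow the rundown above; the choice of functions (AND, OR, XOR, NAND, NOR), the hidden layer size, the explicit biases, the learning rate, and the random seed are all assumptions.

```python
import numpy as np


def sigmoid(x):
    # sigmoid(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))


# Input: every combination of two binary values.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Targets (assumed choice): one column each for AND, OR, XOR, NAND, NOR.
y = np.array([
    [0, 0, 0, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 0, 0, 0],
], dtype=float)

rng = np.random.default_rng(42)
hidden_units = 8
W1 = rng.normal(size=(2, hidden_units))  # weights between X and layer
b1 = np.zeros(hidden_units)
W2 = rng.normal(size=(hidden_units, 5))  # weights between layer and output
b2 = np.zeros(5)
learning_rate = 1.0

errors = []
for epoch in range(10_000):
    # Feedforward.
    layer = sigmoid(X @ W1 + b1)
    output = sigmoid(layer @ W2 + b2)

    # How far off the predictions are in this epoch.
    error = y - output
    errors.append(np.mean(np.abs(error)))

    # Backpropagation, using sigmoid'(z) = s * (1 - s) where s = sigmoid(z).
    d_output = error * output * (1 - output)
    d_layer = (d_output @ W2.T) * layer * (1 - layer)

    # Gradient step: error is (target - prediction), so adding moves downhill.
    W2 += learning_rate * layer.T @ d_output
    b2 += learning_rate * d_output.sum(axis=0)
    W1 += learning_rate * X.T @ d_layer
    b1 += learning_rate * d_layer.sum(axis=0)

print(np.round(output).astype(int))
```

Each column of `y` is one logic function, so all five outputs share a single hidden layer. XOR is the one that actually needs that hidden layer; the other four are linearly separable.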
1 Feb
Time spent developing better datasets is usually more productive than squeezing more performance out of the algorithms that process them.

One thing to keep in mind is that "better datasets" is not the same as "more data."

Regardless of how much data you're able to collect, properly pre-processing it will usually give you a lot of bang for your buck.

Hopefully, the credit goes to the predictive ability of the solution as a whole.

A machine learning system is not just a model. There are a lot of pieces that need to work together.
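The thread doesn't list specific pre-processing steps, so purely as an illustration: three common ones (removing duplicate rows, filling missing values with the median, and standardizing a numeric feature), sketched in plain Python. The `(feature, label)` row format, the `preprocess` helper, and the choice of steps are hypothetical.

```python
from statistics import mean, median, pstdev


def preprocess(rows):
    """Hypothetical helper. rows: list of (feature, label) pairs; feature may be None."""
    # 1. Drop exact duplicate rows while keeping the original order.
    unique_rows = list(dict.fromkeys(rows))

    # 2. Fill missing feature values with the median of the observed ones.
    observed = [x for x, _ in unique_rows if x is not None]
    fill = median(observed)
    filled = [(fill if x is None else x, label) for x, label in unique_rows]

    # 3. Standardize the feature to zero mean and unit (population) variance.
    values = [x for x, _ in filled]
    mu, sigma = mean(values), pstdev(values)
    return [((x - mu) / sigma if sigma else 0.0, label) for x, label in filled]


rows = [(1.0, 0), (1.0, 0), (None, 1), (3.0, 1), (5.0, 0)]
print(preprocess(rows))
```

Each step is cheap, yet each one changes what the model actually sees, which is the point of the tweet above.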

1 Feb
Here is a simple example of a machine learning model.

I put it together a long time ago, and it was very helpful! I sliced it apart a thousand times until things started to make sense.

It uses TensorFlow and Keras.

If you are starting out, this may be a good puzzle to solve.

The goal of this model is to learn to multiply one-digit numbers.

Each sample in the dataset has two values (the numbers we want to multiply). That's why the model takes an input of size 2.

The input shape represents the input layer of our model. It connects to the first hidden layer: a 4-unit Dense layer.

Then you get another 4-unit Dense layer.
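The original TensorFlow/Keras code appeared only as an image, so here is a framework-free sketch in NumPy: it builds the kind of dataset described (every pair of one-digit numbers and their product) and runs one untrained forward pass through two 4-unit Dense layers so you can follow the shapes. The ReLU activations, the single-unit linear output layer, and the weight initialization are assumptions; this excerpt doesn't show them.

```python
import numpy as np

# Dataset: every pair of one-digit numbers, with their product as the target.
pairs = [(a, b) for a in range(10) for b in range(10)]
X = np.array(pairs, dtype=float)                                     # (100, 2)
y = np.array([a * b for a, b in pairs], dtype=float).reshape(-1, 1)  # (100, 1)

rng = np.random.default_rng(0)


def dense(n_in, n_out):
    # Weights and biases for one fully connected (Dense) layer.
    return rng.normal(scale=0.5, size=(n_in, n_out)), np.zeros(n_out)


def relu(z):
    return np.maximum(z, 0.0)


W1, b1 = dense(2, 4)  # input (2 values) -> first 4-unit hidden layer
W2, b2 = dense(4, 4)  # -> second 4-unit hidden layer
W3, b3 = dense(4, 1)  # -> single output unit (assumed)

h1 = relu(X @ W1 + b1)
h2 = relu(h1 @ W2 + b2)
prediction = h2 @ W3 + b3  # linear output, since the target is a number

for name, arr in [("X", X), ("h1", h1), ("h2", h2), ("prediction", prediction)]:
    print(name, arr.shape)
```

In Keras, a comparable stack could be declared with `keras.Input(shape=(2,))`, two `layers.Dense(4)` layers, and a `layers.Dense(1)` output; whether the original model ends that way isn't visible here.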

