Machine learning is the science of teaching a computer to perform a task. Instead of hardcoding the rules, we give the computer data containing examples of what we want to achieve, and its job is to learn from that data the patterns that map the provided inputs to the desired outputs.
These patterns or (learned) rules can be used to make predictions on unseen data.
A machine learning model is nothing other than a mathematical function whose coefficients and intercept hold the best (learned) values representing the relationship between the provided data and what we want to achieve.

In ML terms, coefficients are called weights, and intercepts are called biases.
Take a simple straight-line equation as an example:
y = aX + b

y is the label
X is the input data
a is weight/coefficient
b is bias/intercept
So basically, if you have a set of values X and y and you are asked to find `a` and `b`, you could crunch numbers until you find them, as long as you know you are dealing with a line equation.
Let's say that the table below gives X and y.

Can you find the relationship between X and y?

If you can find such a relationship, you can easily get the value of y that corresponds to X = 9.
Say you are able to find from the table that y = 2X + 1. You can use this equation to find the value of y for any X.

Finding the value of y for X = 9 is then straightforward:

y = 2*9 + 1 = 19
That is fairly easy to calculate and you would not need machine learning.

But let's forget that for a minute and use machine learning to build a model that can predict y given X = 9.
As for tooling, for a problem like this we can use Scikit-Learn, a well-designed classical ML framework.

We can also use NumPy to represent our data as an array of values.

And Matplotlib for plotting the line. It's always good to plot the data you are working with.
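As a sketch of that setup (the thread's exact table values aren't reproduced here, so the arrays below are example data chosen to be consistent with y = 2X + 1):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so this runs without a display
import matplotlib.pyplot as plt

# Example data consistent with y = 2X + 1 (not the thread's exact table).
X = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([-1.0, 1.0, 3.0, 5.0, 7.0, 9.0])

plt.plot(X, y, marker="o")
plt.xlabel("X (input)")
plt.ylabel("y (label)")
plt.title("The data falls on a straight line")
plt.savefig("data.png")  # save to a file instead of plt.show()
```

Plotting first makes it obvious that a straight line, and therefore a linear model, is a reasonable choice.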
`LinearRegression` (imported from sklearn) is a type of machine learning model that is ideally suited to linear datasets,

but more generally, it is used when we want to predict a single continuous value.
A linear model is the best fit for our data because our data appears to be linear.

The meaning of linearity: a change in the input data (X) is directly proportional to the change in the output (y).

P.S. There are more complex models suited to non-linear datasets, such as support vector machines.
So, now it's time to build our model and train (teach) it on our input data X and output label y.

The whole goal of training a machine learning model is to find the best values of weight (a) and bias (b).

It's simple, as we are not reinventing the wheel.
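A minimal sketch of building and training the model, again assuming example data consistent with y = 2X + 1. Note that scikit-learn expects X as a 2-D array: one row per sample, one column per feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Example data consistent with y = 2X + 1 (not the thread's exact table).
X = np.array([[-1.0], [0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([-1.0, 1.0, 3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X, y)  # training: find the best weight (a) and bias (b)
```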
The nice thing about having good data (technically, data with predictive power) and a relevant model is that you do not need to tweak a lot of things.

On top of that, Linear Regression has very little to tweak (few hyperparameters).
To clear up the confusion around the newly introduced term "hyperparameters," we first have to understand parameters.

Parameters are the model's values (weights and biases), while hyperparameters are the settings that the engineer has to set or change to get good results.
We can change hyperparameters, but we do not set parameters directly. As the model trains, it learns the best values of the parameters (weight & bias).
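To make the distinction concrete, here is a small sketch using scikit-learn's `LinearRegression`: `get_params()` returns the hyperparameters you can set, while the parameters (`coef_` and `intercept_`) only exist after training.

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()

# Hyperparameters: settings the engineer chooses before training.
hyperparams = model.get_params()
print(hyperparams)  # includes e.g. 'fit_intercept'

# Parameters: learned from data, so they don't exist before fit().
print(hasattr(model, "coef_"))  # False before training
```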

For more about hyperparameters and parameters, see the replies by @svpino and @AlejandroPiad.

So, now that we have a trained model, the next step is to predict the value of y given X = 9. That's the initial question.

Perfect, you can see it's 100% correct: y(9.0) = 19.0
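A self-contained sketch of that prediction step, assuming the same example data consistent with y = 2X + 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Example data consistent with y = 2X + 1.
X = np.array([[-1.0], [0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([-1.0, 1.0, 3.0, 5.0, 7.0, 9.0])

model = LinearRegression().fit(X, y)

# Predict y for X = 9; note the 2-D shape [[9.0]].
prediction = model.predict(np.array([[9.0]]))
print(prediction[0])  # approximately 19.0
```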
If you want to access the model's parameters, the weight and bias, you can get them too.

And remember, in equation y = aX + b, a is the weight or coefficient of the equation, and b is the bias or intercept.

And indeed, these learned values of weight and bias are exactly what we expect: a = 2 and b = 1.
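For example (same assumed data as before), the learned weight and bias are exposed as `coef_` and `intercept_`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Example data consistent with y = 2X + 1.
X = np.array([[-1.0], [0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([-1.0, 1.0, 3.0, 5.0, 7.0, 9.0])

model = LinearRegression().fit(X, y)

weight = model.coef_[0]   # a in y = aX + b, approximately 2.0
bias = model.intercept_   # b in y = aX + b, approximately 1.0
print(weight, bias)
```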
This was a simple thing to do, but I find it good to start simple when explaining a hard topic.

Plus it can motivate you to do more. I was personally motivated by these simple things in my early days of doing machine learning.
This is the end of the thread.

Initially, I wasn't planning a thread, but then I thought that extending it might be helpful to anyone who still wonders what machine learning really is.
Thank you for reading.

You can follow @Jeande_d for more simple posts like this.

P.S. One of my 2021 goals was to do something tangible for the ML community. I figured it would be cool to create well-structured content/a curriculum.

It's close, and I will share details soon.
