The machine learning research community is very, very vibrant.

Here is what I mean...🧵🧵
In 1958, Frank Rosenblatt invented the perceptron, a very simple algorithm that would later turn out to be the core and origin of today's intelligent machines.
In essence, the perceptron is a simple binary classifier that can determine whether or not a given input belongs to a specific class.

Here is the perceptron algorithm:
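A minimal NumPy sketch of that learning rule on a toy, linearly separable problem (the data, learning rate, and epoch count here are made up for illustration):

```python
import numpy as np

# Toy, linearly separable data: the logical AND of two inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w = np.zeros(X.shape[1])  # weights
b = 0.0                   # bias
lr = 0.1                  # learning rate

for epoch in range(10):
    for xi, target in zip(X, y):
        # Step activation: output 1 if w.x + b > 0, else 0
        pred = 1 if xi @ w + b > 0 else 0
        # Perceptron update rule: nudge weights only when the prediction is wrong
        w += lr * (target - pred) * xi
        b += lr * (target - pred)

print(w, b)                                     # learned parameters
print([1 if x @ w + b > 0 else 0 for x in X])   # predictions: [0, 0, 0, 1]
```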
Rosenblatt's intent was not to develop the perceptron as an algorithm or software, but rather as a machine.
The perceptron was implemented in hardware that came to be known as the Mark I Perceptron.

The weights were encoded in potentiometers, and weight updates were done by electric motors.

Here is the Mark I Perceptron.
The perceptron, a simple artificial neural network architecture, led to multilayer perceptrons (MLPs) and later inspired other types of neural networks such as convolutional and recurrent neural networks.

Let's shift a little bit to convolutional neural networks.
Convolutional neural networks are the type of neural nets that have been most successful in image recognition tasks.

They are also used in text classification and time series analysis, but their major applications lie in visual recognition tasks.
One of the first Convnet architectures is LeNet-5, which was introduced by @ylecun.

LeNet-5 was designed for handwritten and machine-printed character recognition.
Here is the 1998 paper that used LeNet-5 for document recognition. The first LeNet paper dates back to 1989.

vision.stanford.edu/cs598_spring07….
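For a concrete feel of that kind of architecture, here is a rough PyTorch sketch of a LeNet-5-style network for 32x32 grayscale inputs. The layer sizes follow the usual description of the 1998 paper, but treat it as an approximation rather than the exact original:

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """A LeNet-5-style Convnet: two conv + pooling stages, then three fully connected layers."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 1x32x32 -> 6x28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # -> 6x14x14
            nn.Conv2d(6, 16, kernel_size=5),  # -> 16x10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # -> 16x5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LeNet5()
print(model(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10]): one score per digit class
```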
LeNet-5 inspired other powerful Convnet architectures such as AlexNet, which won the ImageNet challenge in 2012.

In the years that followed, nearly all architectures that won the ImageNet challenge used Convnets.

Here is the AlexNet paper:

papers.nips.cc/paper/2012/fil…
Most of the architectures that won the subsequent challenges were modifications of previous architectures.

Example: ZFNet, a modification of AlexNet, won the ImageNet challenge in 2013.

arxiv.org/abs/1311.2901
Other Convnet architectures that followed include GoogLeNet by @ChrSzegedy (which won ImageNet 2014 and inspired Inception v3 and v4), VGG from the Visual Geometry Group (the ImageNet 2014 runner-up), and ResNet, which won the 2015 challenge.
Other Convnet architectures followed, but most of them were modifications of previous architectures.

For example, Xception (or "Extreme Inception") by @fchollet was an Inception-style architecture that used depthwise separable convolution layers instead of Inception modules.
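To make "separable convolution" concrete, here is a minimal PyTorch sketch of a depthwise separable convolution, the building block Xception stacks in place of Inception modules (channel counts and input shape are just illustrative):

```python
import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) spatial conv,
    then a 1x1 (pointwise) conv that mixes information across channels."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

layer = SeparableConv2d(32, 64)
print(layer(torch.randn(1, 32, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```

The point of the factorization is efficiency: spatial filtering and channel mixing are done separately, which needs far fewer parameters than a full convolution of the same size.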
Now turning to recurrent networks.

Recurrent neural networks were created to handle sequential data such as texts, audio, and time series.
The first simple RNN cells failed to handle long-term dependencies due to the problem of vanishing gradients, and Long Short-Term Memory networks (LSTMs) were introduced to overcome that.

Gated Recurrent Units (GRUs), a simpler version of LSTMs, were later introduced too.
More about the vanishing gradient problem:
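As a rough illustration of why gradients vanish, here is a toy NumPy calculation: backpropagating through many tanh steps multiplies the gradient by a factor below 1 at every step, so it shrinks toward zero (the weight and step count are made up for illustration):

```python
import numpy as np

# A toy recurrent computation: h_t = tanh(w * h_{t-1}), repeated over many time steps
w = 0.9
h = 1.0
grad = 1.0  # pretend d(loss)/d(h_T) = 1 at the last step

for t in range(50):
    h_new = np.tanh(w * h)
    # local derivative of tanh(w * h) w.r.t. h is w * (1 - tanh(w * h)^2), which is < 1 here
    grad *= w * (1 - h_new ** 2)
    h = h_new

print(grad)  # a tiny number: the gradient has effectively vanished after 50 steps
```

LSTMs and GRUs add gates that create paths along which this product can stay close to 1, which is what lets them carry information across long sequences.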

LSTMs and GRUs are both recurrent networks and are widely used in sequential tasks today.

BUT later, guess what? It turned out you don't need recurrence or convolutions. You only need attention.

That was the slogan of Transformers. Attention is All You Need!

That was in 2017.
Transformers are solely based on attention mechanisms and they don't use any recurrence or convolutions.

arxiv.org/abs/1706.03762
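Here is a minimal NumPy sketch of the scaled dot-product attention at the core of the paper: every position builds a weighted average of the values, with weights softmax(QK^T / sqrt(d_k)). Single head, no masking, random toy inputs:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # how much each query attends to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted average of the values

# Toy example: 4 tokens with 8-dimensional embeddings, a single attention head
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)  # (4, 8): one new representation per token, with no recurrence and no convolution
```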
They are now used in things like machine translation and other language tasks.

Real example: @lmoroney used transformers for his exciting project

Transformers have also been used in visual tasks.

See this.

arxiv.org/abs/2105.07581
It also turns out that you don't need attention or convolutions.

MLP-Mixer is one of the papers that claimed that neither Convnets nor vision transformers are needed.

Multilayer perceptrons are enough...That was in May this year.

arxiv.org/abs/2105.01601…
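A rough PyTorch sketch of the core Mixer idea as the paper describes it: one MLP mixes information across patches (token mixing) and another mixes across channels, with no attention and no convolutions beyond the initial patch embedding (the sizes here are illustrative):

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """One MLP-Mixer block: a token-mixing MLP over patches, then a channel-mixing MLP over features."""
    def __init__(self, num_patches: int, dim: int, token_hidden: int = 64, channel_hidden: int = 256):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_patches, token_hidden), nn.GELU(), nn.Linear(token_hidden, num_patches))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, channel_hidden), nn.GELU(), nn.Linear(channel_hidden, dim))

    def forward(self, x):                        # x: (batch, patches, dim)
        # Token mixing: transpose so the MLP runs across the patch dimension
        y = self.norm1(x).transpose(1, 2)        # (batch, dim, patches)
        x = x + self.token_mlp(y).transpose(1, 2)
        # Channel mixing: the MLP runs across the feature dimension
        x = x + self.channel_mlp(self.norm2(x))
        return x

block = MixerBlock(num_patches=196, dim=128)
print(block(torch.randn(2, 196, 128)).shape)  # torch.Size([2, 196, 128])
```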
And in the same month, another similar paper claimed the same thing - YOU DON'T NEED ATTENTION. PAY ATTENTION TO MLPs.

PERCEPTRONS ARE ALL YOU NEED.

arxiv.org/abs/2105.08050
And this week, guess what? You don't need anything other than patches.

This was shocking news to the whole ML community.

See this from @karpathy
And here is ResNet again...

See this from @tunguz

In the beginning, it was perceptrons. And today, ML research is still revolving around perceptrons.

It's good to be alive watching all of this incredible news!
Thanks for reading.

If you found any of this helpful, follow @jeande_d and share this thread with your friends.
For an in-depth understanding of neural network architectures,

Here is a whole thread about them 👇
