For many problems, a batch size of 32 works quite well.

Batch size mostly affects training time: the larger the batch size, the faster each epoch runs; the smaller, the slower.

The catch with a large batch size is that the model still needs plenty of gradient-update steps to reach optimal performance, and a larger batch means fewer steps per epoch (steps per epoch = dataset size / batch size).

So you need a large dataset in order to have enough steps per epoch.

With that said, 32 is a good default value to try at first.
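For a concrete picture, here is a minimal sketch, assuming TensorFlow/Keras, of where the batch size enters training. The data and the tiny model are placeholders, not from the thread.

```python
import tensorflow as tf

# Toy data standing in for a real dataset: 1,024 samples, 20 features.
x_train = tf.random.normal((1024, 20))
y_train = tf.random.uniform((1024,), maxval=2, dtype=tf.int32)

# A tiny placeholder model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# batch_size=32 is the default suggested above.
# Steps per epoch = 1024 / 32 = 32 gradient updates per pass over the data.
model.fit(x_train, y_train, epochs=3, batch_size=32)
```

Doubling the batch size to 64 would halve the steps per epoch (and usually speed each epoch up), which is exactly the trade-off described above.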
Here are 2 great papers that you can use to learn more:

Practical Recommendations for Gradient-Based Training of Deep Architectures: arxiv.org/pdf/1206.5533.…
This paper suggests a different approach: use a large batch size, but start with a small learning rate and adjust it accordingly... (a rough sketch of that idea follows after the second reference below).

Train longer, generalize better: closing the generalization gap in large batch training of neural networks

arxiv.org/pdf/1705.08741…
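One common way to act on that "small learning rate first, then adjust" advice is a learning-rate warm-up schedule. Below is a rough sketch, assuming Keras; the numbers are hypothetical and are not taken from either paper.

```python
import tensorflow as tf

def warmup_schedule(epoch, lr):
    # Hypothetical numbers for illustration: ramp the learning rate
    # from 1e-4 up to 1e-2 over the first 5 epochs, then hold it.
    start, target, warmup_epochs = 1e-4, 1e-2, 5
    if epoch < warmup_epochs:
        return start + (target - start) * (epoch + 1) / warmup_epochs
    return target

# This callback would be passed to model.fit(...) together with a
# large batch_size (e.g. 256 or 512).
callback = tf.keras.callbacks.LearningRateScheduler(warmup_schedule)

# Preview what the schedule does over the first 10 epochs.
for epoch in range(10):
    print(epoch, warmup_schedule(epoch, None))
```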

More from @Jeande_d

4 Aug
One of the techniques that has accelerated machine learning on limited real-world datasets is data augmentation.

Data augmentation is the art of creating artificial data.

For example, you can take an image, flip it, change its colors, and now you have a new image.
Yes, data augmentation is the art of creating artificial data to expand a given small dataset.

It has been shown to work well (most of the time), and it is remarkably effective at reducing overfitting.
Most types of data can be augmented, but I have noticed that it works best on unstructured data (images, video, sound).

So, this thread will focus more on images, sounds, and videos.

For more about structured vs unstructured data 👇
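To make the flip-and-recolor example above concrete, here is a minimal sketch assuming TensorFlow/Keras preprocessing layers (TF 2.9+); the images are random placeholders.

```python
import tensorflow as tf

# The augmentation pipeline: flips, slight color changes, small rotations.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomContrast(0.2),
    tf.keras.layers.RandomRotation(0.1),
])

# A fake batch of 8 RGB images standing in for real data.
images = tf.random.uniform((8, 224, 224, 3))
new_images = augment(images, training=True)  # training=True enables the randomness

print(new_images.shape)  # (8, 224, 224, 3): same shape, new "artificial" images
```

Run inside a training loop, every epoch sees a slightly different version of each image, which is what helps against overfitting.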

3 Aug
Choosing the right machine learning algorithm for the problem can be hard.

This thread is about 3 questions that can help in the model selection process.

A thread🧵👇
1. Will the predictions have to be fully explainable?

Many models are not explainable. If you're looking for explainability, neural nets may not help, but models like decision trees can.

More interesting facts about decision trees beyond explainability👇

2. How big is the dataset?

For big datasets, neural nets and other complex models like ensemble methods can be a good choice.
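As a small illustration of the explainability point in question 1, here is a sketch assuming scikit-learn: a fitted decision tree can be printed as plain if/else rules, something a neural net cannot offer.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small tree on a toy dataset.
data = load_iris()
tree = DecisionTreeClassifier(max_depth=3).fit(data.data, data.target)

# Every prediction can be traced through human-readable if/else rules.
print(export_text(tree, feature_names=list(data.feature_names)))
```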
23 Jul
One of the neural network architectures that has outpaced traditional algorithms for image recognition is the Convolutional Neural Network (CNN), a.k.a. the ConvNet.

Inspired by the brain's visual cortex, CNNs have become a solid default choice for many computer vision tasks.

More about CNNs 👇
A CNN is made of 3 main blocks:

◆ Convolutional layer(s)
◆ Pooling layer(s)
◆ Fully connected layer(s)

Let's talk about each block.
1. Convolutional layers

The convolutional layers are the backbone of the whole CNN. They extract features from the images using filters.
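Here is a minimal sketch, assuming Keras, that wires the three blocks listed above into one small image classifier; the input size and layer widths are arbitrary placeholders.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # 1. Convolutional layers: extract features from the image with learned filters.
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    # 2. Pooling layers: shrink the feature maps, keeping the strongest signals.
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    # 3. Fully connected layers: combine the features into a prediction.
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```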
21 Jul
In any typical application powered by machine learning, whether that is a text classifier running in a web browser or a face detector running in a mobile phone, the machine learning code (or model) will be about 5% of the whole app, or close to it.

The other 95%...👇
The other 95% includes data, analysis, and software-related work.

A machine learning model being only 5% of the whole application often implies that we should be doing something else beyond tuning models.
Such as:

◆ Building reproducible data preparation pipelines.
◆ Evaluating models properly.
19 Jul
PCA is an unsupervised learning algorithm that is used to reduce the dimensionality of large datasets.

For that reason, it's commonly known as a dimensionality reduction algorithm.

PCA is one of those useful things that aren't talked about much. But there is a reason 👇
PCA's ability to reduce the dimensionality of a dataset motivates other use cases.

Below are some:

◆ To visualize high-dimensional datasets, which are impractical to plot directly; reducing them to 2 or 3 components makes visualization possible.
◆ To select the most useful features while getting rid of useless information/redundant features.

But not always: sometimes useful information is lost too, especially if the original data was already good and didn't contain much noise.
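A minimal sketch of the visualization use case, assuming scikit-learn: reduce a 64-dimensional dataset to 2 components so it can be plotted.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)    # 1797 samples x 64 features
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)            # now 1797 x 2, easy to scatter-plot

print(X_2d.shape)
print(pca.explained_variance_ratio_)   # how much of the variance was kept
```

The explained-variance ratio is also a quick check on the caveat above: if it is low, a lot of useful information was thrown away.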
23 May
Machine Learning has transformed many industries, from banking, healthcare, production, and streaming to autonomous vehicles.

Here are examples of how that is happening👇
🔸A bank or credit card provider can detect fraud in real time. Banks can also predict whether a customer requesting a loan will pay it back, based on their financial history.

2/
🔸A medical analyst can diagnose a disease in a handful of minutes, or predict the likelihood or course of a disease or the survival rate (prognosis).
🔸An engineer in a given industry can detect failures or defects in equipment.

3/