Prashant
ml engineer @nagarro
Feb 24, 2023 12 tweets 2 min read
How Does ChatGPT Work?

• Everyone knows about ChatGPT but not everyone is aware of how it works.

Here is an attempt at explanation ↓

[ In 10 Steps ]

• It is a large language model that uses a neural network architecture called the "transformer" to understand and generate human-like responses to text-based input.

• Transformer is a neural network architecture that excels at processing sequential data, such as text.
Sep 22, 2022 23 tweets 5 min read
"Learn SQL"

Great advice no doubt.

• But what topics to cover?
• Which SQL database to use?
• What resources to learn from?

Here's a track you can follow ↓

1/22 Let's start with choosing the SQL database to learn.

• There are several databases, such as Postgres, MS SQL Server, MS Access, and Oracle.

• But for learning purposes I'd suggest going with MySQL.

• The reasons: it is secure, free and open source, and well supported.
Sep 14, 2022 11 tweets 3 min read
Interview Question:

• What is Covariance?
• What is Correlation?

• What are the differences between them?

Explain briefly ↓

0/9 COVARIANCE

• Covariance tells us the systematic relationship between two random variables, in which a change in one is reflected by a change in the other.

• It measures the joint variability of two random variables.

• The formula for covariance is:

1/9
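The formula image did not survive in this preview, so here is a minimal sketch that assumes the standard sample formula, cov(X, Y) = Σ (x_i − x̄)(y_i − ȳ) / (n − 1). The function names and the sample data are made up for illustration, and the thread itself may have used the population version (dividing by n).

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    std_x = covariance(xs, xs) ** 0.5
    std_y = covariance(ys, ys) ** 0.5
    return covariance(xs, ys) / (std_x * std_y)


# Hypothetical data: the second variable is exactly twice the first.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 6, 8, 10]

cov = covariance(hours, score)
corr = correlation(hours, score)
```

Covariance depends on the units of both variables, while correlation is the same quantity rescaled to lie between -1 and 1.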
Sep 12, 2022 11 tweets 2 min read
Interview Questions

• How does k-means work?
• What are its stopping criteria?
• What are its pros and cons?
• How do you choose its number of clusters?

Explain briefly ↓

0/4 1. Working

• k-means is an unsupervised algorithm.

• We want to create groups of similar data points using this algorithm.
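The preview stops before the steps, so what follows is only a rough sketch of the usual assign-then-update loop (Lloyd's algorithm), not necessarily what the thread showed. The function name, the fixed starting centroids, and the toy points are assumptions chosen to keep the run deterministic.

```python
def kmeans(points, centroids, max_iters=100):
    """Lloyd's algorithm on 2-D points, starting from the given centroids."""
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)

        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((sum(p[0] for p in cluster) / len(cluster),
                                      sum(p[1] for p in cluster) / len(cluster)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid

        if new_centroids == centroids:  # stopping criterion: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


# Two visibly separate groups of hypothetical points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```

Here the loop stops when the centroids no longer move; a cap on iterations is the other stopping criterion this sketch uses.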
Sep 2, 2022 17 tweets 2 min read
Another common interview question

• What are the assumptions of Linear Regression?
• How do we check them?
• How can we fix them?

Here's the answer ↓

0/5 1. Linear Relationship

It is assumed that the relationship between the dependent and independent variables is linear.
Aug 31, 2022 15 tweets 3 min read
Random Forests is a favorite for interviews!

By far the most common questions I have been asked are, one way or another, related to Random Forests.

It's important to know it inside out.

Here are some of those questions:

0/8
Q: What ensemble principle is used in Random Forests?

A: Random Forest works on the principle of the bagging ensemble technique.

Bagging stands for Bootstrap Aggregation.
In bagging, random samples are drawn from the training set with replacement.

1/9
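To make "with replacement" concrete, here is a small sketch of drawing one bootstrap sample. The helper name and the ten-row "training set" are hypothetical, and the seed is fixed only so the sketch is repeatable.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement."""
    return [rng.choice(rows) for _ in rows]


rng = random.Random(0)            # seeded only for repeatability
training_rows = list(range(10))   # stand-in for row ids of a training set

sample = bootstrap_sample(training_rows, rng)
# Rows never drawn for this sample are its "out-of-bag" rows.
out_of_bag = [row for row in training_rows if row not in sample]
```

Because rows can repeat, the sample is as large as the training set but usually leaves some rows out; each tree in the forest gets its own sample like this.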
May 25, 2022 14 tweets 4 min read
Machine Learning | Mathematical Resources

Here are some of the resources I went through to understand the mathematics relevant to machine learning ↓

1. Linear Algebra

• 3Blue1Brown Essence of Linear Algebra (Videos)
youtube.com/playlist?list=…
Apr 7, 2022 11 tweets 3 min read
What is Apache Spark? ⭐️

• You must have heard about Apache Spark. But what is it? What are its characteristics?

• Apache Spark is a widely used tool in the big data industry all over the world so let's discuss more about it.

• The scale at which data would need to be stored and processed in the future, for example internet data, could not be handled by traditional storage systems.

• Several tools, such as Google File System and MapReduce, were created to tackle such problems.

• But they were not very efficient.
Apr 6, 2022 10 tweets 3 min read
Plotting Subplots in Matplotlib 📈📉

• One of the handiest ways I find to plot multiple charts in matplotlib is the subplot() function.

• It works with matplotlib plots and seaborn plots alike.

Let's see how to use it ↓

• First, calculate the total number of plots you want.

• Then decide what grid you want; the number of rows and the number of columns should multiply to the total number of plots.

• For example, for 4 plots we can have 4x1, 2x2, or 1x4 grids.
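The grid choice can be sketched without any plotting at all. The helper below is hypothetical (it is not part of matplotlib); it just lists every rows x columns pair whose product equals the number of plots.

```python
def grid_options(total_plots):
    """Return every (rows, cols) pair whose product equals total_plots."""
    return [(rows, total_plots // rows)
            for rows in range(1, total_plots + 1)
            if total_plots % rows == 0]


options = grid_options(4)  # [(1, 4), (2, 2), (4, 1)]
```

Any of these pairs can then be used as the first two arguments of `plt.subplot(rows, cols, index)`, where `index` counts the cells from 1.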
Apr 5, 2022 12 tweets 4 min read
Learning Machine Learning

• If you're not aware, you don't need to ask for resources and roadmaps anymore.

• TensorFlow has officially curated a list of resources and courses for you to master machine learning.

Here's what they have ↓

They have divided the learning into 4 key areas

1. Coding Skills
2. Math and Stats
3. ML Theory
4. Building projects
Apr 4, 2022 10 tweets 3 min read
K-Fold Cross-Validation ✅

If you prefer threads over short videos, this is for you ↓

• Cross-validation is very common in Machine Learning to measure model performance.
• K-Fold is one of the commonly used methods for cross-validation.

Here's how it works ↓

• It's a good practice to keep a test set separate first so that we have a set that is unseen at all times for final evaluation.

• Then we would proceed with the training set only.
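The preview ends before the splitting itself, so here is a minimal sketch of how the training set is usually cut into K folds, with each fold taking one turn as the validation set. The function name and the sizes are illustrative; libraries such as scikit-learn ship their own splitters, and this is not their implementation.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# 10 training rows (the held-out test set is not included), 5 folds.
folds = list(k_fold_indices(n_samples=10, k=5))
```

A model is trained on `train` and scored on `validation` once per fold, and the K scores are averaged to estimate performance.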
Feb 11, 2022 9 tweets 2 min read
⠿ Namespaces in Python ⠿

• What are Namespaces?
• Why are they needed?
• How does Python implement them?

• A namespace is a system that ensures every name in a program is unique and can be used without conflict.

• It is a collection of all the names and the details of the objects they are associated with.

• Namespaces are a way to implement variable scoping.
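A short sketch of how the same name can live in several namespaces at once. The variable and function names are made up; the point is only that each scope keeps its own mapping from names to objects.

```python
import builtins

x = "global x"              # lives in the module (global) namespace


def outer():
    x = "enclosing x"       # lives in outer()'s local namespace

    def inner():
        x = "local x"       # lives in inner()'s local namespace
        # locals() exposes the current namespace as a dictionary.
        return x, sorted(locals())

    return inner(), x


(inner_x, inner_names), outer_x = outer()

# Names such as len come from the built-in namespace.
builtin_len = builtins.len
```

The three assignments to `x` never overwrite each other, because each one is stored in a different namespace.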
Jan 28, 2022 5 tweets 2 min read
⠿ Python Tip ⠿

Accessing dictionary values
What's the difference? What to use?

• There are two ways of accessing values using the keys.

1. You can use the get method that takes in the key as the argument and returns the value.

2. Or you can just pass the key in the square brackets and you should get the value of the corresponding key.

Which is faster?
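Speed aside, the two approaches behave differently when the key is missing. The dictionary below is a made-up example.

```python
prices = {"apple": 3}

# Both approaches return the value when the key exists.
via_brackets = prices["apple"]
via_get = prices.get("apple")

# With a missing key, get() returns None (or a default you supply) ...
missing = prices.get("pear")
missing_with_default = prices.get("pear", 0)

# ... while square brackets raise KeyError.
try:
    prices["pear"]
    raised_key_error = False
except KeyError:
    raised_key_error = True
```

To compare speed on your own machine, both forms can be timed with the standard `timeit` module.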
Jan 22, 2022 13 tweets 5 min read
Getting Started with Deep Learning 🧠

• If you're already comfortable in Machine Learning and want to start with deep learning here's a rough outline.

• The resources mentioned need no introduction, and I refer to them myself.

• Start with Andrew Ng's neural networks lectures. These are available in Andrew Ng's Machine learning playlist.

• The course is available on Coursera and also can be found on YouTube.
coursera.org/learn/machine-…
Jan 17, 2022 18 tweets 5 min read
How to Prepare for the TensorFlow Exam 🎓

A lot of people asked how to learn TensorFlow and prepare for the certification exam.

Here is how you can do it ↓

• TensorFlow is a free and open-source framework by Google that allows creating and deploying machine learning/deep learning models.

• The TensorFlow certification is an official exam that you can take and keep as a trophy, although it's not necessary for learning.
Jan 4, 2022 10 tweets 2 min read
What format should you save your model in?

SavedModel or H5?

What are those? What's the difference?

Let's see ↓

• If you have been working on machine learning or deep learning with TensorFlow, you must have saved your models.

• Often we see them saved as ↓
Jan 3, 2022 5 tweets 2 min read
Pooling from Scratch

• In the last thread we saw a simple implementation of the convolution operation on an image.

• Let's check out a tiny implementation of pooling too ↓

• First, we would read an image and plot it. Notice the original size of the image.

• We will do a (2x2) pool with a stride of 2.

• So if we do the calculation, the image height and width should each get halved.
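The thread's own code is not in this preview, so this is a stand-in sketch on a tiny 4x4 grid instead of a real image. It assumes max pooling (the preview does not say whether max or average was used) and an even height and width.

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a 2-D list (even height and width assumed)."""
    pooled = []
    for r in range(0, len(image) - 1, 2):
        row = []
        for c in range(0, len(image[0]) - 1, 2):
            window = (image[r][c], image[r][c + 1],
                      image[r + 1][c], image[r + 1][c + 1])
            row.append(max(window))  # keep the largest value in each window
        pooled.append(row)
    return pooled


# Hypothetical 4x4 "image" of pixel intensities.
image = [
    [1, 3, 2, 0],
    [4, 6, 5, 1],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
pooled = max_pool_2x2(image)  # [[6, 5], [7, 9]]
```

The 4x4 input becomes 2x2, which matches the halving of each dimension worked out above.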
Jan 3, 2022 6 tweets 2 min read
Convolution from Scratch!

Have you ever implemented a convolution operation on an image from scratch?

Let's do it on a sample image ↓

• First we should open an image and check it out in its original format.

• We can get its shape, which will help us slide the filter over it.

• Now we have to define the filter to be passed.
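The images with the original code are missing here, so below is a rough sketch of the sliding-filter idea on a small grid of numbers. The image values and the 2x2 filter are invented, there is no padding, and the stride is 1. Like most deep learning libraries, it does not flip the filter, so strictly speaking it is a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide kernel over image with stride 1 and no padding ("valid" mode)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    # The image shape tells us how many positions the filter can take.
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 3x4 image and 2x2 filter.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = convolve2d(image, kernel)  # [[-4, -4, 2], [-4, -4, 4]]
```

A 3x4 image with a 2x2 filter gives a 2x3 output, since the filter fits in (3 - 2 + 1) x (4 - 2 + 1) positions.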
Dec 8, 2021 13 tweets 4 min read
Randomness and Seeds

• Reproducibility of results is something that we often desire in machine learning.

• Due to the random initialization of weights, getting the same results is quite difficult.

How do we tackle these? ↓

• Oftentimes in machine learning we have operations with randomness as a component.

• On each run of the program or application the results are different, which in turn becomes a problem if we want to get the same results later on.
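A minimal sketch of the usual remedy, using only Python's built-in `random` module: fixing the seed makes the sequence of "random" numbers repeat. The seed value 42 is arbitrary.

```python
import random

random.seed(42)                                   # fix the starting state
first_run = [random.random() for _ in range(3)]

random.seed(42)                                   # reset to the same state
second_run = [random.random() for _ in range(3)]

random.seed(7)                                    # a different seed
other_run = [random.random() for _ in range(3)]
```

`first_run` and `second_run` are identical. Other libraries keep their own random state, so NumPy and TensorFlow need their seeds set separately (for example with `np.random.seed` and `tf.random.set_seed`).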
Oct 20, 2021 14 tweets 4 min read
MAPE is another metric used in performance evaluation in machine learning.

The formula looks a tad bit complex but it isn't.

Let's try to break it down. ↓

• To start with, Mean Absolute Error is a metric that shows how far a predicted value is from the target value.
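The formula itself is cut off in this preview, so the sketch below assumes the standard definitions: MAE averages the absolute errors, and MAPE averages the absolute errors as a percentage of the actual values. The function names and numbers are illustrative.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent; undefined when any actual value is zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical targets and predictions.
actual = [100, 200, 400]
predicted = [110, 180, 400]

mae = mean_absolute_error(actual, predicted)              # (10 + 20 + 0) / 3
mape = mean_absolute_percentage_error(actual, predicted)  # (10% + 10% + 0%) / 3
```

MAE is in the units of the target, while MAPE is unit-free, which makes it easier to compare across datasets.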
Oct 7, 2021 5 tweets 2 min read
Ensemble in Machine Learning

Ensemble methods involve fitting multiple models with varying features and aggregating their results for the final prediction.

Three basic ensemble techniques are:
- Bagging
- Boosting
- Stacking

Overview ↓

BAGGING

This is a parallel process:

• Choose a sample from the training data
• Train a model
• Save the model, then repeat n times
• Take the output of each of the models and aggregate (say using average).
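The bagging steps above can be sketched as follows. Everything here is a toy: the "model" just predicts the mean target of its sample, the data are invented, and the samples are assumed to be bootstrap samples drawn with replacement.

```python
import random


def fit_mean_model(sample):
    """Toy 'model': always predicts the mean target of its training sample."""
    mean = sum(y for _, y in sample) / len(sample)
    return lambda x: mean


def bagging_fit(data, n_models, rng):
    """Train n_models models, each on its own bootstrap sample of data."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # choose a sample
        models.append(fit_mean_model(sample))      # train and save the model
    return models


def bagging_predict(models, x):
    """Aggregate the individual predictions by averaging."""
    return sum(model(x) for model in models) / len(models)


# Hypothetical (feature, target) pairs.
data = [(x, 2 * x) for x in range(1, 6)]
models = bagging_fit(data, n_models=25, rng=random.Random(0))
prediction = bagging_predict(models, x=3)
```

Each model is trained independently of the others, which is why the process can run in parallel.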