Latest Twitter Threads by @Al_Grigor on Thread Reader App

Oct 2, 2022 • 5 tweets • 2 min read

AUC stands for "Area Under the Curve"

Usually, when we say "AUC" we mean "AUC ROC" - the area under the ROC curve

It's a way of evaluating the quality of a binary classification model based on a ROC curve

Let's see what it is 🧵

The ROC curve of the ideal model passes through the point FPR=0%, TPR=100%

The ROC curve of a random model is a straight line between (0, 0) and (1, 1)

Usually, the ROC curve of our model is somewhere between
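To make the two reference cases concrete, here is a minimal sketch that computes the area under a curve with the trapezoid rule. The function name `auc_trapezoid` and the hand-written curve points are assumptions for illustration; in practice you'd likely use a library implementation.

```python
def auc_trapezoid(fpr, tpr):
    """Area under a curve given as points sorted by increasing FPR."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        avg_height = (tpr[i] + tpr[i - 1]) / 2
        area += width * avg_height
    return area


# Ideal model: the curve goes (0, 0) -> (0, 1) -> (1, 1)
ideal_auc = auc_trapezoid([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])

# Random model: a straight line from (0, 0) to (1, 1)
random_auc = auc_trapezoid([0.0, 1.0], [0.0, 1.0])

print(ideal_auc, random_auc)  # 1.0 0.5
```

So a real model's AUC usually lands between 0.5 and 1.0.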

Oct 2, 2022 • 12 tweets • 4 min read

ROC curves were used during WW2 to assess how well radars detect planes

Target:

🔸y = 1: there's a plane
🔸y = 0: there's no plane

ROC tells us how well a model can separate these two cases

It's based on two quantities: FPR and TPR

Let's use them to build a ROC curve 🧵

We'll use a more modern example: Churn prediction

🔸y = 1: customer stops being our client
🔸y = 0: customer continues being our client
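Here's a rough sketch of how FPR and TPR turn into ROC points: pick a threshold, mark every customer with a score at or above it as "churn", and count. The labels, scores, thresholds and the name `roc_points` are made up for illustration.

```python
def roc_points(y_true, scores, thresholds):
    """Return one (FPR, TPR) pair per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for t in thresholds:
        # predicted churn and actually churned
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        # predicted churn but actually stayed
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


y_true = [1, 0, 1, 0, 0, 1]              # 1 = churned (hypothetical data)
scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.6]  # model outputs (hypothetical)
thresholds = [1.1, 0.75, 0.5, 0.0]       # from strictest to loosest

points = roc_points(y_true, scores, thresholds)
for t, (fpr, tpr) in zip(thresholds, points):
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The strictest threshold gives (0, 0), the loosest gives (1, 1), and the points in between trace the curve.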

Oct 2, 2022 • 7 tweets • 2 min read

Accuracy can be misleading

What to use instead?

👉 Precision
Among examples predicted as positive, how many are correct?

👉 Recall
Among all actual positive examples, how many are identified correctly?

Confused? Let me explain it with an example 🧵

Let's use the churn prediction example

🔸 We want to identify customers who will churn
🔸 We train a model for that
🔸 If the model thinks a customer will churn, we offer them a discount to keep them
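A small sketch of both metrics on the churn setup, next to accuracy. The ten customers below are invented for illustration, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(y_true, y_pred):
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    # Guard against division by zero when nothing is predicted or nothing is positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # 3 of 10 customers churned
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # model flags 2 customers

accuracy = sum(1 for y, p in zip(y_true, y_pred) if y == p) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# accuracy=0.70 precision=0.50 recall=0.33
```

Accuracy looks decent at 70%, yet the model finds only one of the three churning customers.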

Sep 30, 2022 • 7 tweets • 2 min read

Confusion table is confusing

In churn prediction:

🔸 True positive: correctly predicted churn
🔸 False positive: predict churn but they didn't churn
🔸 False negative: predict no churn but they churned
🔸 True negative: correctly predicted no churn

Let's see how to use it

🧵

To understand it better, it helps to think about how the model is applied

In the case of churn, we'll offer discounts to the people the model marked as churning, hoping it'll help retain them
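The four cells can be counted in a few lines. This is only a sketch: the eight customers and the name `confusion_table` are assumptions for illustration.

```python
from collections import Counter


def confusion_table(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = churn)."""
    names = {(1, 1): "TP", (0, 1): "FP", (1, 0): "FN", (0, 0): "TN"}
    counts = Counter(names[(y, p)] for y, p in zip(y_true, y_pred))
    return {name: counts[name] for name in ("TP", "FP", "FN", "TN")}


churned = [1, 0, 1, 0, 0, 1, 0, 0]          # what actually happened (hypothetical)
predicted_churn = [1, 1, 0, 0, 0, 1, 0, 0]  # what the model said (hypothetical)

table = confusion_table(churned, predicted_churn)
print(table)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 4}

# Everyone predicted as churning gets a discount: TP + FP
discounts_sent = table["TP"] + table["FP"]
print(discounts_sent)  # 3
```

Read this way, FP is a discount given to someone who would have stayed anyway, and FN is a churning customer who never got an offer.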

Sep 10, 2022 • 4 tweets • 2 min read

200+ Data Science interview questions

🔸 Supervised machine learning (linear models, trees, neural nets)
🔸 Feature selection, parameter tuning
🔸 Unsupervised learning (clustering, dim reduction)
🔸 Recommenders and search
🔸 SQL
🔸 Coding (Python), algorithms

With answers 👇

First, 160+ theoretical interview questions:

github.com/alexeygrigorev…

(Note that 24 questions still have no answers - contributions are welcome)

Sep 9, 2022 • 8 tweets • 2 min read

Week 1 of Machine Learning Zoomcamp:

🔸 What's ML
🔸 Supervised Machine Learning
🔸 Process for ML projects
🔸 Linear algebra refresher
🔸 Numpy and Pandas

Here's a thread with this week's tweet summaries

What is machine learning?

https://twitter.com/Al_Grigor/status/1567143375765180416

Sep 8, 2022 • 9 tweets • 3 min read

Linear algebra's most important operations:

1️⃣ Vector-vector multiplication
2️⃣ Matrix-vector multiplication
3️⃣ Matrix-matrix multiplication

The best way to understand them is to express 2️⃣ with 1️⃣ and 3️⃣ with 2️⃣

Let me show you how 🧵

First, let's start with vector-vector multiplication (aka dot-product)

We have two vectors u and v

Multiply the corresponding elements of the two vectors and then sum up the results
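The idea of expressing 2️⃣ with 1️⃣ and 3️⃣ with 2️⃣ can be sketched in plain Python. The function names and the use of nested lists (instead of NumPy arrays) are assumptions made to keep the example dependency-free.

```python
def dot(u, v):
    """Vector-vector multiplication: multiply element-wise, then sum."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def matrix_vector(U, v):
    """Matrix-vector multiplication: one dot product per row of U."""
    return [dot(row, v) for row in U]


def matrix_matrix(U, V):
    """Matrix-matrix multiplication: one matrix-vector product per column of V."""
    columns = list(zip(*V))
    result_columns = [matrix_vector(U, col) for col in columns]
    # The products are the columns of the result, so transpose back to rows
    return [list(row) for row in zip(*result_columns)]


print(dot([2, 4, 5], [1, 0, 2]))                          # 12
print(matrix_vector([[1, 2], [3, 4]], [1, 1]))            # [3, 7]
print(matrix_matrix([[1, 2], [3, 4]], [[0, 1], [1, 0]]))  # [[2, 1], [4, 3]]
```

With NumPy, all three would typically be `u.dot(v)`, `U.dot(v)` and `U.dot(V)`.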

Aug 24, 2022 • 9 tweets • 2 min read

My onsite interview for ML engineering with a FAANG company:

🔸 Behavioral
🔸 Coding round 1 (two problems)
🔸 Coding round 2 (two problems)
🔸 System design
🔸 ML case study

Here are the questions I got 👇

Behavioral:

Feb 25, 2021 • 14 tweets • 3 min read

Career Transitioning into Data Science

Talk by @pandeyparul packed with actionable advice

1️⃣ Love data
2️⃣ Create your own learning plan
3️⃣ Learn by doing
4️⃣ Contribute to open-source
5️⃣ Communicate insights
6️⃣ Network

Detailed summary 🧵👇

1️⃣ Learn to love data

🔸 Ask yourself: "why data science?"
🔸 Numbers should excite you
🔸 If you don't like seeing a lot of numbers, ask yourself if data science is right for you

Feb 13, 2021 • 7 tweets • 1 min read

For any project, follow these steps

1️⃣ Make it work
2️⃣ Make it right
3️⃣ Make it fast

In this exact order

It's important. Let me explain why

Thread 👇

1️⃣ Make it work

When starting a project

🔸 Experiment
🔸 Figure out how it should work
🔸 Cut corners
🔸 Make ugly hacks

Do anything it takes to solve the problem — and have a working system

Feb 12, 2021 • 5 tweets • 1 min read

Most useful regular expressions for text pre-processing:

🔸 Removing non-letters - \W+
🔸 Replacing numbers with a special token - \d+
🔸 Removing extra whitespaces - \s+

I use these three expressions in every project with text

Code 👇

🔸 Removing non-letters 🔸

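import re
non_letter = re.compile(r'\W+')  # runs of non-word characters (not letters, digits, or _)
text = non_letter.sub(' ', text)  # replace each run with a single space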
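All three expressions can be chained into one helper. This is a sketch, not the code from the thread: the function name `preprocess`, the `NUM` token, the final `strip()` and the sample string are assumptions.

```python
import re

non_word = re.compile(r'\W+')    # anything that is not a letter, digit, or underscore
number = re.compile(r'\d+')      # runs of digits
whitespace = re.compile(r'\s+')  # runs of spaces, tabs, newlines


def preprocess(text, number_token='NUM'):
    text = non_word.sub(' ', text)         # drop punctuation and symbols
    text = number.sub(number_token, text)  # replace numbers with a special token
    text = whitespace.sub(' ', text)       # collapse extra whitespace
    return text.strip()


cleaned = preprocess("Order #42:   shipped, on 2021-02-12!")
print(cleaned)  # Order NUM shipped on NUM NUM NUM
```

Note that `\W` keeps digits and underscores, which is why the numbers survive the first step and can be replaced in the second.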

Feb 9, 2021 • 6 tweets • 1 min read

They say:

"Kaggle doesn't teach you how to translate a business problem into machine learning terms"

This is NOT true

You CAN learn a great deal from @kaggle

Let me tell you how you can do it in 4 simple steps. None of them requires taking part in a competition

Thread 👇

1️⃣ Explore

🔸 Look at the past competitions
🔸 Find 20 competitions that are interesting
🔸 Put them in a spreadsheet

Feb 8, 2021 • 9 tweets • 2 min read

Interview process for ML Engineers and Data Scientists:

1️⃣ Screening
2️⃣ Machine Learning
3️⃣ Coding
4️⃣ Case studies
5️⃣ System design
6️⃣ Behavioral

Here's what you can expect at each step (Thread) 👇

2️⃣ Machine Learning

Usually theoretical questions:

🔸 Linear models
🔸 L1 vs L2 regularization
🔸 XGB vs Random Forest
🔸 Why neural nets need activation functions

Feb 8, 2021 • 8 tweets • 2 min read

🤖 Learning machine learning?

Focus on mastering these algorithms:

🔸 Linear regression
🔸 Logistic regression
🔸 Decision trees
🔸 Random forest
🔸 Gradient boosting
🔸 Neural networks + CNN

Don't know how?

Here's a detailed mega-thread 👇

(check the replies as well!)

Linear regression 👇

https://twitter.com/Al_Grigor/status/1356611658932760576

Feb 5, 2021 • 4 tweets • 1 min read

The toughest data science interview I ever had

I got bombarded for 45 minutes with theoretical questions:

🔸 Entropy
🔸 KL divergence, other divergences
🔸 Kolmogorov complexity
🔸 Jacobian and Hessian
🔸 Linear independence
🔸 Determinant

Continued 👇

🔸 Eigenvalues and Eigenvectors
🔸 SVD
🔸 The norm of a vector
🔸 Independent random variables
🔸 Expectation and variance
🔸 Central limit theorem

👇

Jan 29, 2021 • 4 tweets • 1 min read

Learning path to mastering Data Science:

🔸 Python
🔸 Git
🔸 SQL
🔸 NumPy
🔸 Pandas
🔸 Scikit-Learn
🔸 Flask
🔸 Docker
🔸 AWS
🔸 TensorFlow
🔸 Linear Algebra
🔸 Machine Learning basics

What else?

Things from Linear Algebra to focus on:

https://twitter.com/Al_Grigor/status/1352595002992504838

Jan 28, 2021 • 7 tweets • 2 min read

MLOps is just glorified DevOps:

1️⃣ They have the same culture
2️⃣ Tools are the same
3️⃣ Experiments existed long before MLOps
4️⃣ ML problems are mostly engineering problems

Thread 👇

1️⃣ MLOps and DevOps have the same culture

Both advocate for

🔸 End-to-end shared responsibility of the team
🔸 Automating everything
🔸 Autonomous teams
🔸 Continuous learning from failures

Jan 22, 2021 • 5 tweets • 2 min read

How to learn Linear Algebra and stay sane?

Thread 👇

Start with Gilbert Strang's course. This is the best course about Linear Algebra

I wish my university teachers were like that

ocw.mit.edu/courses/mathem…

Jan 17, 2021 • 4 tweets • 1 min read

OSI model for ML:

5️⃣ ML libraries (Scikit-Learn, XGBoost, TF)
4️⃣ Core libraries (NumPy)
3️⃣ Algorithms (linear models, trees)
2️⃣ Native code (Fortran, C, C++)
1️⃣ Math (linear algebra, probability, calculus)

When developing web apps, we start with the application layer of OSI without worrying about the underlying layers

But why do we start learning machine learning with mathematics?

Mar 14, 2020 • 52 tweets • 9 min read

Preparing for a #MachineLearning or #DataScience interview?

One retweet — one technical question.

Categories: SQL, coding (Python) and algorithms

Let’s start!

#100DaysOfMLCode #100DaysOfPythonCode

= SQL =

Suppose we have the following schema:
* Ads(ad_id, campaign_id, status)
* Events(event_id, ad_id, source, event_type, date, hour)

status: active, inactive
event_type: impression (ad is shown), click (ad is clicked), conversion (app is installed)

Feb 21, 2020 • 160 tweets • 8 min read

Preparing for a #MachineLearning #DataScience interview?

One retweet - one theoretical interview question in the thread 👇

Feel free to give your answers

Let's start!

#100DaysOfCode #100DaysOfMLCode

Interview questions are typically based on what the company needs and/or projects you have worked with previously.

So if you didn’t work with time series - it’s unlikely you’ll get many questions about it. Same with computer vision, NLP or recommender systems.
