Luiz GUStavo 💉💉🎉

1 Dec, 11 tweets, 3 min read

What is JAX?

JAX is Autograd and XLA, brought together for high-performance numerical computing and ML research. It provides composable transformations of Python+NumPy programs: differentiate, vectorize, parallelize, JIT compile to GPU/TPU, and more.

🤔🧐

#30DaysOfJAX

1/11🧵

That's already a lot to take in!
Let's try to understand the key words first

What is:
• Autograd
• XLA
• Differentiate
• Just-in-time compile

2/11🧵

What is Differentiate?

🚨 If you studied Calculus you might remember this one. (bear with me, don't run!)

Imagine you have a function:

-> f(x) = 3*x + 4

and you want to know how sensitive your output (f) is to changes in the input (x).

3/11🧵

That's what we call the Derivative of f with respect to x

For our example, you can see that a change of 1 in x produces a change of 3 in f

So the Derivative of f with respect to x is 3
Differentiation is the process of finding the derivative

4/11🧵
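A quick numerical check of the derivative above, as a minimal sketch. The helper name `slope` and the step size `h` are illustrative choices, not something from the thread. It nudges x by h and measures how much f changes.

```python
def f(x):
    return 3 * x + 4

def slope(func, x, h=1.0):
    """Change in func per unit change in x, measured over a step of size h."""
    return (func(x + h) - func(x)) / h

# A change of 1 in x changes f by 3, no matter where we start.
slope_at_2 = slope(f, 2.0)
slope_at_10 = slope(f, 10.0)
print(slope_at_2, slope_at_10)  # 3.0 3.0
```

For a straight line the slope is the same for any step size. For curved functions this kind of estimate is only approximate, which is one reason automatic differentiation is useful.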

When we have a function with many different variables:

-> f(x1, x2, ...) = a1*x1 + a2*x2 + … + b

We calculate the derivative of f with respect to x1, f with respect to x2…

Each one is called a partial derivative
The vector with all of them is called the gradient!

5/11🧵
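The partial derivatives described above can be sketched with `jax.grad`. The coefficient values and the input point are made up for illustration. For a function of this linear form, each partial derivative is just the matching coefficient.

```python
import jax
import jax.numpy as jnp

a = jnp.array([2.0, -1.0, 0.5])  # coefficients a1, a2, a3 (made-up values)
b = 4.0

def f(x):
    # f(x1, x2, x3) = a1*x1 + a2*x2 + a3*x3 + b
    return jnp.dot(a, x) + b

x = jnp.array([1.0, 1.0, 1.0])

# Vector of partial derivatives: df/dx1, df/dx2, df/dx3
gradient = jax.grad(f)(x)
print(gradient)
```

The result has one entry per input variable, and here it matches `a`.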

The Gradient is the vector with all the partial derivatives of a multivariate function

To understand the meaning, think like this:

-> If you are climbing a mountain, and you have a function (f) that tells you altitude
-> The gradient of f will point in the direction of steepest ascent, uphill toward the top of the ⛰️

6/11🧵
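To make the mountain picture concrete, here is a small sketch with a made-up `altitude` function (a smooth hill with its top at (0, 0)). The starting point and the step size of 0.1 are arbitrary assumptions. Taking a small step along the gradient should increase the altitude.

```python
import jax
import jax.numpy as jnp

def altitude(p):
    # A made-up hill: 10 units high at its top, located at (0, 0).
    return 10.0 - jnp.sum(p ** 2)

p = jnp.array([1.0, 2.0])        # where we are standing
uphill = jax.grad(altitude)(p)   # direction of steepest ascent at p
p_next = p + 0.1 * uphill        # take a small step in that direction

print(altitude(p), altitude(p_next))
```

On this symmetric hill the gradient happens to point straight at the top. On a real mountain it only gives the steepest way up from where you stand.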

What is Autograd?

It is a library that can automatically differentiate native Python and NumPy code.

Given your function f(a, b, c), you can find its gradient

The main intended application of Autograd is gradient-based optimization

github.com/hips/autograd

7/11🧵
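As a rough illustration of automatic differentiation of a plain Python function, the sketch below uses `jax.grad` rather than the Autograd library itself, since JAX is where this thread is heading. The function body and the input values are made up for the example.

```python
import jax

def f(a, b, c):
    return a * b + c ** 2

# argnums selects which arguments to differentiate with respect to.
grad_f = jax.grad(f, argnums=(0, 1, 2))

# Partial derivatives at a=2, b=3, c=4:
# df/da = b, df/db = a, df/dc = 2*c
df_da, df_db, df_dc = grad_f(2.0, 3.0, 4.0)
print(df_da, df_db, df_dc)
```

Note that the inputs are floats: `jax.grad` expects real-valued (floating point) inputs and raises an error for integers.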

What is XLA?

Accelerated Linear Algebra (XLA) is a domain-specific compiler for linear algebra that can accelerate math operations with potentially no source code changes.

tensorflow.org/xla

8/11🧵

What is JIT? ⚡️

Just-in-time compilation is a technique that interpreted languages use: while executing the code, the runtime also compiles it, so that the next time the code is executed, it runs as compiled code (faster) ⚡️

9/11🧵
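Here is a minimal sketch of what this looks like with `jax.jit`. JAX's flavor of JIT works by tracing the Python function once and compiling the result with XLA. The function `scaled_sum` and the `trace_log` list are illustrative names: the list only records how many times the Python body actually runs.

```python
import jax
import jax.numpy as jnp

trace_log = []  # grows each time Python actually runs the function body

def scaled_sum(x):
    trace_log.append("traced")  # Python side effect, happens only during tracing
    return jnp.sum(3.0 * x + 4.0)

fast_scaled_sum = jax.jit(scaled_sum)

x = jnp.arange(4.0)           # [0., 1., 2., 3.]
first = fast_scaled_sum(x)    # first call: trace + compile, then run
second = fast_scaled_sum(x)   # same shape and dtype: reuses the compiled code

print(first, second, len(trace_log))
```

The Python body runs once, and the second call goes straight to the compiled version. Calling with a different input shape or dtype would trigger a new trace and compile.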

With all the concepts (loosely explained for brevity), we can understand what JAX is:

JAX is NumPy on the CPU, GPU, and TPU, with great automatic differentiation for high-performance machine learning research.

10/11🧵
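Putting the pieces together, a small sketch of the "composable transformations" idea: differentiate, vectorize and JIT compile the same function from earlier in the thread. The variable names and the sample inputs are assumptions made for the example.

```python
import jax
import jax.numpy as jnp

def f(x):
    return 3.0 * x + 4.0

# grad: derivative of f at one point
# vmap: apply that derivative to a whole array of points
# jit:  compile the result with XLA
df = jax.jit(jax.vmap(jax.grad(f)))

xs = jnp.linspace(-1.0, 1.0, 5)  # NumPy-style API, backed by JAX arrays
slopes = df(xs)

print(slopes)
print(jax.devices())  # CPU, GPU or TPU, depending on where this runs
```

Every entry of `slopes` is 3, the derivative found earlier, and the same code runs unchanged on whichever device JAX finds.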

This month I'll keep posting about my journey learning JAX. 📚👓

Now I know what it is! Next: how to use it!

Join me and let's learn together!

Follow me so you don't miss the next updates!

#30DaysOfJAX

11/11🧵
