Everyone who does data analysis or machine learning knows the Pandas library 🐼

But not everyone knows how to use it efficiently!

Have you thought about how much memory your DataFrame is using? 🤔

How to use less? 🗜️

Let me show you…

[2 min]

1/8🧡
Let's start by loading a csv file.

The example I'll use is the train.csv file from the Kaggle 30 days of ML competition

kaggle.com/c/30-days-of-m…

It's a good example to start with

2/8
Loading a csv file in Pandas is very simple:

import pandas as pd
some_data = pd.read_csv("./train.csv")

But how much memory is it using?

3/8
To measure memory usage, Pandas has a memory_usage method.

It returns how many bytes each column (and the index) is using.
For string (object) columns, pass deep=True to count the actual string contents.
To get the total for the full DataFrame, call sum on the result.

4/8🧡
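Here's a minimal sketch of that measurement. The small DataFrame below is a made-up stand-in for train.csv, and its column names are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded from train.csv
some_data = pd.DataFrame({
    "id": np.arange(1000, dtype="int64"),
    "target": np.linspace(0.0, 1.0, 1000),  # float64
    "cat0": ["A", "B"] * 500,               # text column
})

# Bytes used by each column (the result also has an "Index" entry)
per_column = some_data.memory_usage(deep=True)

# Total bytes for the whole DataFrame
total_bytes = int(per_column.sum())

print(per_column)
print(f"Total: {total_bytes / 1024 ** 2:.3f} MB")
```

deep=True matters mostly for text columns, where the default only counts the references and not the strings themselves.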
When we load a csv, Pandas does good type inference but the result is not always optimal.

It's conservative and defaults to the widest numeric types (int64, float64), even when the values would fit in something smaller like int8.

An int64 takes 8 bytes per value, while an int8 takes only 1 byte.
For a column that fits in int8, this is an 8x difference!!!

5/8🧡
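To make the 8x concrete, here's a small sketch with a made-up column of one million values between 0 and 99 (not data from the competition):

```python
import numpy as np
import pandas as pd

n_rows = 1_000_000

# Values 0..99 stored with the default 64-bit integer type
as_int64 = pd.Series(np.arange(n_rows) % 100, dtype="int64")

# Same values in 8 bits; safe only because they all fit in [-128, 127]
as_int8 = as_int64.astype("int8")

# index=False counts only the values, not the index
bytes_int64 = as_int64.memory_usage(index=False)
bytes_int8 = as_int8.memory_usage(index=False)

print(bytes_int64, bytes_int8, bytes_int64 / bytes_int8)
```

The cast is only lossless because every value fits in the int8 range. Values outside it would wrap around silently, so check min and max first.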
How about we look into the data (min and max values of each column) and use a smaller data type when possible?
We can do the same thing for float numbers too!

With this simple idea, we already get a good result: a 45% reduction in memory!

6/8🧡
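One way this idea could look in code. This is a sketch, not necessarily the exact code behind the 45% result: the helper names and the demo data are made up, and it assumes numeric columns without missing values.

```python
import numpy as np
import pandas as pd


def smallest_int_dtype(col: pd.Series):
    """Pick the narrowest signed integer dtype that holds the column's min and max."""
    lo, hi = col.min(), col.max()
    for candidate in (np.int8, np.int16, np.int32, np.int64):
        limits = np.iinfo(candidate)
        if limits.min <= lo and hi <= limits.max:
            return candidate
    return col.dtype  # nothing narrower fits, keep the original dtype


def shrink(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric columns stored in smaller dtypes."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if pd.api.types.is_integer_dtype(col):
            out[name] = col.astype(smallest_int_dtype(col))
        elif pd.api.types.is_float_dtype(col):
            # float32 keeps about 7 significant digits; check that is enough
            out[name] = col.astype(np.float32)
    return out


# Hypothetical data standing in for train.csv
demo = pd.DataFrame({
    "small": np.arange(1000, dtype="int64") % 100,  # fits in int8
    "medium": np.arange(1000, dtype="int64") * 30,  # max 29970, fits in int16
    "score": np.linspace(0.0, 1.0, 1000),           # float64
})

smaller = shrink(demo)
before = int(demo.memory_usage(deep=True).sum())
after = int(smaller.memory_usage(deep=True).sum())

print(smaller.dtypes)
print(f"{before} -> {after} bytes")
```

Pandas also has pd.to_numeric with a downcast argument that does a similar job. Either way, going from float64 to float32 trades precision for memory, so it's worth checking that your model doesn't care.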
I learned this tip from this great blog post: towardsdatascience.com/6-pandas-mista…

It has some other cool tips to help use Pandas better!

7/8🧡
Pandas is a key tool for ML and Data Analysis, but it's also important to understand how to use it effectively!

Did you know about this?
Do you have any good Pandas tips to share?

Don't forget to share this and follow me @gusthema for daily ML, Python and Career content!

8/8🧡

More from @gusthema

5 Nov
Imagine you need to load a very large (e.g. 5.7 GB) csv file to train your model! 🤔

This is a very common problem in real world situations and also in many Kaggle competitions!

How can we use Pandas 🐼 effectively to do that?

Let's dive in…

[2 effective min]

1/10🧡
We will use the New York City Taxi Fare Prediction dataset from Kaggle

The csv file is 5.7 GB!!! 😱

Let's try the most obvious thing, just loading it:

df = pd.read_csv("./new-york-city-taxi-fare-prediction/train.csv")

This won't load on Kaggle Kernels!

2/10🧡
That's a bummer… 😭

How do I even get to see which columns are in the file?

We can start by loading only some rows (e.g. 5) and get some insights. 🔍

This can give some good information already

3/10🧡
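A small sketch of peeking at the first rows with the nrows parameter of read_csv. The in-memory CSV and its column names are made up so the example runs anywhere; with the real file you would pass its path instead:

```python
import io

import pandas as pd

# Hypothetical stand-in for the large train.csv; column names are made up
csv_text = "key,fare_amount,passenger_count\n" + "".join(
    f"row{i},{5.0 + i},{1 + i % 4}\n" for i in range(1000)
)

# Parse only the first 5 data rows instead of the whole file
preview = pd.read_csv(io.StringIO(csv_text), nrows=5)

print(preview.columns.tolist())
print(preview.dtypes)
```

Five rows are enough to see the column names and the types Pandas infers, without paying for the full load.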
Read 10 tweets
24 Oct
This week I posted about the Code interview!!! 😱😭🤓

Here's a summary if you missed it

[30 sec]

1/🧡
To start, you'll need some tips on how to succeed! 👏🏾👏🏾



2/🧡
To follow the tips and succeed you'll need good resources to study and practice! 📚👓

I've got you covered:

3/🧡
Read 6 tweets
22 Oct
Following up on the mock interview we did earlier this week, let me summarize all the topics discussed in the answers

Before starting, thanks to everyone for participating, it was great!

[1.5 min]

🧡
1⃣- Many good answers were working solutions but weren't the fastest ones.
That's ok, but of course an interviewer might follow up asking you: Can you make it faster?

Tip: would a better data structure help you?
2⃣- A working solution is better than no solution!

Sometimes we want to optimize the code but usually you only have 1 hour to finish! Be smart, have a solution and explain how you'd make it perfect

With practice, your good solutions will become the perfect solution by default!
Read 8 tweets
19 Oct
When I was studying for my technical interviews I used a couple of different resources

Here is a list of the 4 most important ones..

[And some bonus ones! 🎁🎁]

[1 minute of investment]

1/8🧡
1⃣ The Algorithm Design Manual by Steven S. Skiena

It's a great book for studying basic and advanced algorithms! The text is very clear and good for learning or reviewing.

2/8🧡
2⃣ Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein

It's also known as CLRS

This is one of the main books on Computer Algorithms.
It's very deep!



3/8🧡
Read 8 tweets
18 Oct
Are you afraid of the code interviews? 😱😱😱
You're not alone! 🫂

I've done many on both sides (candidate and interviewer)!

Here are some tips that helped me succeed:
👏🏾👏🏾👏🏾

[1.5 minutes]

1/10🧡
During the interview you are being evaluated on many aspects, not only your coding skills

So while solving the technical questions, talk to the interviewer and explain what you are trying to achieve.

Communication is an important skill!

2/10🧡
Some people get anxious during the interview and might get blocked or forget even the basics. This is a very common problem

To overcome this what I did was:
• Practice
• Practice
• Practice!

I solved many, many problems on the whiteboard before going for an interview

3/10🧡
Read 10 tweets
6 Oct
A common question I get from developers is:

Which programming language do I need to know to start with Machine Learning?
👅🤖🧠

[1 quick ⚡️ min]

1/5🧡
The easy answer is: If you know how to code well, that's all you need to start learning ML!

TensorFlow, for example, enables you to use ML in many languages like C++, Java, Kotlin, Swift, Objective-C, JavaScript, Go, Julia, Scala, Ruby, C# and many others

But… 👀

2/5🧡
The more realistic answer: Python 🐍

Most of the ML samples, tutorials and content in general that you'll see are written in Python

Understanding the basics of the language will definitely make your life MUCH easier

Here is a good place to start: docs.python.org/3/tutorial/

3/5🧡
Read 5 tweets
