Did you know that you can apply styles to your Pandas visualization?

Let's take a brief look at it 👀

[1 min]
1/8🧵
Now that you have loaded the data, it's very important to understand it.

To help with that, it's good to be able to read it properly, and formatting the data definitely helps!

Let's come back to the New York Taxi fare data

2/8🧵
The fare amount is money.

To format a financial value in Python, we would use the string format "${:,.2f}"

Pandas has a style object and a very similar format method:

3/8🧵
But how about that date column? We don't need all those zeros.

Let's just show a simple ISO format. To do that we can use: strftime('%Y-%m-%d')

You can use a lambda function to apply that to the column (in a previous thread we already cast it to datetime)!

4/8🧵
How big are the values compared to the average of the column? To know that, we can add some bars per cell!

This can give good insight into the size of numerical values.

5/8🧵
How about adding some more color?

The higher the value, the darker the color we paint it with.
In our case that's green, but you can use any of the Matplotlib colormap options.

This can also be applied per row or per column

6/8🧵
You usually will not see all your rows unless you have a very basic dataset.

Ideally you'd use this styling in one of these 2 use cases:

• A sample or projection of the data
• The descriptive statistics view (describe method)

7/8🧵
To understand data you need all the help you can get, and giving some visual cues to your eyes is very welcome! 👁️🔍

Don't forget to share this thread with your friends and let's make some colorful reports!

And if you're not following me yet, you're missing out!😉

8/8🧵
