Tom Mitchell
Aug 5, 2023, 7 tweets, 3 min read
GitHub offers the best free Data Science education on the internet.

But there are more than 372 million repositories to choose from.

How do you find the best ones?

Bookmark these 5 repositories and start learning fast:

1. Free Programming Books

Books are still an important source of knowledge for any field — and Data Science is no exception.

This GitHub repository contains a huge list of freely available books to learn anything related to programming.

🔗 https://t.co/VnC2HXnKAq (github.com/EbookFoundatio…)

2. Data Science Roadmap

This repo covers everything from fundamentals to statistics and programming, and then on to machine learning, data visualization and beyond!

🔗 https://t.co/G0DTtEMCOz (github.com/Moataz-Elmesma…)

3. Awesome Repo

The Awesome GitHub repository provides an organized list of machine learning libraries, frameworks and tools in many different languages.

🔗 https://t.co/nARAZNZgJA (github.com/sindresorhus/a…)

4. Public APIs for Data

Finding datasets to practice on can be a challenge.

This repo contains a collective list of free APIs to use for data work.

🔗 https://t.co/QSnYHkkAML (github.com/public-apis/pu…)

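Once you've picked an API, the usual first step is turning its JSON response into rows you can analyse. Here's a minimal sketch of that step, using only Python's standard library. The sample payload, its field names and the function name `json_records_to_csv` are made up for illustration and don't come from any particular API in the list. It assumes the response is a JSON array of flat objects that all share the same keys.

```python
import csv
import io
import json

# Hypothetical sample payload, standing in for what a free JSON API might return.
SAMPLE_RESPONSE = """
[
  {"city": "Lisbon", "temp_c": 21.5, "humidity": 60},
  {"city": "Oslo", "temp_c": 9.0, "humidity": 72}
]
"""


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text.

    Assumes every record has the same keys as the first one.
    """
    records = json.loads(json_text)
    if not records:
        return ""
    # Use the keys of the first record as the column order.
    fieldnames = list(records[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(json_records_to_csv(SAMPLE_RESPONSE))
```

With a real API you would swap the sample string for the response body you fetched, then load the CSV into whichever tool you're practising with.
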
5. Project-Based Learning

A list of project-based programming tutorials, grouped by primary programming language, such as R and Python.

🔗 https://t.co/epotI5RLRx (github.com/practical-tuto…)

And there you have it!

5 elite GitHub repos to get you started on your Data Science journey.

If you found this thread helpful, consider following me: @tommitchelldata

I post data-related content every day.
