Tom Mitchell
Sep 10 · 11 tweets · 2 min read
The difference between SQL, NoSQL, MySQL, and Postgres explained in simple terms:
SQL (Structured Query Language) is the standard query language used to manage and manipulate data in relational databases.
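Here is a minimal sketch of what "talking to a database" in SQL looks like. It uses Python's built-in `sqlite3` module as a stand-in for any relational database. The `customers` table and its columns are made up for illustration:

```python
import sqlite3

# In-memory SQLite database: a stand-in for any relational database.
conn = sqlite3.connect(":memory:")

# Define a table (hypothetical name and columns) and add some rows.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers (name, country) VALUES (?, ?)",
    [("Alice", "UK"), ("Bob", "US"), ("Chloe", "UK")],
)

# Describe the rows you want; the database works out how to fetch them.
rows = conn.execute(
    "SELECT name FROM customers WHERE country = ? ORDER BY name",
    ("UK",),
).fetchall()
print(rows)  # [('Alice',), ('Chloe',)]
```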
Before we get into the specifics, if you want to learn more about SQL, you should sign up for my (free) weekly newsletter The Data Dose.

I send 1 short, actionable email every week teaching 13,000+ professionals high-paying data skills.

Check it out here: thedatadose.com

Now, on with the breakdown...
NoSQL (Not Only SQL) is a broad category of databases that do not use the traditional table-based structure of a relational database.

Instead, NoSQL databases use more flexible data models that can handle unstructured data (like social media posts) and semi-structured data (like emails).

Document stores like MongoDB store data as flexible, JSON-like documents.
Key-value stores like Redis and AWS DynamoDB house simple key-value pairs.

It's an umbrella term rather than a single database.
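Here is a rough sketch of the two NoSQL models above, written in plain Python. It only mimics the shape of the data. It does not use the MongoDB or Redis client APIs, and the field names and the `session:42` key are hypothetical:

```python
import json

# Document model (as in a document store such as MongoDB): each record is a
# self-contained, nested, JSON-like document, and fields can vary per document.
posts = [
    {"_id": 1, "author": "tom", "text": "SQL vs NoSQL", "tags": ["sql", "nosql"]},
    {"_id": 2, "author": "amy", "text": "Hello!", "likes": 12},  # no "tags" field
]

# Key-value model (as in Redis): a value looked up by a single key.
# A plain dict stands in for the store here.
kv_store = {}
kv_store["session:42"] = json.dumps({"user": "tom", "cart": [101, 205]})

# A document store can filter on fields inside each document...
sql_posts = [p for p in posts if "sql" in p.get("tags", [])]

# ...whereas a key-value store is typically read by key.
session = json.loads(kv_store["session:42"])

print([p["_id"] for p in sql_posts])  # [1]
print(session["cart"])                # [101, 205]
```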
MySQL and PostgreSQL are examples of open-source relational database management systems.
They use structured tables to store data and provide tools for efficiently organising, retrieving and manipulating data.
PostgreSQL handles traditional relational data and also supports document-style data through its JSON and JSONB column types.
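Here is roughly what "structured tables" means in practice: two related tables, combined at query time with a JOIN. This sketch uses SQLite (via Python's `sqlite3`) as a stand-in for MySQL or PostgreSQL. The core SQL is the same across these systems, but each has its own dialect details. The schema is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),  -- link between tables
    amount      REAL NOT NULL
);
INSERT INTO customers (id, name) VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO orders (customer_id, amount) VALUES (1, 20.0), (1, 15.5), (2, 9.0);
""")

# Rows from both tables are combined at query time with a JOIN.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(totals)  # [('Alice', 35.5), ('Bob', 9.0)]
```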
In a nutshell:

- SQL is the language we use to talk to relational databases.

- NoSQL is an umbrella term for non-relational databases, well suited to unstructured and semi-structured data.

- MySQL and Postgres are relational database management systems.
BONUS TIP: MS SQL SERVER

Microsoft SQL Server is Microsoft's relational database management system, and it integrates tightly with other Microsoft technologies.
That's a wrap!

Did you find this interesting?

If so, drop it a like so I know to make more content like this.

Also, if you like this type of content, follow me @imtommitchell

I post this type of content daily.
P.S. If you're interested in data, you'll love my weekly newsletter.

I share everything I know about building high-paying data skills from my 8+ years in the industry.

Subscribe for free here:

thedatadose.com


More from @imtommitchell

Sep 3
I've worked with Power BI for 6+ years.

Here are the 3 biggest mistakes I see beginners make (and how to avoid them):
1 - Not optimising data models

What's an easy way to kill user experience and force people to stop using your work?

Have a bloated, sluggish data model.

Instead, only select the data you need.

You can always add to it if needed.

Limit the number of complex aggregations.

Need to include some complex operations?

Create tailored datasets beforehand.
2 - Cramming too much into visualisations

A simple trick here...

Got lots of categories?

Limit the number shown on each visualisation to the top 3-5.

Make it easy for users to digest.

Need to show the whole picture?

Group the lowest-impact values into "Other".
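The top-N plus "Other" trick is easy to sketch outside Power BI too. Here is a minimal Python version. It assumes your categories and totals are already in a dict, and both the `top_n_with_other` helper and the sample figures are hypothetical:

```python
from collections import Counter


def top_n_with_other(values: dict[str, float], n: int = 5) -> dict[str, float]:
    """Keep the n largest categories and fold everything else into "Other"."""
    ranked = Counter(values).most_common()  # (category, value) pairs, largest first
    top = dict(ranked[:n])
    rest = sum(value for _, value in ranked[n:])
    if rest:  # skip "Other" when nothing (or only zeros) was left over
        top["Other"] = rest
    return top


sales = {"UK": 120, "US": 300, "FR": 40, "DE": 80, "ES": 15, "IT": 10, "NL": 5}
print(top_n_with_other(sales, n=3))
# {'US': 300, 'UK': 120, 'DE': 80, 'Other': 70}
```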
Aug 26
5 steps to write SQL queries that actually perform well:
1. Start with your WHERE clause

Why?

Because filtering early reduces the dataset size before expensive operations like JOINs and GROUP BYs kick in.

Plan your filters first, then build around them.
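Here is a small sketch of "filter first", run through Python's `sqlite3` with hypothetical `orders` and `customers` tables. The WHERE lives in a CTE, so the JOIN and GROUP BY only see recent orders. Worth knowing: many query optimisers push filters down like this on their own, so writing the query this way is partly about making your intent clear:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT, amount REAL);
INSERT INTO customers VALUES (1, 'EU'), (2, 'US');
INSERT INTO orders (customer_id, order_date, amount) VALUES
    (1, '2023-12-30', 50.0), (1, '2024-01-05', 20.0),
    (2, '2024-02-10', 30.0), (2, '2023-06-01', 99.0);
""")

# Filter first (inside the CTE), then join and aggregate the smaller set.
query = """
WITH recent_orders AS (
    SELECT customer_id, amount
    FROM orders
    WHERE order_date >= '2024-01-01'
)
SELECT c.region, SUM(r.amount) AS revenue
FROM recent_orders AS r
JOIN customers AS c ON c.id = r.customer_id
GROUP BY c.region
ORDER BY c.region
"""
revenue = conn.execute(query).fetchall()
print(revenue)  # [('EU', 20.0), ('US', 30.0)]
```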
2. Use indexes

They help the database find specific rows much faster, without having to scan everything.

Pro tip: composite indexes can give you a big boost.

If you're regularly filtering on (date, status, region), create one index on all three columns in that order. Column order matters: an index mainly helps queries that filter on its leading column(s).
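One way to check that a composite index is actually being used is SQLite's `EXPLAIN QUERY PLAN`, shown here via Python's `sqlite3`. Other databases have their own `EXPLAIN` output, so treat this as a sketch. The table, column and index names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, order_date TEXT, status TEXT, region TEXT, amount REAL)"
)
# One composite index on the three columns you filter by, in that order.
conn.execute(
    "CREATE INDEX idx_orders_date_status_region ON orders (order_date, status, region)"
)

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT amount FROM orders "
    "WHERE order_date = ? AND status = ? AND region = ?",
    ("2024-01-05", "shipped", "EU"),
).fetchall()

# The last column of each plan row is a readable description; it should show
# a SEARCH using the composite index rather than a full-table SCAN.
detail = plan[0][-1]
print(detail)
```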
Aug 25
If you want to work in data, you should know CI/CD.

Here's a dead simple breakdown that'll teach you fast:
What is CI/CD?

Continuous Integration (CI) + Continuous Deployment or Delivery (CD).

In simple terms: automatically testing and deploying your data pipelines without breaking things.

Quality control for your code.
Why data teams need it:

Your SQL query works on your laptop.

But will it work in production?
Will it break when your teammate pushes code?
Will bad data crash your entire pipeline?

CI/CD prevents these disasters.
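To give a feel for the "testing" half of CI, here is a minimal sketch of the kind of check a CI job might run on every push. The `build_daily_revenue` step and its fixture data are hypothetical. In a real setup, a test runner such as pytest would discover the `test_*` functions automatically:

```python
import sqlite3


def build_daily_revenue(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Hypothetical pipeline step: total revenue per day from raw orders."""
    return conn.execute(
        "SELECT order_date, SUM(amount) FROM orders "
        "WHERE amount IS NOT NULL GROUP BY order_date ORDER BY order_date"
    ).fetchall()


def make_test_db() -> sqlite3.Connection:
    # Small, known fixture data, so the test knows the exact expected answer.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", None)],
    )
    return conn


def test_daily_revenue_sums_per_day():
    assert build_daily_revenue(make_test_db()) == [("2024-01-01", 15.0)]


def test_daily_revenue_has_no_nulls():
    rows = build_daily_revenue(make_test_db())
    assert all(total is not None for _, total in rows)


if __name__ == "__main__":
    # Run the checks directly; a CI pipeline would fail the build on any error.
    test_daily_revenue_sums_per_day()
    test_daily_revenue_has_no_nulls()
    print("all checks passed")
```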

The 4 core components:
Apr 30
A Data Analyst who understands segmentation will never be short of work.

Problem is, it's not covered often in courses.

Here's everything I know about customer segmentation condensed:
Customer segmentation helps businesses understand their customers better.

It allows them to tailor marketing strategies, product offerings, and customer experiences to meet the specific needs of each segment.

A segment is a group of customers that share characteristics.

You can segment based on factors like age, gender, location, income, and interests.

Here are some examples of different types of segmentation:
Demographic Segmentation divides customers based on characteristics like age, gender, education, and income.

Behavioural Segmentation categorises customers based on their interactions with your products or services.

Are they loyal, occasional buyers, or inactive customers?

Psychographic Segmentation looks at customer attitudes, values, and lifestyles.
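Behavioural segmentation can start as a few simple rules. Here is a minimal Python sketch. The thresholds are hypothetical and should be tuned to your business: no order in 180 days means inactive, and 6+ orders in the last year means loyal.

```python
from datetime import date


def behavioural_segment(order_dates: list[date], today: date) -> str:
    """Label a customer as loyal, occasional or inactive (hypothetical rules)."""
    if not order_dates:
        return "inactive"
    if (today - max(order_dates)).days > 180:  # no purchase in ~6 months
        return "inactive"
    orders_last_year = sum(1 for d in order_dates if (today - d).days <= 365)
    return "loyal" if orders_last_year >= 6 else "occasional"


today = date(2024, 6, 30)
customers = {
    "monthly_buyer": [date(2024, month, 1) for month in range(1, 7)],
    "one_off": [date(2024, 3, 15)],
    "lapsed": [date(2023, 5, 2)],
}
segments = {name: behavioural_segment(dates, today) for name, dates in customers.items()}
print(segments)
# {'monthly_buyer': 'loyal', 'one_off': 'occasional', 'lapsed': 'inactive'}
```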
Mar 31
Learn SQL in 10 steps (the simple way):
1. The basics are your best mates.

From SELECT to WHERE, get familiar with them.

They’ll make the advanced stuff much easier to grasp.

Don't go too deep into window functions etc. yet. Plenty of time for that later.
2. Coding is an art.

It takes time and consistency to create masterpieces.

Don't rush it.

Look at how other professionals structure their code.

Readability, maintainability and efficiency are the aim of the game.
Mar 26
To do data analysis in Python, you must master Pandas.

But this library contains a lot.

Here is what you need to focus on from day 1 👇
Pandas is an open-source Python library built on top of NumPy (Numerical Python), a core package of Python's scientific ecosystem.

Pandas offers Data Analysts an easy way to work with data and provides many tools for extracting maximum value.

Let's get into it...
There are two main concepts in Pandas:

the Series and the DataFrame.

A Series is a one-dimensional labelled array that can hold any type of data.

Think of it as a single column of a table.

Each value in a Series is associated with a label: the index value attached to its row.
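Here is a quick sketch of both concepts in Pandas. The sales figures and country labels are made up:

```python
import pandas as pd

# Series: one-dimensional values, each tied to an index label.
sales = pd.Series([120, 300, 40], index=["UK", "US", "FR"], name="sales")
print(sales["US"])  # 300

# DataFrame: a table whose columns are Series sharing the same index.
df = pd.DataFrame(
    {"sales": [120, 300, 40], "returns": [5, 12, 1]},
    index=["UK", "US", "FR"],
)
column = df["sales"]            # selecting one column gives you a Series
print(type(column).__name__)    # Series
print(df.loc["FR", "returns"])  # 1
```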