Sunil Kumar Profile picture
11 Oct, 13 tweets, 2 min read
How would you go about partitioning and sharding when you have huge volume of data in your applications?

Let's discuss the approach you should take in this thread:

The approach we will discuss here is assuming we have a relational database.

SQL vs NoSQL is entirely a new discussion altogether. That's not in the scope of this thread.

Before discussing here are some terms you should be familiar with:

Partitioning: is a process of dividing a table by rows into multiple smaller children tables within an instance.

Sharding: is a process of dividing a table by rows across multiple instances.

Read more on these topics if these definitions do not make sense to you.

1️⃣ For starters, a db without any partitioning and sharding is sufficient when your data volume is less.

You can create indexes on your tables based on your read use cases.Your reads will become faster and writes slightly slower based on the data & number of indexes created.

2️⃣Let's say you have huge volume of data, next you shoud think about partitioning the data. Partitioning basically divides data in your tables into smaller children tables.

Once you partition, db engine takes care of managing the data in those children tables.

Note that nothing changes at the query level when partitioning. You still query the data from the main table.

3️⃣ Now let's say there are a huge number of connections to your db and so many applications are reading and writing data into your db.

Now having only one db server can hamper the overall performance. So it's smart to replicate the data to have multiple slave nodes to read.

In this case we will have one master which will take all the writes and multiple slave nodes which will take all the reads.

This will improve the overall performance since reads and writes are not affected by one another.

4️⃣ Let's say now the data volume has increased even more and one master is not able to handle the write requests.

In this case we need to think about sharding the data into multiple master db instances.

There are multiple techniques to shard the data. Range based sharding is one simple technique.

For example all the rows with users names in the range A - L can be put into one shard, M - Z can be put into another shard.

Consistent hashing is a popular technique in sharding.

But sharding comes with its own challenges like joins across multiple shards can be very costly.

You need to think about all these scenarios which depends on your use cases and how your data looks like.

Conclusion: In a nutshell this is how you need to think

1. Indexes
2. Partitioning
3. Replication
4. Sharding

Please correct or add any information if you think is missing.

I hope this thread has helped you learn something new today 😊

β€’ β€’ β€’

Missing some Tweet in this thread? You can try to force a refresh

Keep Current with Sunil Kumar

Sunil Kumar Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!


Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @sunilc_

12 Oct
15 Websites that offer free HTML & CSS website templates:

1. HTML5Up
2. TemplateMO
Read 16 tweets
12 Oct
Interview Tip:

Why are NoSQL databases more scalable than SQL databases?

SQL databases follow ACID properties. But NoSQL database do not. To follow ACID principles lot of bookkeeping happens behind the scenes.

NoSQL DBs relax principles like durability and consistency too.

- Dropping Atomicity lets you shorten the duration for which tables (sets of data) are locked. Example: MongoDB, CouchDB.

- Dropping Consistency lets you scale up writes across cluster nodes. Examples: riak, cassandra.

Dropping Durability lets you respond to write commands without flushing to disk. Examples: memcache, redis.

NoSQL DBs give up the A, C and/or D requirements, and in return they improve scalability.

Read 5 tweets
11 Oct
Free stock photos for your websites, blogs and applications:

1. Unsplash
2. Pexels Image
Read 16 tweets
10 Oct
Are you a front-end developer / designer ?

You will be interested in these 20 websites that will help you with color codes & palettes:

1. HTML Color Codes
2. Gradient Hunt
Read 21 tweets
9 Oct
We keep hearing about the CAP Theorem, know what it means, but don't really know what it actually means in a real world distributed system.

Let's discuss about CAP Theorem in detail in this thread:


CAP Theorem is one of the important system design concepts. It states that any distributed system can guarantee only 2 of the 3 following properties:

- Consistency
- Availability
- Partition Tolerance

Let's discuss different scenarios possible in a real world.

For the discussion we will take this distributed system as an example which has two components C1 & C2.

They are connected to each other and are always in sync with respect to the data they store.

Read 13 tweets
6 Oct
Everyone wants to make money through freelancing but struggle to get started. Choose the right platform based on your skills.

Here's a thread on the list of top 20 freelancing platforms for designers and developers:

1. Upwork
2. Freelancer
Read 21 tweets

Did Thread Reader help you today?

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Too expensive? Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal Become our Patreon

Thank you for your support!

Follow Us on Twitter!