, 48 tweets, 25 min read Read on Twitter
So, my thread on how to choose databases for your web apps starts in 20 minutes time. Hope y'all are ready ?However, this time i would love it to be an interactive session where we can learn from each other about database preferences for specific web apps and the use case
#thread
This is what we want to talk about today. Take a really good look at the image below. Study it well. Also notice the words at the 3 vertexes of the triangle. CONSISTENCY, AVAILABILITY & PARTITION-TOLERANCE (Big English - but fear not, we shall demystify all and it'll all be easy)
If we also look at the image in the previous tweet on this thread we can see some popular databases placed at certain sides of the triangle. It is important to note also the positioning of some of a particular point on each side. We shall discuss this in more detail soon. #thread
Also notice that key-value pair NoSQL databases such as REDIS & MEMCACHED are missing from the triangle. This is deliberate so have no fear that i forgot about those. Thing is these kinds of NoSQL databases do not apply to CONSISTENCY, AVAILABILITY & PARTITION-TOLERANCE!
#thread
Also note that #Hadoop, #Spark e.t.c are not databases at all. Just in case you were thinking they were. They are just database helpers used to power parallel processing of data from SQL/NoSQL databases.

Before we continue, let's briefly discuss SQL and NoSQL databases. #thread
NoSQL databases lack constraints on the from and structure of data and its relationships, they are also very easy to use and faster than SQL databases (reads) and easy to scale:

Below are example of NoSQL databases:

1. CouchDB
2. Cassandra
3. MongoDB
4. HBase
5. DB2

#thread
SQL databases have certain constraints on the form and structure of data (especially integrity constraints) and its relationships, they are relatively easy to use and fast (but not faster that NoSQL databases) and quite difficult to scale.

#thread
Below are examples of SQL databases:

1. MySQL
2. MariaDB
3. Oracle
4. MS SQL
5. PostgreSQL
So to put it simply;

SQL databases are inflexible (upon data that have complex connection between them) storage systems and are used for only structured data.

#thread
While;

NoSQL databases are flexible (upon data with simple connections between them) storage systems and are used for both structured and unstructured data.

Take a break now and look at this short article that share more Pros/Cons on POPULAR DATABASES:

keycdn.com/blog/popular-d…
You can also read this article to get you familiar with more detailed Pros/Cons for POPULAR DATABSES:xplenty.com/blog/the-sql-v…

Now, lets cast our minds back to this image below as we now discuss the CAP theorem which relates to the image. Don't panic.. It's not mathematics
#thread
So, what is the CAP theorem you ask. It is a guide for most choosing the 2 best features of a database based on the features of the software we are building. Simple!

The theorem just emphasizes the fact that you only choose 2 out of 3 features

e.g CA, CP, AP from CAP

#thread
You can't choose all 3 only any 2.

C stands for CONSISTENCY
A stands for AVAILABILITY
P stand for PARTITION-TOLERANCE

Now, from this image i added earlier, you see that #MySQL is on the CA side of the triangle and #MongoDB is on the CP side of the triangle
Now, lets take them one by one - I'll start with CONSISTENCY

What does CONSISTENCY in a database really mean ?

It means that when there's data inside a database, barring no further updates, if a read (query operation) is executed upon the database, it retrieves the data
#thread
i.e. For every read (query operation) the database puts out the most up-to-date data from its storage.

There are 2 types of CONSISTENCY:

1. IMMEDIATE / STRONG CONSISTENCY
2. EVENTUAL / WEAK CONSISTENCY

Most databases that employ #transactions have STRONG CONSISTENCY

#thread
databases that employ #transactions (a pessimistic concurrency control mechanism that guarantees ACID updates to a database - thread/process locks) are #MySQL , #MariaDB, #PostgreSQL
Every query runs in a single thread of execution that will obtain a lock on the database tables (while performing an INSERT, DELETE, UPDATE query) so that no other query can access the database tables so that those other query reads back old/outdated data.
#transactions are actually one of the reasons why most SQL databases are slow while reading to OR writing from the SQL database but they guarantee that all reads return up-to-date data always.

So where is CONSISTENCY useful in choosing a database for a web/desktop software ?
STRONG CONSISTENCY is needed in software that relies on fresh data all the time for instance

1. a stock-trading software
2. a banking app
3. a scored-based multiplayer hyper-casual 2D/3D game.
4. a location-aware social app (think foursquare / snapchat)
5. chat apps

#thread
For the types of software listed above, operating with a database that mostly provides EVENTUAL CONSISTENCY is not going to be good as it defeats the primary features of the application - which is consistently up-to-data data from the database.

#thread
EVENTUAL CONSISTENCY is needed in software that can operate fine (i.e. not mission-critical) without up-to-date data all the time. For instance

1. Twitter Trends / Tweets (Yours Truly)
2. Recommender Systems (think Amazon / Yelp)
3. DNS
4. Analytics Systems

#thread
However, most times everything is not as straight forward as that may seem as for instance. A banking app needs to scale and be fault-tolerant too because of the sheer volume of transaction and bank account data generated over time.

Which brings us to AVAILABILITY next!
#thread
What is AVAILABILITY as it concerns databases ?

AVAILABILITY simple means that at any time a query is made to a database, the query must not return void (no errors or timeouts) and the query must return in the shortest time possible.

Mostly, this points to reliability
#thread
That a database is reliable means users who query the database can access data in the shortest time irrespective of the load or strain on the database.

Also AVAILABILITY point to recoverability

That a database is recoverable means if an error occurs, it can be rectified quickly
Also recover-ability means that when a database server goes down, it can quickly be replaced or resuscitated.

When a database has very good AVAILABILITY, it is said to be HIGHLY AVAILABLE. When a database has not too good AVAILABILITY, it is said to be BASELY AVAILABLE

#thread
A HIGHLY AVAILABLE database cannot have STRONG CONSISTENCY (never possible). It can only have EVENTUAL CONSISTENCY as this type of CONSISTENCY is used to guarantee HIGH AVAILABILITY.

Now, lets go back to our earlier image:
#thread
Notice, the position of #MariaDB in relation to #MySQL and #PostgreSQL on the left side of the triangle for the CONSISTENCY - AVAILABILITY side. This positioning just shows that #MariaDB, #Oracle is EVENTUALLY CONSISTENT & HIGHLY AVAILABLE

#thread
While #MySQL and #PostgreSQL is IMMEDIATELY (STRONG) CONSISTENT & BASELY AVAILABLE.

Now, lets go back to our app/software example of a banking app or stock-trading software.

From the user features of such, you know it must IMMEDIATELY (STRONG) CONSISTENT...

#thread
So therefore, it must also be BASELY AVAILABLE. But can we tilt things in such a way that it can be IMMEDIATELY (STRONG) CONSISTENT & SEMI-HIGHLY AVAILABLE ?

Well yes! we can choose #MariaDB over #MySQL or #PostgreSQL

#thread
This choice is so because when we setup a replication cluster (master - slave) setup for #MariaDB, we can actually tune its EVENTUAL CONSISTENCY towards IMMEDIATE CONSISTENCY by enabling semi-synchronous replication as opposed to asynchronous replication
#Neo4j is actually another graph-based database that has HIGH AVAILABILITY & EVENTUAL CONSISTENCY, it is used for #OpenData and Semantic Web ( #RDF ) data storage and queries

#thread
For #MySQL, you can also use it to build a banking application like the one Nigerian banks use with IMMEDIATE (STRONG) CONSISTENCY & then make up for the BASE AVAILABILITY by setting up a #MySQL replication cluster aided by a write-around cache db (think MEMCACHE / REDIS)
#thread
However this #MySQL setup is hard to manage and sometimes coordination between the cache ( MEMCACHED / REDIS ) and the master - slave (replication cluster) setup means additional code at the application level to manage such complexity. At the end of the day, let simplicity rule!
Now, we move on to talk about PARTITION-TOLERANCE.

PARTITION-TOLERANCE simply means that when a database is broken up into parts (partitions), it's ability to function optimally for CONSISTENCY / AVAILABILITY doesn't get impacted.

#thread
These parts are usually called "shards" and are usually complex to manage however especially for SQL databases. shards are created when you take a monolith database (with its table) and break it up by table rows (horizontal partitioning) or by columns (vertical partitioning)
Now partitioning a database should not be confused with creating a replication cluster ( master / slave )/ They are not the same thing. Actually, you are advised to use partitioning as a very very very last resort after you must have employed replication.

#thread
So how does PARTITION-TOLERANCE relate to databases ? Well, SQL database are not easily partitioned. There's a lot of gymnastics involved in partitioning a database properly, choosing a hash function (in the case of a shard key) and selecting appropriate key namespaces.

#thread
In fact, you usually need to setup more code at the application level to do this. For NoSQL databases, it's basically the same story. This is why for most web/desktop software you'll build, you just need to concentrate on balancing between CONSISTENCY / AVAILABILITY

#thread
Simply concentrate on choosing based on the trade-offs between CONSISTENCY/AVAILABILITY and tune to suit the user features of the software you are building.

See the image below that gives an idea of how to chose a NoSQL database based on the scale and performance needed

#thread
Now, getting back to this our fav image (😀):

We see that NoSQL databases like #Cassandra have EVENTUAL CONSISTENCY/ HIGH AVAILABILITY but also easier to partition (actual partition - using partition keys) than it's SQL counterparts on the other side of the triangle.

#thread
Now, it's important to note that NoSQL databases are IMMEDIATELY (STRONG) CONSISTENT by default (when there are no replication/partitions) but become EVENTUALLY CONSISTENT when replication/partitions are introduced.

This little detail should be kept in mind

#thread
#MongoDB has HIGH AVAILABILITY & EVENTUAL CONSISTENCY only when using replication/partitioning. A single #MongoDB instance (like the way we build apps using the MEAN / MERN stack) is IMMEDIATELY CONSISTENT & BASELY AVAIALABLE.

So, if you were building a #BuyPowerNG for example..
You really don't need data to be IMMEDIATELY CONSISTENT but you need it to be HIGHLY AVAILABLE. So, a #PowerShopNG or #BuyPowerNG or #Paga will most likely be build with #MongoDB ngoDB over #PostgreSQL for instance.

#MongoDB is easier to scale than #PostegreSQL

#thread
You can also very much use #Cassandra database to build an app like #BuyPowerNG too. #Cassandra can also be used for building a #Twitter or a #Facebook too or other kinds of Analytics backend (e.g. the database that serves the Google Analytics dashboard)

#thread
Okay time for some interaction...

What database do you think an #InventoryManagement or #VendorProcurementsSoftware should be built with ?

Let me have your answers in the replies and let's have a discussion
Also, i do have another interaction...

What database will be best suited for an #ElectronicMedicalRecord software system ?
After the discussions on the 2 questions above, i'll wrap up by talking about key-value stores - REDIS / MEMCACHED
Missing some Tweet in this thread?
You can try to force a refresh.

Like this thread? Get email updates or save it to PDF!

Subscribe to Okechukwu Ifeora
Profile picture

Get real-time email alerts when new unrolls are available from this author!

This content may be removed anytime!

Twitter may remove this content at anytime, convert it as a PDF, save and print for later use!

Try unrolling a thread yourself!

how to unroll video

1) Follow Thread Reader App on Twitter so you can easily mention us!

2) Go to a Twitter thread (series of Tweets by the same owner) and mention us with a keyword "unroll" @threadreaderapp unroll

You can practice here first or read more on our help page!

Follow Us on Twitter!

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just three indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3.00/month or $30.00/year) and get exclusive features!

Become Premium

Too expensive? Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal Become our Patreon

Thank you for your support!