Read on Twitter

12,399 views

@isocroft

, 48 tweets, 25 min read Read on Twitter

So, my thread on how to choose databases for your web apps starts in 20 minutes time. Hope y'all are ready ?However, this time i would love it to be an interactive session where we can learn from each other about database preferences for specific web apps and the use case
#thread

This is what we want to talk about today. Take a really good look at the image below. Study it well. Also notice the words at the 3 vertexes of the triangle. CONSISTENCY, AVAILABILITY & PARTITION-TOLERANCE (Big English - but fear not, we shall demystify all and it'll all be easy)

If we also look at the image in the previous tweet on this thread we can see some popular databases placed at certain sides of the triangle. It is important to note also the positioning of some of a particular point on each side. We shall discuss this in more detail soon. #thread

Also notice that key-value pair NoSQL databases such as REDIS & MEMCACHED are missing from the triangle. This is deliberate so have no fear that i forgot about those. Thing is these kinds of NoSQL databases do not apply to CONSISTENCY, AVAILABILITY & PARTITION-TOLERANCE!
#thread

Also note that #Hadoop, #Spark e.t.c are not databases at all. Just in case you were thinking they were. They are just database helpers used to power parallel processing of data from SQL/NoSQL databases.

Before we continue, let's briefly discuss SQL and NoSQL databases. #thread

NoSQL databases lack constraints on the from and structure of data and its relationships, they are also very easy to use and faster than SQL databases (reads) and easy to scale:

Below are example of NoSQL databases:

1. CouchDB
2. Cassandra
3. MongoDB
4. HBase
5. DB2

#thread

SQL databases have certain constraints on the form and structure of data (especially integrity constraints) and its relationships, they are relatively easy to use and fast (but not faster that NoSQL databases) and quite difficult to scale.

#thread

Below are examples of SQL databases:

1. MySQL
2. MariaDB
3. Oracle
4. MS SQL
5. PostgreSQL

So to put it simply;

SQL databases are inflexible (upon data that have complex connection between them) storage systems and are used for only structured data.

#thread

While;

NoSQL databases are flexible (upon data with simple connections between them) storage systems and are used for both structured and unstructured data.

Take a break now and look at this short article that share more Pros/Cons on POPULAR DATABASES:

keycdn.com/blog/popular-d…

You can also read this article to get you familiar with more detailed Pros/Cons for POPULAR DATABSES:xplenty.com/blog/the-sql-v…

Now, lets cast our minds back to this image below as we now discuss the CAP theorem which relates to the image. Don't panic.. It's not mathematics
#thread

So, what is the CAP theorem you ask. It is a guide for most choosing the 2 best features of a database based on the features of the software we are building. Simple!

The theorem just emphasizes the fact that you only choose 2 out of 3 features

e.g CA, CP, AP from CAP

#thread

You can't choose all 3 only any 2.

C stands for CONSISTENCY
A stands for AVAILABILITY
P stand for PARTITION-TOLERANCE

Now, from this image i added earlier, you see that #MySQL is on the CA side of the triangle and #MongoDB is on the CP side of the triangle

Now, lets take them one by one - I'll start with CONSISTENCY

What does CONSISTENCY in a database really mean ?

It means that when there's data inside a database, barring no further updates, if a read (query operation) is executed upon the database, it retrieves the data
#thread

i.e. For every read (query operation) the database puts out the most up-to-date data from its storage.

There are 2 types of CONSISTENCY:

1. IMMEDIATE / STRONG CONSISTENCY
2. EVENTUAL / WEAK CONSISTENCY

Most databases that employ #transactions have STRONG CONSISTENCY

#thread

databases that employ #transactions (a pessimistic concurrency control mechanism that guarantees ACID updates to a database - thread/process locks) are #MySQL , #MariaDB, #PostgreSQL

Every query runs in a single thread of execution that will obtain a lock on the database tables (while performing an INSERT, DELETE, UPDATE query) so that no other query can access the database tables so that those other query reads back old/outdated data.

#transactions are actually one of the reasons why most SQL databases are slow while reading to OR writing from the SQL database but they guarantee that all reads return up-to-date data always.

So where is CONSISTENCY useful in choosing a database for a web/desktop software ?

STRONG CONSISTENCY is needed in software that relies on fresh data all the time for instance

1. a stock-trading software
2. a banking app
3. a scored-based multiplayer hyper-casual 2D/3D game.
4. a location-aware social app (think foursquare / snapchat)
5. chat apps

#thread

For the types of software listed above, operating with a database that mostly provides EVENTUAL CONSISTENCY is not going to be good as it defeats the primary features of the application - which is consistently up-to-data data from the database.

#thread

EVENTUAL CONSISTENCY is needed in software that can operate fine (i.e. not mission-critical) without up-to-date data all the time. For instance

1. Twitter Trends / Tweets (Yours Truly)
2. Recommender Systems (think Amazon / Yelp)
3. DNS
4. Analytics Systems

#thread

However, most times everything is not as straight forward as that may seem as for instance. A banking app needs to scale and be fault-tolerant too because of the sheer volume of transaction and bank account data generated over time.

Which brings us to AVAILABILITY next!
#thread

What is AVAILABILITY as it concerns databases ?

AVAILABILITY simple means that at any time a query is made to a database, the query must not return void (no errors or timeouts) and the query must return in the shortest time possible.

Mostly, this points to reliability
#thread

That a database is reliable means users who query the database can access data in the shortest time irrespective of the load or strain on the database.

Also AVAILABILITY point to recoverability

That a database is recoverable means if an error occurs, it can be rectified quickly

Also recover-ability means that when a database server goes down, it can quickly be replaced or resuscitated.

When a database has very good AVAILABILITY, it is said to be HIGHLY AVAILABLE. When a database has not too good AVAILABILITY, it is said to be BASELY AVAILABLE

#thread

A HIGHLY AVAILABLE database cannot have STRONG CONSISTENCY (never possible). It can only have EVENTUAL CONSISTENCY as this type of CONSISTENCY is used to guarantee HIGH AVAILABILITY.

Now, lets go back to our earlier image:
#thread

Notice, the position of #MariaDB in relation to #MySQL and #PostgreSQL on the left side of the triangle for the CONSISTENCY - AVAILABILITY side. This positioning just shows that #MariaDB, #Oracle is EVENTUALLY CONSISTENT & HIGHLY AVAILABLE

#thread

While #MySQL and #PostgreSQL is IMMEDIATELY (STRONG) CONSISTENT & BASELY AVAILABLE.

Now, lets go back to our app/software example of a banking app or stock-trading software.

From the user features of such, you know it must IMMEDIATELY (STRONG) CONSISTENT...

#thread

So therefore, it must also be BASELY AVAILABLE. But can we tilt things in such a way that it can be IMMEDIATELY (STRONG) CONSISTENT & SEMI-HIGHLY AVAILABLE ?

Well yes! we can choose #MariaDB over #MySQL or #PostgreSQL

#thread

This choice is so because when we setup a replication cluster (master - slave) setup for #MariaDB, we can actually tune its EVENTUAL CONSISTENCY towards IMMEDIATE CONSISTENCY by enabling semi-synchronous replication as opposed to asynchronous replication

#Neo4j is actually another graph-based database that has HIGH AVAILABILITY & EVENTUAL CONSISTENCY, it is used for #OpenData and Semantic Web ( #RDF ) data storage and queries

#thread

For #MySQL, you can also use it to build a banking application like the one Nigerian banks use with IMMEDIATE (STRONG) CONSISTENCY & then make up for the BASE AVAILABILITY by setting up a #MySQL replication cluster aided by a write-around cache db (think MEMCACHE / REDIS)
#thread

However this #MySQL setup is hard to manage and sometimes coordination between the cache ( MEMCACHED / REDIS ) and the master - slave (replication cluster) setup means additional code at the application level to manage such complexity. At the end of the day, let simplicity rule!

Now, we move on to talk about PARTITION-TOLERANCE.

PARTITION-TOLERANCE simply means that when a database is broken up into parts (partitions), it's ability to function optimally for CONSISTENCY / AVAILABILITY doesn't get impacted.

#thread

These parts are usually called "shards" and are usually complex to manage however especially for SQL databases. shards are created when you take a monolith database (with its table) and break it up by table rows (horizontal partitioning) or by columns (vertical partitioning)

Now partitioning a database should not be confused with creating a replication cluster ( master / slave )/ They are not the same thing. Actually, you are advised to use partitioning as a very very very last resort after you must have employed replication.

#thread

So how does PARTITION-TOLERANCE relate to databases ? Well, SQL database are not easily partitioned. There's a lot of gymnastics involved in partitioning a database properly, choosing a hash function (in the case of a shard key) and selecting appropriate key namespaces.

#thread

In fact, you usually need to setup more code at the application level to do this. For NoSQL databases, it's basically the same story. This is why for most web/desktop software you'll build, you just need to concentrate on balancing between CONSISTENCY / AVAILABILITY

#thread

Simply concentrate on choosing based on the trade-offs between CONSISTENCY/AVAILABILITY and tune to suit the user features of the software you are building.

See the image below that gives an idea of how to chose a NoSQL database based on the scale and performance needed

#thread

Now, getting back to this our fav image (😀):

We see that NoSQL databases like #Cassandra have EVENTUAL CONSISTENCY/ HIGH AVAILABILITY but also easier to partition (actual partition - using partition keys) than it's SQL counterparts on the other side of the triangle.

#thread

Now, it's important to note that NoSQL databases are IMMEDIATELY (STRONG) CONSISTENT by default (when there are no replication/partitions) but become EVENTUALLY CONSISTENT when replication/partitions are introduced.

This little detail should be kept in mind

#thread

#MongoDB has HIGH AVAILABILITY & EVENTUAL CONSISTENCY only when using replication/partitioning. A single #MongoDB instance (like the way we build apps using the MEAN / MERN stack) is IMMEDIATELY CONSISTENT & BASELY AVAIALABLE.

So, if you were building a #BuyPowerNG for example..

You really don't need data to be IMMEDIATELY CONSISTENT but you need it to be HIGHLY AVAILABLE. So, a #PowerShopNG or #BuyPowerNG or #Paga will most likely be build with #MongoDB ngoDB over #PostgreSQL for instance.

#MongoDB is easier to scale than #PostegreSQL

#thread

You can also very much use #Cassandra database to build an app like #BuyPowerNG too. #Cassandra can also be used for building a #Twitter or a #Facebook too or other kinds of Analytics backend (e.g. the database that serves the Google Analytics dashboard)

#thread

Okay time for some interaction...

What database do you think an #InventoryManagement or #VendorProcurementsSoftware should be built with ?

Let me have your answers in the replies and let's have a discussion

Also, i do have another interaction...

What database will be best suited for an #ElectronicMedicalRecord software system ?

After the discussions on the 2 questions above, i'll wrap up by talking about key-value stores - REDIS / MEMCACHED

@threadreaderapp

@threadreaderapp

@threadreaderapp unroll

Like this thread? Get email updates or save it to PDF!

Subscribe to Okechukwu Ifeora

This content may be removed anytime!

Try unrolling a thread yourself!

Trending hashtags

Like this thread? Get email updates or save it to PDF!

Subscribe to Okechukwu Ifeora

This content may be removed anytime!

Try unrolling a thread yourself!

Related hashtags

More from @isocroft see all

Related threads

Trending hashtags

Did Thread Reader help you today?