
Arpit Bhayani

15 Aug, 11 tweets, 2 min read

Ever wondered how replication happens between a Master and its Replica? How do changes on the Master propagate to the Replica?

This is a short thread on how it happens.

#systemdesign #distributedsystems

🧵

Every write operation on the Master is logged as an event in the Replication log file. The format in which these events are written to the log is called the Replication Format.
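To make the "log of events" idea concrete, here is a minimal sketch of an append-only replication log. The class and method names (`ReplicationLog`, `append`, `read_from`) are hypothetical and are not taken from any real database. Each event gets a position, and a Replica only needs to remember the position of the first event it has not applied yet.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One logged write; what `payload` holds depends on the replication format."""
    position: int
    payload: str


@dataclass
class ReplicationLog:
    """Hypothetical append-only replication log kept by the Master."""
    events: list[Event] = field(default_factory=list)

    def append(self, payload: str) -> int:
        # Each write on the Master is recorded as a new event at the next position.
        position = len(self.events)
        self.events.append(Event(position, payload))
        return position

    def read_from(self, position: int) -> list[Event]:
        # A Replica asks for every event starting at the first one it has not applied.
        return self.events[position:]


log = ReplicationLog()
log.append("UPDATE tasks SET is_done = true WHERE user_id = 53;")
log.append("DELETE FROM tasks WHERE id = 7;")

next_position = 1  # this Replica has already applied event 0
for event in log.read_from(next_position):
    print(event.position, event.payload)  # prints: 1 DELETE FROM tasks WHERE id = 7;
```

The two formats below differ only in what goes into `payload`.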

The two common Replication formats are:

- Statement-based format
- Row-based format

✨ Statement-based Format

The Master records the operation as an event in its log, and when the Replica reads this log, it executes the same operation on its copy of the data.

This way, the operation on the Master is executed on the Replica, which keeps it in sync with the Master.

UPDATE tasks SET is_done = true WHERE user_id = 53;

is logged as

UPDATE tasks SET is_done = true WHERE user_id = 53;
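A minimal sketch of statement-based replication, using two in-memory SQLite databases from Python's `sqlite3` module to stand in for the Master and the Replica. The table contents and the helper names (`make_node`, `execute_on_master`, `replay_on_replica`) are hypothetical. The logged event is just the SQL text, and the Replica re-executes it.

```python
import sqlite3


def make_node() -> sqlite3.Connection:
    # Each node (Master or Replica) holds its own copy of the data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, user_id INTEGER, is_done BOOLEAN)")
    conn.executemany(
        "INSERT INTO tasks (id, user_id, is_done) VALUES (?, ?, ?)",
        [(121, 53, False), (142, 53, False), (300, 7, False)],
    )
    return conn


master, replica = make_node(), make_node()
replication_log: list[str] = []


def execute_on_master(sql: str) -> None:
    master.execute(sql)
    # Statement-based format: the statement text itself is the logged event.
    replication_log.append(sql)


def replay_on_replica() -> None:
    # The Replica re-executes every logged statement, in order.
    for sql in replication_log:
        replica.execute(sql)


execute_on_master("UPDATE tasks SET is_done = true WHERE user_id = 53;")
replay_on_replica()

rows = "SELECT id, is_done FROM tasks ORDER BY id"
print(master.execute(rows).fetchall() == replica.execute(rows).fetchall())  # True
```

This works because the statement is deterministic: running it twice against identical data gives identical results.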

👉 Advantages of Statement-based Replication:

- Smaller log files
- Log files can be used to audit the database

👉 Disadvantages of Statement-based Replication:

- Non-deterministic functions like RAND() and UUID() yield different values on the Master and the Replica
- Replica lag depends on the load and the concurrent queries running on the Replica during replication, since every statement has to be re-executed in full
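A small sketch of the non-determinism problem, again with SQLite as a stand-in. SQLite has no `RAND()`, so this registers a hypothetical one per node with `Connection.create_function`, backed by a differently seeded `random.Random` to model each server having its own random state. Replaying the same statement text then leaves the two copies with different data.

```python
import random
import sqlite3


def make_node(seed: int) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for RAND(): each server has its own RNG state,
    # modelled here by seeding a per-node generator differently.
    rng = random.Random(seed)
    conn.create_function("RAND", 0, rng.random)
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, priority REAL)")
    conn.execute("INSERT INTO tasks (id, priority) VALUES (121, 0.0)")
    return conn


master, replica = make_node(seed=1), make_node(seed=2)

statement = "UPDATE tasks SET priority = RAND() WHERE id = 121"
master.execute(statement)   # the Master runs the statement and logs its text
replica.execute(statement)  # the Replica replays the same text

query = "SELECT priority FROM tasks WHERE id = 121"
master_value = master.execute(query).fetchone()[0]
replica_value = replica.execute(query).fetchone()[0]
print(master_value == replica_value)  # False: the copies have diverged
```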

✨ Row-based Format

The Master logs the changes made to individual data items (rows) instead of the operation itself.

When the Replica reads this log, it updates its copy of the data by applying the changes on its data items. This way the Replica remains in sync with the Master.

UPDATE tasks SET is_done = true WHERE user_id = 53;

is logged as

tasks:121 is_done=true
tasks:142 is_done=true
tasks:643 is_done=true
tasks:713 is_done=true
tasks:862 is_done=true
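A minimal sketch of the row-based flow with SQLite standing in for both nodes. Rows 121, 142 and 643 mirror the example above; row 300 and all helper names are hypothetical. The Master logs one event per changed row, and the Replica writes those values by primary key without ever evaluating the `WHERE user_id = 53` clause.

```python
import sqlite3


def make_node() -> sqlite3.Connection:
    # Each node holds its own copy of the tasks table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, user_id INTEGER, is_done BOOLEAN)")
    conn.executemany(
        "INSERT INTO tasks (id, user_id, is_done) VALUES (?, ?, 0)",
        [(121, 53), (142, 53), (643, 53), (300, 7)],
    )
    return conn


master, replica = make_node(), make_node()


def update_on_master(user_id: int) -> list[tuple[str, int, str, bool]]:
    # Sketch only: a real server captures changed rows while executing the
    # statement; here we look them up first, then run the UPDATE.
    ids = [row_id for (row_id,) in master.execute(
        "SELECT id FROM tasks WHERE user_id = ? ORDER BY id", (user_id,))]
    master.execute("UPDATE tasks SET is_done = 1 WHERE user_id = ?", (user_id,))
    # Row-based format: one event per changed row, carrying the new value.
    return [("tasks", row_id, "is_done", True) for row_id in ids]


def apply_on_replica(events: list[tuple[str, int, str, bool]]) -> None:
    for table, row_id, column, value in events:
        # The Replica writes the logged value to the logged row. Table and
        # column names come from the trusted log, not from user input.
        replica.execute(f"UPDATE {table} SET {column} = ? WHERE id = ?", (value, row_id))


row_log = update_on_master(53)
for table, row_id, column, value in row_log:
    print(f"{table}:{row_id} {column}={str(value).lower()}")  # e.g. tasks:121 is_done=true
apply_on_replica(row_log)
```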

👉 Advantages of Row-based Replication:

- Changes can be applied on the Replica safely and predictably, since it writes the logged values instead of re-running the operation
- Fewer and shorter locks are needed on the Replica
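To see why row-based changes apply predictably, here is the same hypothetical `RAND()` setup as in the statement-based case, but replicated by row. The Master evaluates `RAND()` once and logs the resulting value; the Replica applies that value directly, so its own differently seeded `RAND()` is never called. The event tuple layout is an assumption made for illustration.

```python
import random
import sqlite3


def make_node(seed: int) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical per-server RAND(), seeded differently on each node.
    conn.create_function("RAND", 0, random.Random(seed).random)
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, priority REAL)")
    conn.execute("INSERT INTO tasks (id, priority) VALUES (121, 0.0)")
    return conn


master, replica = make_node(seed=1), make_node(seed=2)

# Master: run the non-deterministic statement, then log the row's resulting value.
master.execute("UPDATE tasks SET priority = RAND() WHERE id = 121")
(new_priority,) = master.execute("SELECT priority FROM tasks WHERE id = 121").fetchone()
row_event = ("tasks", 121, "priority", new_priority)

# Replica: write the logged value as-is; no function is re-evaluated.
_, row_id, _, value = row_event
replica.execute("UPDATE tasks SET priority = ? WHERE id = ?", (value, row_id))

query = "SELECT priority FROM tasks WHERE id = 121"
print(master.execute(query).fetchone() == replica.execute(query).fetchone())  # True
```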

👉 Disadvantages of Row-based Replication:

- If an operation affects 5000 rows, the Master writes 5000 entries to the log file, so logs grow much larger
- Longer locks taken during logging reduce throughput
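A quick sketch of the log-size trade-off for an `UPDATE` touching 5000 rows. The row-entry text format copies the `tasks:<id> is_done=true` style used above; real binary logs encode rows differently, so only the relative sizes are meaningful here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, user_id INTEGER, is_done INTEGER)")
conn.executemany(
    "INSERT INTO tasks (id, user_id, is_done) VALUES (?, 53, 0)",
    [(i,) for i in range(1, 5001)],
)

statement = "UPDATE tasks SET is_done = true WHERE user_id = 53;"
statement_log = [statement]  # statement-based: one event, whatever the row count

affected = [row_id for (row_id,) in conn.execute("SELECT id FROM tasks WHERE user_id = 53")]
row_log = [f"tasks:{row_id} is_done=true" for row_id in affected]  # row-based: one event per row

print(len(statement_log), len(row_log))  # 1 5000
print(sum(map(len, statement_log)), sum(map(len, row_log)))  # total characters logged
```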

To date, I have written ~60 articles on Distributed Systems, System Design, Advanced Algorithms, and Python Internals.

Right now, I am running a series on Distributed Systems.

1800+ people have subscribed to my newsletter. Join them at arpitbhayani.me/newsletter.
