
Alex Xu

Dec 19 • 11 tweets • 3 min read

/1 Data is cached everywhere, from the front end to the back end!

This diagram illustrates where we cache data in a typical architecture.

/2 There are 𝐦𝐮𝐥𝐭𝐢𝐩𝐥𝐞 𝐥𝐚𝐲𝐞𝐫𝐬 along the flow.

🔹 1. Client apps: HTTP responses can be cached by the browser. The first time we request data over HTTP, the response comes back with a caching policy in its headers (for example, Cache-Control). When we request the same data again, the client app tries to retrieve it from the browser cache first.
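A minimal sketch of the idea, assuming a hypothetical `fetch` function that returns a body together with a max-age in seconds. The class name and fields are illustrative and do not model a real browser's cache:

```python
import time


class BrowserCache:
    """Toy model of a browser HTTP cache keyed by URL (illustrative only)."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch      # hypothetical: url -> (body, max_age_seconds)
        self._clock = clock      # injectable clock so expiry can be simulated
        self._entries = {}       # url -> (body, expires_at)
        self.network_calls = 0

    def get(self, url):
        entry = self._entries.get(url)
        if entry is not None and self._clock() < entry[1]:
            return entry[0]      # fresh entry: served without a network request
        # Missing or expired: go to the network and store the new response.
        body, max_age = self._fetch(url)
        self.network_calls += 1
        self._entries[url] = (body, self._clock() + max_age)
        return body
```

A second `get` for the same URL within the max-age window is answered from `_entries`; once the window passes, the next call goes back to the network.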

/3

🔹 2. CDN: A CDN caches static web resources. Clients can retrieve data from a nearby CDN node.

🔹 3. Load Balancer: The load balancer can cache resources as well.

/4
🔹 4. Messaging infra: Message brokers store messages on disk first, and then consumers retrieve them at their own pace. Depending on the retention policy, the data is cached in Kafka clusters for a period of time.

/5
🔹 5. Services: There are multiple layers of cache in a service. If the data is not cached in CPU cache, the service will try to retrieve the data from memory. Sometimes the service has a second-level cache to store data on disk.
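The memory-then-disk lookup can be sketched as below. Everything here is an assumption for illustration: `TwoLevelCache` and `load` are hypothetical names, values are assumed to be JSON-serializable, and keys are assumed to be safe to use as file names. CPU caches are managed by hardware and are not modeled.

```python
import json
import os
import tempfile


class TwoLevelCache:
    """In-memory dict first, JSON files on disk second, slow loader last."""

    def __init__(self, directory, load):
        self._dir = directory
        self._load = load        # hypothetical slow source: key -> value
        self._memory = {}

    def _path(self, key):
        return os.path.join(self._dir, f"{key}.json")

    def get(self, key):
        if key in self._memory:              # level 1: memory
            return self._memory[key]
        path = self._path(key)
        if os.path.exists(path):             # level 2: disk
            with open(path) as f:
                value = json.load(f)
        else:                                # miss on both levels
            value = self._load(key)
            with open(path, "w") as f:
                json.dump(value, f)
        self._memory[key] = value            # promote to memory
        return value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache = TwoLevelCache(d, lambda key: {"key": key})
        print(cache.get("user1"))
```

The disk level survives a restart of the service, so a fresh instance pointed at the same directory can skip the slow loader for keys that were cached earlier.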

/6
🔹 6. Distributed Cache: Distributed caches like Redis hold key-value pairs for multiple services in memory. They provide much better read/write performance than the database.
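One common way services use such a cache is the cache-aside pattern, which the thread does not name. The sketch below shows it with plain dicts standing in for both Redis and the database, so no real client API is implied; `CacheAside` and its methods are hypothetical.

```python
class CacheAside:
    """Cache-aside reads and invalidate-on-write, using dicts as stand-ins."""

    def __init__(self, cache, database):
        self.cache = cache        # stand-in for a distributed cache such as Redis
        self.database = database  # stand-in for the system of record
        self.db_reads = 0

    def read(self, key):
        if key in self.cache:
            return self.cache[key]           # cache hit
        self.db_reads += 1                   # cache miss: fall back to the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value          # populate the cache for later reads
        return value

    def write(self, key, value):
        self.database[key] = value           # update the system of record first
        self.cache.pop(key, None)            # then invalidate the cached copy
```

Invalidating on write rather than updating the cache keeps the sketch simple; the next read repopulates the entry from the database.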

/7
🔹 7. Full-text Search: We sometimes need a full-text search engine such as Elasticsearch for document search or log search. A copy of the data is indexed in the search engine as well.
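The "copy of the data" typically takes the form of an inverted index that maps each term to the documents containing it. Below is a minimal sketch of that structure, assuming whitespace tokenization and lowercase matching; it is far simpler than what a real search engine does, and the function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Because the index holds its own copy of the terms, deleting a document from the primary database does not remove it from search results until the index is updated as well.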

/8
🔹 8. Database: Even in the database, we have different levels of caches:

- WAL (Write-ahead Log): data is written to the WAL first, before the B-tree index is built
- Buffer pool: a memory area that caches data and index pages read from disk
- Materialized view: precomputed query results stored in the database for faster reads

/9

- Transaction log: records all the transactions and database updates
- Replication log: records the replication state in a database cluster
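The write-ahead idea in the list above can be sketched with a tiny key-value store that appends every change to a log file before applying it in memory. This is an illustration under simplifying assumptions (one JSON record per line, no checkpoints, no B-tree); `TinyKV` is a hypothetical name and not how any particular database implements its WAL.

```python
import json
import os
import tempfile


class TinyKV:
    """Key-value store that logs each write to disk before applying it."""

    def __init__(self, wal_path):
        self._wal_path = wal_path
        self._data = {}
        self._replay()

    def _replay(self):
        """Rebuild in-memory state by re-applying every logged write."""
        if not os.path.exists(self._wal_path):
            return
        with open(self._wal_path) as f:
            for line in f:
                record = json.loads(line)
                self._data[record["key"]] = record["value"]

    def put(self, key, value):
        # 1) Append to the log and force it to disk.
        with open(self._wal_path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2) Only then update the in-memory structure.
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "wal.log")
        TinyKV(path).put("a", 1)
        print(TinyKV(path).get("a"))  # state is recovered from the log
```

Note that the log keeps every old value until it is truncated or compacted, which is one reason a logical delete in the table does not immediately erase the data from disk.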

/10 👉 Over to you: With data cached at so many levels, how can we guarantee that 𝐬𝐞𝐧𝐬𝐢𝐭𝐢𝐯𝐞 𝐮𝐬𝐞𝐫 𝐝𝐚𝐭𝐚 is completely erased from the systems?

Subscribe to our weekly newsletter to learn something new every week: bit.ly/3FEGliw

@alexxubyte

/11 I hope you've found this thread helpful.

Follow me @alexxubyte for more.

Like/Retweet the first tweet below if you can:

https://twitter.com/alexxubyte/status/1604880509086994436


More from @alexxubyte

Alex Xu

@alexxubyte

Dec 15

Popular interview question: What is the difference between 𝐏𝐫𝐨𝐜𝐞𝐬𝐬 and 𝐓𝐡𝐫𝐞𝐚𝐝?

Watch and subscribe here (YouTube video): lnkd.in/ef9DSDHV

Main differences between process and thread:

🔹 Processes are usually independent, while threads exist as subsets of a process.
🔹 Each process has its own memory space. Threads that belong to the same process share the same memory.

🔹 A process is heavyweight. It takes more time to create and terminate.
🔹 Context switching is more expensive between processes.
🔹 Communication between threads is faster than communication between processes.

Read 4 tweets

Alex Xu

@alexxubyte

Dec 14

/1 What are the most common misconceptions about distributed environments?

About 30 years ago, Peter Deutsch drafted a list of eight fallacies in distributed computing environments, now known as "The 8 fallacies of distributed computing". Many years later, the fallacies remain.

/2 🔹The network is reliable
🔹Latency is zero
🔹Bandwidth is infinite
🔹The network is secure
🔹Topology doesn't change
🔹There is one administrator
🔹Transport cost is zero
🔹The network is homogeneous.

/3 Subscribe to our weekly newsletter to learn something new every week:

bit.ly/3FEGliw

Read 4 tweets

Alex Xu

@alexxubyte

Dec 13

/1 ChatGPT and copy.ai brought attention to AIGC (AI-generated Content). Why is AIGC growing so explosively?

The diagram below summarizes the development in this area.

OpenAI has been developing GPT (Generative Pre-trained Transformer) since 2018.

/2 GPT 1 was trained on the BooksCorpus dataset (5GB); its main focus was language understanding.

On Valentine’s Day 2019, GPT 2 was released with the slogan “too dangerous to release”. It was trained on web pages linked from Reddit posts with at least 3 karma (40GB). The training cost was $43k.

/3 Later GPT 2 was used to generate music in MuseNet and JukeBox.

Read 7 tweets

Alex Xu

@alexxubyte

Dec 12

/1 Our newsletter ByteByteGo just reached an important milestone, and I wanted to share some of the learnings in this journey.

/2 How did we get here?
Before posting anything about system design on social media, I spent 2.5 years writing 2 system design interview books. Writing a good book is incredibly hard and usually not very rewarding, but this turned out to be my best investment.

/3 It taught me 3 things: 1) How to write technical content people like to read, 2) Good work takes time. Don’t rush it. 3) Follow your intuition.

Read 11 tweets

Alex Xu

@alexxubyte

Dec 7

/1 How do you decide which type of database to use?

There are hundreds or even thousands of databases available today, such as Oracle, MySQL, MariaDB, SQLite, PostgreSQL, Redis, ClickHouse, MongoDB, S3, Ceph, etc. How do you select the architecture for your system?

/2 My short summary is as follows:
🔹Relational database. It can solve almost any problem.
🔹In-memory store. Its speed and limited data size make it ideal for fast operations.
🔹Time-series database. Store and manage time-stamped data.

/3 🔹Graph database. It is suitable for complex relationships between unstructured objects.
🔹Document store. They are good for large immutable data.
🔹Wide column store. They are usually used for big data, analytics, reporting, etc., which needs denormalized data.

Read 7 tweets

Alex Xu

@alexxubyte

Dec 6

/1 𝐇𝐨𝐰 𝐝𝐨 𝐰𝐞 𝐥𝐞𝐚𝐫𝐧 𝐄𝐥𝐚𝐬𝐭𝐢𝐜𝐒𝐞𝐚𝐫𝐜𝐡?

Based on the Lucene library, Elasticsearch provides search capabilities. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface. The diagram below shows the outline.

/2 Features of ElasticSearch:
🔹 Real-time full-text search
🔹 Analytics engine
🔹 Distributed Lucene

ElasticSearch use cases:
🔹 Product search on an eCommerce website
🔹 Log analysis
🔹 Auto completer, spell checker
🔹 Business intelligence analysis
🔹 Full-text search

/3 🔹 Full-text search on StackOverflow

The core of ElasticSearch lies in its data structures and indexing. It is important to understand how ES builds the 𝐭𝐞𝐫𝐦 𝐝𝐢𝐜𝐭𝐢𝐨𝐧𝐚𝐫𝐲 using 𝐋𝐒𝐌 𝐓𝐫𝐞𝐞 (Log-Structured Merge Tree).

Read 5 tweets
