Alex Xu
Dec 19 • 11 tweets • 3 min read
/1 Data is cached everywhere, from the front end to the back end!

This diagram illustrates where we cache data in a typical architecture.
/2 There are multiple layers along the flow.

🔹 1. Client apps: HTTP responses can be cached by the browser. The first time we request data over HTTP, the response is stored in the browser cache; when we request the same data again, the client app tries to retrieve it from the browser cache first.
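
As a rough illustration of the browser-cache behavior described above, here is a minimal Python sketch. It is not how any real browser is implemented: `TinyHttpCache` and `fake_fetch` are hypothetical names, and the fixed 60-second lifetime stands in for whatever freshness the server would actually declare.

```python
import time


class TinyHttpCache:
    """Toy stand-in for a browser cache: keeps a response per URL until it expires."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch   # hypothetical callable: url -> (body, max_age_seconds)
        self._clock = clock
        self._store = {}      # url -> (body, expires_at)

    def get(self, url):
        entry = self._store.get(url)
        if entry is not None and self._clock() < entry[1]:
            return entry[0], "cache"      # still fresh: no network round trip
        body, max_age = self._fetch(url)  # miss or stale: go to the network
        if max_age > 0:
            self._store[url] = (body, self._clock() + max_age)
        return body, "network"


calls = []


def fake_fetch(url):
    # Pretend server: records each request and allows 60 seconds of caching.
    calls.append(url)
    return f"body of {url}", 60


cache = TinyHttpCache(fake_fetch)
first = cache.get("/profile")   # served from the network
second = cache.get("/profile")  # served from the cache
```

Only the first call reaches `fake_fetch`; the second is answered locally until the entry expires.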
/3

🔹 2. CDN: A CDN caches static web resources. Clients can retrieve data from a nearby CDN node.

🔹 3. Load Balancer: The load balancer can cache resources as well.
/4
🔹 4. Messaging infra: Message brokers store messages on disk first, and then consumers retrieve them at their own pace. Depending on the retention policy, the data is cached in Kafka clusters for a period of time.
/5
🔹 5. Services: There are multiple layers of cache in a service. If the data is not cached in CPU cache, the service will try to retrieve the data from memory. Sometimes the service has a second-level cache to store data on disk.
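
The memory-then-disk lookup inside a service can be sketched roughly as follows. This is only an illustration under assumptions: `TwoLevelCache` is a made-up class, the disk level is a directory of JSON files, and keys are assumed to be safe to use as file names.

```python
import json
import tempfile
from pathlib import Path


class TwoLevelCache:
    """Toy service cache: level 1 is a dict in memory, level 2 is JSON files on disk."""

    def __init__(self, load, disk_dir):
        self._load = load            # hypothetical slow loader: key -> value
        self._memory = {}
        self._disk = Path(disk_dir)

    def _path(self, key):
        return self._disk / f"{key}.json"   # assumes filename-safe keys

    def get(self, key):
        if key in self._memory:
            return self._memory[key], "memory"
        path = self._path(key)
        if path.exists():
            value = json.loads(path.read_text())
            self._memory[key] = value       # promote to the faster level
            return value, "disk"
        value = self._load(key)             # miss at both levels
        self._memory[key] = value
        path.write_text(json.dumps(value))
        return value, "source"

    def drop_memory(self):
        # Simulates a restart: memory is lost, the disk copy survives.
        self._memory.clear()


tmp = tempfile.TemporaryDirectory()
l2 = TwoLevelCache(lambda key: {"id": key}, tmp.name)
hits = [l2.get("42")[1]]
hits.append(l2.get("42")[1])
l2.drop_memory()
hits.append(l2.get("42")[1])
tmp.cleanup()
```

The three lookups are answered by the source, then memory, then disk, in that order.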
/6
🔹 6. Distributed Cache: A distributed cache like Redis holds key-value pairs for multiple services in memory. It provides much better read/write performance than the database.
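
One common way services use such a cache is the cache-aside pattern, sketched below. The thread does not prescribe this pattern, so treat it as an assumption. `FakeRedis` is an in-process dict standing in for a real Redis server (it is not the redis-py API), and `database`, `read_user`, and `update_user` are hypothetical.

```python
class FakeRedis:
    """In-process dict standing in for a shared Redis server (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


database = {"user:1": "Ada"}   # stands in for the slower system of record
db_reads = []                  # records every read that reached the database


def read_user(cache, key):
    value = cache.get(key)
    if value is not None:
        return value           # cache hit: the database is not touched
    db_reads.append(key)
    value = database[key]      # cache miss: read from the database
    cache.set(key, value)      # populate the cache for later readers
    return value


def update_user(cache, key, value):
    database[key] = value      # write to the database first
    cache.delete(key)          # then invalidate the now-stale cached copy


shared = FakeRedis()
a = read_user(shared, "user:1")
b = read_user(shared, "user:1")
update_user(shared, "user:1", "Grace")
c = read_user(shared, "user:1")
```

Deleting the key on update, rather than overwriting it, keeps the write path simple: the next reader repopulates the cache from the database.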
/7
🔹 7. Full-text Search: We sometimes need a full-text search engine such as Elasticsearch for document search or log search. A copy of the data is indexed in the search engine as well.
/8
🔹 8. Database: Even in the database, we have different levels of caches:

- WAL (write-ahead log): data is written to the WAL first, before building the B-tree index
- Buffer pool: a memory area allocated to cache table and index pages read from disk
- Materialized view: precomputed query results stored for faster reads
/9

- Transaction log: records all the transactions and database updates
- Replication log: records the replication state in a database cluster
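
To make the write-ahead idea from the list above concrete, here is a minimal sketch, not any real database's implementation. `TinyStore` is hypothetical, a Python list stands in for the on-disk log, and a dict stands in for the indexed table.

```python
class TinyStore:
    """Toy key-value store that logs every write before applying it."""

    def __init__(self):
        self.wal = []     # append-only list standing in for the on-disk log
        self.table = {}   # stands in for the indexed table

    def put(self, key, value):
        self.wal.append(("put", key, value))  # 1. record the change in the log
        self.table[key] = value               # 2. only then apply it

    def crash(self):
        # Simulates losing in-memory state; the log is assumed to survive.
        self.table = {}

    def recover(self):
        # Replay the log in order to rebuild the table.
        for op, key, value in self.wal:
            if op == "put":
                self.table[key] = value


store = TinyStore()
store.put("a", 1)
store.put("a", 2)
store.put("b", 3)
store.crash()
store.recover()
```

After recovery the table holds the latest values again, while the log still contains the overwritten value of `"a"`: a copy of the data that lives outside the table itself.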
/10 👉 Over to you: With data cached at so many levels, how can we guarantee that sensitive user data is completely erased from the systems?

Subscribe to our weekly newsletter to learn something new every week: bit.ly/3FEGliw
/11 I hope you've found this thread helpful.

Follow me @alexxubyte for more.

Like/Retweet the first tweet below if you can:

