Fernando 🇮🇹🇨🇭 Profile picture
Aug 12 10 tweets 2 min read Twitter logo Read on Twitter
Many applications requires to generate unique IDs in their backend.

This is an easy task in a single server, but it's harder at large-scale.

Here are 3 effective strategies you can use:

{1/9} ↓ Image
Generating unique identifiers isn't difficult in a single server.

For example you can use an incremental ID or a function getting the time of the day.

These strategies fail in a distributed scenario, since they create duplicates.

{2/9}
1. Application level Universal Unique Identifiers (UUIDs).

UUIDs are 128-bit numbers that the application can generate in different ways.

For example, it can combine time, the server’s MAC address, or an MD5 hash.

{3/9}
The main benefits of UUIDs are:

• servers don't need to be in sync
• large ID space and low number of collisions

The main drawbacks are:

• big size (128 bit)
• IDs are not sequential

{4/9}
2. Database level UUIDs.

Many databases provide an auto increment feature.

So a database server can be used to generate unique IDs.

This approach is also known as ticket service and has been used by Flickr.

{5/9}
The main benefits of database generated IDs are:

• the application code gets simpler

• IDs are sequential and short in size

The main drawbacks are:

• the additional round trip to get the IDs from the database

• the database becomes a single point of failures

{6/9}
3. Snowflake ID generation service.

This form of unique IDs has been used at Twitter, and Instagram.

The idea is to generate the IDs composing multiple fields:

• 41-bit timestamp
• 10-bit worker ID
• 12-bit sequence number
• 1-bit reserved for future usage

{7/9}
Some observations:

• the timestamp with ms resolution can work for 70 years after a starting epoch

• a coordination service (Zookeeper) assigns the worker ID considering both data center and server ID

• the sequence number support up to 4096 unique IDs per ms

{8/9}
Such a way of generating IDs has many advantages:

• it's available since can be implemented by 1024 machines

• it's scalable since each machine can generate 4096 IDs per ms

• IDs are time sortable since the timestamp is in the higher-order bits

{9/9}
Every week I write about algorithms, distributed systems and software engineering.

If you liked this thread, you can check my newsletter as well.

Your support encourages me to keep writing!

polymathicengineer.com

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Fernando 🇮🇹🇨🇭

Fernando 🇮🇹🇨🇭 Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @Franc0Fernand0

Aug 9
Data structures knowledge is crucial to build high quality and performant code.

It makes the difference between good and bad software developers.

Here are the 7 fundamental data structures everyone must-know: ↓ Image
1. Arrays

• Primitive data structure

• Values are all of the same type

• Linear data structure, fixed in size

• Values are stored consecutively in memory

• Building block for many other data structures

• Dynamic arrays resize themselves when necessary
2. Linked lists

• Linear collection of nodes

• Each node contain values and a pointer to the next node

• Linear data structure, dynamic in size

• Building block for other data structures

• Double Linked lists contain also a pointer to the previous node
Read 9 tweets
Aug 5
DynamoDB is a fully managed NoSQL database available on AWS.

It is widespread and powers platforms like Netflix and Zoom.

Here is how DynamoDB works, when to use it, and when not:

{1/9} ↓ Image
DynamoDB is a DBaaS providing APIs to insert items, create tables, scan table, etc.

It manages everything, allowing developers to focus only on the application logic.

Data encryption, failure recovery, upgrades are all handled by the database.

{2/9}
DynamoDB builds upon the learnings from two Amazon's earlier projects:

- Dynamo: a highly scalable, key-value database
- SimpleDB: a fully managed NoSQL database service

However, both had limitations that DynamoDB addresses.

{3/9}
Read 10 tweets
Jul 29
Scaling up a cluster of database servers is easier with read replicas.

But the more replicas you use, the more you must deal with the replication lag.

Here 3 approaches every backend engineer should know:

{1/9} ↓ Image
A practical setup for read-heavy workflows is to use:

• a primary database for write operations

• read replicas

But this has a drawback: the replication lag.

The replicas are a few sec/min behind the primary server and can send back stale data.

{2/9}
All this creates consistency problems leading to unpredictability at the application level.

There are several possible approaches to solving those consistency issues:

• Tight Consistency

• Causal Consistency

• Monotonic Read Consistency

{3/9}
Read 10 tweets
Jul 26
Even if you don't use linked lists daily, they are common in coding interview questions.

To solve those problems, you need to be good at manipulating pointers.

Here is how to solve 99% of the problems with the fast and slow pointers method:

{1/5} ↓ Image
Fast and slow pointers are an extension of the arrays 2 pointers method to linked list.

The idea is to iterate through the list with 2 pointers moving at different speeds.

The pointers are usually called fast and slow.

{2/5}
Let's consider an example where we want to find the node in the middle of a list.

The brute force approach would first find the list's length and then the middle node.

Fast and slow pointers give a more efficient and elegant solution with only one iteration.

{3/5}
Read 6 tweets
Jul 19
Sorting methods are the "hello world" of algorithms.

Here are 8 algorithms to sort arrays everyone should know: ↓ Image
1. Bubble Sort

• Compare 2 adjacent items. If the first is bigger than the second, swap them.

• At the end of each cycle, the biggest item on the array moves to the end.

• The next cycle through the array leave off the last positioned biggest item.
2. Selection Sort

• Iterate over the array to find the smallest item.

• At the end of each cycle, this smallest item is swapped with the first one.

• The next cycle excludes the last positioned smallest item .
Read 10 tweets
Jul 15
Tree data structures are used in many real-world scenarios.

Here are 4 important tree-structures and their significant use cases:

{1/9} ↓ Image
1. Binary Search Trees (BST)

A constrained extension of a binary tree with a unique property.

Given a node, the value of its left child is less than or equal to the parent value.

The value of its right child is greater than or equal to the parent value.
The main use case of BSTs are:

- implement simple sorting algorithms

- maintain a sorted stream of data

- implement double ended priority queues

The most popular BSTs implementation are balanced trees like Red Black or AVL.
Read 10 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us on Twitter!

:(