tomaka @tomaka17
Small thread about a #rustlang #libp2p design mistake I made.

When you write, for example, an HTTP server, each connection is handled on its own, and all of them are handled concurrently.

If multiple connections need mutable access to something, you lock a mutex.
Using a mutex is not a problem, because you don't care in which order requests in multiple different connections are being handled, and therefore you don't care whether connection A locks the mutex first or connection B.
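A minimal sketch of that HTTP-server style, using only the standard library. The function name `serve_connections` and the shared hit counter are hypothetical, and threads stand in for connection handlers. The point is that the final result is the same whichever connection locks the mutex first.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns one thread per (pretend) connection. Each one bumps a shared
/// counter behind a mutex. Returns the final counter value.
fn serve_connections(num_connections: usize) -> u64 {
    // Shared mutable state that every connection needs access to.
    let hits = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..num_connections)
        .map(|_| {
            let hits = Arc::clone(&hits);
            thread::spawn(move || {
                // Lock, mutate, release. The order in which the
                // connections get here does not matter.
                let mut guard = hits.lock().unwrap();
                *guard += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *hits.lock().unwrap();
    total
}
```

Calling `serve_connections(8)` gives 8 regardless of how the threads are scheduled, which is exactly why the mutex is harmless here.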

But in a peer to peer environment, it's different.
In peer to peer systems, you want to handle the network as a whole. When you receive a pubsub message from a node, you want to relay it to the others.
When you receive a peer discovery query, you need to return the nodes you're connected to, and so on.
The initial design of libp2p was similar to that of an HTTP server: things were processed in parallel, locking mutexes along the way.

This turned out to be a mess, as you had no clue in which order things were happening. The message being processed might be from a node we're no longer connected to.
And this added a lot of corner cases that needed to be handled.

The worst was if we accidentally connected twice to the same node, in which case we'd close the first connection.

But then parts of the code would try to redirect to the 2nd connection, and you can imagine how complex that is.
Another problem: a future would process the messages on a socket, and when a certain message arrived, it would send something to a channel whose receiving side (the rx) was also a future.

The problem is that polling the rx alone doesn't do anything.
In order to receive something on that rx, you'd have to rely on the fact that another part of the code would poll the future that polls the socket.

This further complicated the whole logic of the code.
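The rx problem can be sketched with a plain standard-library channel instead of real futures. Everything here is hypothetical (`SocketTask`, its `poll` method, the `"ping"` message), and this is not how libp2p was actually written. It only shows that the receiving side stays empty until some other part of the code drives the task that reads the socket.

```rust
use std::sync::mpsc::{channel, Sender};

/// Hypothetical stand-in for the future that processes a socket.
struct SocketTask {
    /// Pretend these are messages already waiting on the socket.
    incoming: Vec<&'static str>,
    /// Sending side of the channel. The rx lives elsewhere.
    tx: Sender<&'static str>,
}

impl SocketTask {
    /// Processes one pending message and forwards it to the channel.
    /// Returns true if something was forwarded.
    fn poll(&mut self) -> bool {
        match self.incoming.pop() {
            Some(msg) => self.tx.send(msg).is_ok(),
            None => false,
        }
    }
}

/// Returns what the rx sees before and after the socket task is polled.
fn demo() -> (Option<&'static str>, Option<&'static str>) {
    let (tx, rx) = channel();
    let mut socket = SocketTask { incoming: vec!["ping"], tx };

    // Polling the rx alone does nothing: the message is still "on the socket".
    let before = rx.try_recv().ok();

    // Only once someone else polls the socket task does the rx get data.
    socket.poll();
    let after = rx.try_recv().ok();

    (before, after)
}
```

Here `demo()` returns `(None, Some("ping"))`: whoever owns the rx silently depends on the socket task being polled somewhere else.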

Instead, the way libp2p now works is different.
There is a struct named Swarm which implements Stream and yields events about the network one by one.

More importantly, methods on the Swarm can be used to get the state of the network, and this state is updated only when polling.
For example, if you call is_connected(peer), it will return true even if the background task that handles the node has actually already closed.

It's only after you poll() and obtain a PeerDisconnected event that is_connected will start returning false.
In other words, the whole network is now a single state machine, and not a giant spaghetti of events being propagated in parallel.
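A rough sketch of that idea, and not the real libp2p API. The names `Swarm`, `is_connected` and `PeerDisconnected` come from the thread, while `PeerId` as a `u32`, the `PeerConnected` variant, the `report` method and the plain `poll` returning `Option<Event>` (instead of a `Stream` implementation) are assumptions made to keep the example std-only.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical peer identifier, just a number for this sketch.
type PeerId = u32;

#[derive(Debug, PartialEq)]
enum Event {
    PeerConnected(PeerId),
    PeerDisconnected(PeerId),
}

struct Swarm {
    /// The state the user can query. Only `poll` ever changes it.
    connected: HashSet<PeerId>,
    /// What background tasks have reported but the user has not seen yet.
    pending: VecDeque<Event>,
}

impl Swarm {
    fn new() -> Self {
        Swarm { connected: HashSet::new(), pending: VecDeque::new() }
    }

    /// Hypothetical hook for background tasks to report what happened.
    /// It deliberately does NOT touch the user-visible state.
    fn report(&mut self, event: Event) {
        self.pending.push_back(event);
    }

    fn is_connected(&self, peer: PeerId) -> bool {
        self.connected.contains(&peer)
    }

    /// Yields the next event and applies it to the state at the same time,
    /// so the state always matches the events the user has observed.
    fn poll(&mut self) -> Option<Event> {
        let event = self.pending.pop_front()?;
        match &event {
            Event::PeerConnected(peer) => {
                self.connected.insert(*peer);
            }
            Event::PeerDisconnected(peer) => {
                self.connected.remove(peer);
            }
        }
        Some(event)
    }
}
```

In this sketch, after `report(Event::PeerDisconnected(7))` the call `is_connected(7)` still returns true. It flips to false only once `poll()` has handed the `PeerDisconnected(7)` event to the caller, so the caller never sees state it hasn't been told about.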

This made everything waaaay easier, as it eliminated all potential race conditions, and I'm confident that code that uses libp2p can now be robust.
Thanks for reading. 👍