I have lately received a number of messages asking about the security of IOTA's new consensus mechanism in situations like network splits.
Since these questions seem to originate in factually wrong statements by a critic, I want to answer them publicly.
(1/20)🧵👇
To understand how IOTA handles this type of situation, we first need to understand what a network split is.
It is a situation where the network is split into two (or more) disconnected partitions, and each partition can only see its own set of issued messages.
(2/20)
Most splits are the result of faulty network infrastructure causing temporary interruptions of connectivity.
Redundant hardware and connections have made large-scale network splits increasingly rare, but smaller, locally confined partitions are still relatively common.
(3/20)
Another (less common) cause of partitions is an eclipse attack, where an attacker manages to hijack all connections of a victim.
It is clear that disconnected nodes are unable to reach consensus, which raises the question of how nodes should behave in such a situation.
(4/20)
A well-known theorem that discusses the impact of partitions on distributed systems is the CAP theorem.
Applied to the context of consensus, it states that we have to choose between:
- liveness (partitions make progress but diverge) or
- safety (the network halts).
(5/20)
Protocols that favor liveness choose a single winning ledger once the partitions merge and provide probabilistic finality.
Protocols that favor safety provide deterministic finality.
(6/20)
From a user's point of view, we would obviously like to have a protocol that favors safety, since we want transactions to be final as fast as possible without having to worry about possible rollbacks.
Protocols that favor safety do, however, have a big problem ...
(7/20)
... in the open and permissionless setting:
Nodes can detect that actors stop sending statements, but they cannot distinguish between partitions and nodes going offline.
This means that the network would halt if too many nodes disappear, requiring a manual restart.
(8/20)
This challenges a core value proposition of DLTs (their robustness and fault tolerance).
Ideally we want a protocol that never stops but that also never confirms something that doesn't end up being final, which (seemingly) violates the CAP theorem.
(9/20)
This problem is called the "availability-finality dilemma" and it can be solved surprisingly simply:
Instead of running 1 protocol, we run 2 protocols:
- one that favors liveness
- and one that favors safety (which tries to confirm the "live" ledger state)
(10/20)
These kinds of hybrid protocols are called ebb-and-flow protocols (arxiv.org/abs/2009.04987), and there are a handful of projects that use such a "finality gadget" on top of a live ledger (e.g. ETH 2.0, Polkadot and NEAR).
IOTA will be the first DAG that uses these ideas.
(11/20)
In IOTA, a transaction is:
- accepted: once it has been referenced by 2/3+1 of the online committee (active in the last 10 seconds)
- confirmed: once it has been accepted and also referenced by 2/3+1 of the average committee weight over the last N epochs.
(12/20)
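A minimal Python sketch of these two rules, for illustration only. The function names, the use of integer weights, and the reading of "2/3+1" as "strictly more than two thirds" are my assumptions here, not the actual node implementation.

```python
def supermajority(total_weight: int) -> int:
    """Smallest integer weight strictly greater than 2/3 of total_weight (assumed reading of "2/3+1")."""
    return (2 * total_weight) // 3 + 1


def is_accepted(referencing_weight: int, online_weight: int) -> bool:
    """Accepted: referenced by a supermajority of the currently online committee."""
    return referencing_weight >= supermajority(online_weight)


def is_confirmed(referencing_weight: int, online_weight: int,
                 recent_epoch_weights: list) -> bool:
    """Confirmed: accepted AND referenced by a supermajority of the
    average committee weight over the last N epochs (the list passed in)."""
    average = sum(recent_epoch_weights) // len(recent_epoch_weights)
    return (is_accepted(referencing_weight, online_weight)
            and referencing_weight >= supermajority(average))


# An eclipsed node that only sees 30 of 100 weight units online:
# 25 referencing weight is enough for acceptance, but not for confirmation.
print(is_accepted(25, 30))                     # True
print(is_confirmed(25, 30, [100, 100, 100]))   # False
```

The point of the sketch: acceptance is measured against what the node currently sees, confirmation against what the committee looked like over a longer window.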
The parameter N defines a time frame that is equivalent to ETH 2.0's "inactivity leak", which allows the network to automatically recover from extreme situations, like a large number of nodes going offline (e.g. due to catastrophes, war or governmental intervention).
(13/20)
Since IOTA plans to use social consensus as a protection against long-range attacks, we don't need to slash offline nodes.
Instead we simply reduce the threshold that is necessary for confirmations over time, which allows the network to automatically recover.
(14/20)
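To make the "threshold reduces over time" idea concrete, here is a small Python sketch. It assumes (my assumption, not a statement about the real protocol) that the per-epoch committee weight only counts validators seen as active, so the average over the last N epochs shrinks as offline epochs enter the window. All names and numbers are hypothetical.

```python
from collections import deque


def supermajority(total_weight: int) -> int:
    """Smallest integer weight strictly greater than 2/3 of total_weight."""
    return (2 * total_weight) // 3 + 1


def thresholds_after_drop(before: int, after: int, n: int) -> list:
    """Confirmation threshold per epoch when the active weight falls
    from `before` to `after` and stays there, using a window of n epochs."""
    window = deque([before] * n, maxlen=n)        # the last n epoch weights
    thresholds = [supermajority(sum(window) // n)]
    for _ in range(n):
        window.append(after)                      # oldest epoch drops out
        thresholds.append(supermajority(sum(window) // n))
    return thresholds


# 70% of the weight goes offline, N = 4 epochs:
print(thresholds_after_drop(100, 30, 4))  # [67, 55, 44, 32, 21]
```

In this toy run, the remaining 30 units of weight cannot confirm anything until the window has fully rolled over (threshold 21), which is the automatic recovery described above. A larger N simply stretches that waiting period.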
Similar to BTC's confirmation threshold, nodes can individually choose their N to define how fast they want to recover confirmations.
Exchanges will most probably choose parameters similar to ETH 2.0's (multiple weeks, with the option to manually intervene in times of war).
(15/20)
This means that while acceptance will always continue (even when a node is eclipsed), confirmations will halt until either the partition is resolved or the recovery period has elapsed, which puts the user on the safe side.
It is important to note that ...
(16/20)
... the ability to see statements of trusted actors adds another layer of protection against eclipse attacks.
I could for example only resume confirmations automatically if I also see at least a minimum amount of activity by trusted actors.
(17/20)
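As a sketch, such a local policy could look like the following Python snippet. The function name, the identifiers, and the idea of a simple count are purely hypothetical; a node operator could pick any rule they like.

```python
def may_resume_confirmations(recently_active: set, trusted: set,
                             min_trusted: int) -> bool:
    """Only allow automatic recovery of confirmations if at least
    `min_trusted` of the actors I trust have been seen recently."""
    return len(recently_active & trusted) >= min_trusted


trusted = {"exchange-a", "foundation", "friend-node"}   # hypothetical IDs

# An eclipsed node that only hears from attacker-controlled peers:
print(may_resume_confirmations({"attacker-1", "attacker-2"}, trusted, 2))  # False

# A node that still sees two of its trusted actors:
print(may_resume_confirmations({"exchange-a", "foundation", "x"}, trusted, 2))  # True
```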
Keeping the network alive doesn't just allow users to, e.g., re-stake their funds with online validators to recover confirmations faster; it also provides at least a minimum amount of functionality even in very adverse situations.
(18/20)
TL;DR: If you are eclipsed, then acceptance continues while confirmations halt.
If N is set to -1, then the confirmation threshold is measured with respect to the maximum committee weight ever observed, which is equivalent to traditional deterministic finality (which halts).
(19/20)
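A small Python sketch of how N could select the weight that the confirmation threshold is measured against. The function name and the list-of-epoch-weights representation are assumptions for illustration.

```python
def reference_weight(epoch_weights: list, n: int) -> int:
    """Weight that the 2/3+1 confirmation threshold is computed from.

    n == -1 : maximum committee weight ever observed (never recovers by itself)
    n  >  0 : average committee weight over the last n epochs
    """
    if n == -1:
        return max(epoch_weights)
    if n <= 0:
        raise ValueError("n must be -1 or a positive number of epochs")
    recent = epoch_weights[-n:]
    return sum(recent) // len(recent)


# Weight history of a node that lost sight of 70% of the committee:
history = [100, 100, 30, 30, 30, 30]

print(reference_weight(history, 4))    # 30  -> confirmations can resume
print(reference_weight(history, -1))   # 100 -> confirmations stay halted
```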
Since the node optimistically advances the ledger on acceptance, we need to support rollbacks of state (support for this was merged last week). This forms the basis for the last missing piece, chain switching, which allows nodes to automatically recover from partitions.
(20/20)
