Chris Gioran

19 Mar, 15 tweets, 3 min read

1/15 Neo4j has always assigned every node (and relationship, but let's focus on nodes) a long value that's called the node's id. Multiply that by the node record size and you know the offset in the node store.
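
The addressing arithmetic in this tweet can be sketched in a few lines. This is a minimal illustration, not Neo4j's actual store code: the function name `node_offset` and the 15-byte record size are assumptions made up for the example.

```python
# Hypothetical fixed record size in bytes; an assumption for illustration only.
NODE_RECORD_SIZE = 15


def node_offset(node_id: int, record_size: int = NODE_RECORD_SIZE) -> int:
    """Return the byte offset of a node record in a fixed-size record store."""
    if node_id < 0:
        raise ValueError("node id must be non-negative")
    # With fixed-size records, the id is effectively an array index.
    return node_id * record_size


print(node_offset(0))   # 0
print(node_offset(42))  # 630
```

Because the offset is a single multiplication, the lookup cost does not depend on how many nodes the store holds.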

2/15 This id functions as an always-present, automatically assigned pseudo-property, which allows for O(1) lookups and is also used as the value in our indexes. It's very useful as an addressing mechanism.

3/15 It also has some invariant guarantees, the most interesting of which is that within a transaction, a node id is never reassigned. This, in combination with the above, makes it tempting to use as node identity.

4/15 However, id and identity are not exactly the same, id being merely a shorthand for identity. For example, a node may be deleted and its id reused for a completely different node.
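
A toy model makes the reuse problem concrete. The `ToyNodeStore` class below is hypothetical and only mimics the general idea of recycling freed slots; it is not how Neo4j manages its id allocation.

```python
class ToyNodeStore:
    """Toy store that reuses freed ids before allocating new ones."""

    def __init__(self):
        self._records = {}   # id -> payload
        self._free_ids = []  # ids released by deletes, available for reuse
        self._next_id = 0

    def create(self, payload):
        if self._free_ids:
            node_id = self._free_ids.pop()
        else:
            node_id = self._next_id
            self._next_id += 1
        self._records[node_id] = payload
        return node_id

    def delete(self, node_id):
        del self._records[node_id]
        self._free_ids.append(node_id)

    def get(self, node_id):
        return self._records[node_id]


store = ToyNodeStore()
alice = store.create({"name": "Alice"})
store.delete(alice)
invoice = store.create({"invoice": 1001})

print(alice == invoice)  # True: same id, completely different node
print(store.get(alice))  # {'invoice': 1001}
```

Anything that held on to the old id as if it were an identity now silently points at an unrelated node.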

5/15 Even more interestingly, a node may not be deleted, but have all its relationships and properties changed and its position in the graph completely modified. From an id perspective it's still the same node...

6/15 ...but functionally you probably wouldn't consider the two versions "identical". There is a "ship of Theseus" underlying problem here, since a node can be gradually changed, by manipulating properties and relationships...

7/15 ...so that the start and end states are obviously not identical but there is no clear point where the identity changed. It's not directly related though, and it's always part of discussions about identity anyway. Another interesting parallel is with the relational model.

8/15 There, the question is already settled by design - no two tuples are ever the same, at least in 1NF. Therefore, identity is the content. Any key on top of that is a shorthand for an already established identity.

9/15 Back to graphs, things are more complicated. It is completely valid, from a model perspective, to have a graph with just two unconnected, empty nodes. But even in this case, the nodes may be equal, but they are not identical.
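
The equal-but-not-identical distinction has a direct analogue in most programming languages. As a loose analogy only, assuming a hypothetical `Node` class whose equality is defined by content, Python's `==` and `is` behave the same way:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Equality (==) is generated from these fields, i.e. from content alone.
    labels: frozenset = frozenset()
    properties: dict = field(default_factory=dict)


a = Node()
b = Node()

print(a == b)  # True: same (empty) content, so the nodes are equal
print(a is b)  # False: two distinct objects, so they are not identical
```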

10/15 Even though you can't write any query to tell them apart, their distinctness can't be dismissed. Effectively, #Neo4j provides a mechanism that breaks that tie, and always allows the user to tell two nodes apart.

11/15 Node ids guarantee distinctness, but since they are bound to physical addresses they can be reused. I think what Neo4j offers today, although it appears naive, works well as a compromise.

12/15 But, as #Cypher gains capabilities and Fabric becomes better established, we are coming up against the restrictions of the current model. Ids today are unique only within a database, so graph unions become tricky.
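
A small sketch shows why per-database ids make unions tricky. The two dictionaries and the database-qualified key are assumptions for illustration; this is not a description of how Fabric works or of any planned solution.

```python
# Two toy databases; each assigns ids starting from 0, independently.
db_a = {0: {"name": "Alice"}, 1: {"name": "Bob"}}
db_b = {0: {"name": "Carol"}}

# Naive union keyed by id alone: node 0 from db_b overwrites node 0 from db_a.
naive_union = {**db_a, **db_b}

# Qualifying each id with a (hypothetical) database name keeps nodes distinct.
qualified_union = {("a", node_id): node for node_id, node in db_a.items()}
qualified_union.update({("b", node_id): node for node_id, node in db_b.items()})

print(len(naive_union))      # 2: one node was lost to the id collision
print(len(qualified_union))  # 3: all nodes survive
```

Qualifying the key fixes the collision in this toy case, but it does nothing about id reuse within each database.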

13/15 Snapshots and in general temporal references also become increasingly important. Schema can definitely help, and it seems that any good solution will include some form of model constraint.

14/15 But we must also make sure that it works as an algebra with sharded/distributed graphs and other constraints.

15/15 Overall, the problem is extremely interesting. It's one of those that should be easy to solve (just use a long, right?) but if you start looking at actual usage, it becomes quite complex.
