A primer:
neo4j.com/blog/acid-vs-b…
We're starting with the premise: text is important, and text is a graph
- Map synonyms to a single representation (see WordNet synsets)
- Map the connection between cause and effect (common approach: NLP classification models)
- Parse syntactic structure (see part-of-speech tagging)
Each complaint has a :FIRST_WORD relationship to the first word of text, which is connected to the next node by an association called :NEXT_WORD, plus other associations describing lexical relationships.
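To make that shape concrete, here's a rough Python sketch of the word chain (mine, not from the talk). It assumes each complaint gets its own word nodes keyed by position, which may not be how the speaker's graph does it. The function names, the node-key format, and the triple representation are all made up for illustration, and the lexical relationships are left out.

```python
from typing import List, Tuple

# Hypothetical stand-in for graph relationships: (start_node, TYPE, end_node).
Edge = Tuple[str, str, str]


def complaint_to_edges(complaint_id: str, text: str) -> List[Edge]:
    """Turn one complaint's text into FIRST_WORD / NEXT_WORD edges."""
    words = text.lower().split()
    edges: List[Edge] = []
    if not words:
        return edges
    # Assumption: word nodes are keyed by position so repeated words stay
    # distinct. complaint_id must not contain ":" for read_back to work.
    nodes = [f"{complaint_id}:{i}:{w}" for i, w in enumerate(words)]
    edges.append((complaint_id, "FIRST_WORD", nodes[0]))
    for a, b in zip(nodes, nodes[1:]):
        edges.append((a, "NEXT_WORD", b))
    return edges


def read_back(complaint_id: str, edges: List[Edge]) -> List[str]:
    """Follow FIRST_WORD, then NEXT_WORD hops, to recover the word order."""
    step = {(start, rel): end for start, rel, end in edges}
    words: List[str] = []
    node = step.get((complaint_id, "FIRST_WORD"))
    while node is not None:
        words.append(node.split(":", 2)[2])
        node = step.get((node, "NEXT_WORD"))
    return words
```

Walking the chain back out is just one FIRST_WORD hop followed by NEXT_WORD hops until there are none left.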
Luckily, it's quite legible because #Cypher is badass
I wonder what the model is like. Does it predict a probability that each word is part of the complaint, the cause, the correction?
We're on the second picture of some dude smoking, haha
From the text, the model focuses on a small collection of part-of-speech features. Looks like the direct object and first adjective map to...some annotation
I desperately hope he is setting up a punny intro to cypher pitfalls: "snakes in the grass when snaking the graph"
Maybe in the future they can switch the order of this talk and the previous talk for the benefit of those folks.
The point, I suppose, is that graphs make it possible to get the "important" data out of the lake. IME that's usually about 1 bucketful.
Currently I'm playing with that in a db of my own. Instead of an Investment node, I have relationships between Companies and Accounts that have a start date, an end date, and a share count.
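Roughly what I mean, sketched in plain Python rather than in the database. The `Holds` name, the field names, and the choice to treat the end date as exclusive are my assumptions for illustration, not anything from the talk.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Holds:
    """Hypothetical relationship between an Account and a Company.

    The properties live on the relationship itself, so there is no
    separate Investment node.
    """
    account: str
    company: str
    start: date
    end: Optional[date]  # None means the position is still open
    shares: int


def shares_held(rels: List[Holds], account: str, company: str, on: date) -> int:
    """Sum the share counts of relationships active on the given day."""
    total = 0
    for r in rels:
        if r.account != account or r.company != company:
            continue
        # Assumption: start is inclusive, end is exclusive.
        if r.start <= on and (r.end is None or on < r.end):
            total += r.shares
    return total
```

The tradeoff I'm feeling so far: queries get shorter, but anything that wants to point at a specific holding has nothing to point at.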
- Pathfinding and Search
- Centrality
- Community Detection
Others: Closeness (nodes with the shortest average distance to everyone else), Betweenness (nodes bridging groups of nodes), and Degree (most popular nodes)
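For my own notes, a small Python sketch of two of these measures on an undirected graph stored as a dict of neighbor sets. This is not the Neo4j library's implementation. Closeness here uses the common definition (number of reachable nodes divided by the sum of shortest-path distances to them), and I've skipped betweenness because it needs a longer algorithm.

```python
from collections import deque
from typing import Dict, Set

# Undirected graph: every node is a key, and edges appear in both directions.
Graph = Dict[str, Set[str]]


def degree_centrality(g: Graph) -> Dict[str, int]:
    """Degree: how many neighbors each node has."""
    return {node: len(neighbors) for node, neighbors in g.items()}


def closeness_centrality(g: Graph) -> Dict[str, float]:
    """Closeness: reachable nodes divided by total hop distance to them."""
    scores: Dict[str, float] = {}
    for source in g:
        # Breadth-first search gives shortest hop counts from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbor in g[current]:
                if neighbor not in dist:
                    dist[neighbor] = dist[current] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


star = {"a": {"b"}, "b": {"a", "c", "d"}, "c": {"b"}, "d": {"b"}}
print(degree_centrality(star)["b"])     # 3
print(closeness_centrality(star)["b"])  # 1.0
```

In the little star graph, "b" wins on both measures, which is the boring case. They diverge on graphs where a low-degree node sits in the middle of a long chain.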
Label Propagation: labeling nodes based on neighbors to infer clusters
Union Find/Weakly Connected Components: Finds sets of nodes that all have a path to each other, ignoring the direction of the associations
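A minimal union-find sketch in Python, to show why the algorithm goes by both names. This is a textbook version, not the Neo4j procedure. It assumes every edge endpoint also appears in `nodes`.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def weakly_connected_components(
    nodes: Iterable[Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
) -> List[Set[Hashable]]:
    """Group nodes into components, treating every edge as undirected."""
    parent: Dict[Hashable, Hashable] = {n: n for n in nodes}

    def find(x: Hashable) -> Hashable:
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # union: merge the two sets

    groups: Dict[Hashable, Set[Hashable]] = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())
```

Direction never enters into it: an edge (a, b) merges the same two sets as (b, a).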
Strongly Connected Components: Same as weakly, but direction counts. Every node in the set has to be reachable from every other node by following the associations the way they point.
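Here's a hedged sketch of one standard way to compute these, Kosaraju's two-pass algorithm, in Python. I don't know which algorithm the Neo4j library uses, so treat this as illustration only. It assumes every node is a key in the dict and uses recursion, so it's only suitable for small graphs.

```python
from typing import Dict, List, Set

# Directed graph: node -> set of nodes it points to. Every node must be a key.
DiGraph = Dict[str, Set[str]]


def strongly_connected_components(g: DiGraph) -> List[Set[str]]:
    """Kosaraju: order nodes by DFS finish time, then sweep the reversed graph."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(node: str) -> None:
        seen.add(node)
        for successor in g[node]:
            if successor not in seen:
                visit(successor)
        order.append(node)  # appended once everything reachable is finished

    for node in g:
        if node not in seen:
            visit(node)

    # Build the reversed graph.
    reverse: DiGraph = {node: set() for node in g}
    for node, successors in g.items():
        for successor in successors:
            reverse[successor].add(node)

    components: List[Set[str]] = []
    assigned: Set[str] = set()
    for node in reversed(order):
        if node in assigned:
            continue
        component: Set[str] = set()
        stack = [node]
        assigned.add(node)
        while stack:
            current = stack.pop()
            component.add(current)
            for predecessor in reverse[current]:
                if predecessor not in assigned:
                    assigned.add(predecessor)
                    stack.append(predecessor)
        components.append(component)
    return components
```

A cycle a -> b -> c -> a with a dangling c -> d comes out as two components: the cycle, and d on its own.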
Triangle Count and Clustering Coefficient: Which sets of three nodes all connect to one another? How does that compare to node triplets with 2 or fewer connections?
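A quick Python sketch of both, for an undirected graph stored as a dict of neighbor sets. This uses the usual local clustering coefficient (triangles through a node divided by the number of neighbor pairs that could form one). It's a brute-force illustration, not the library's implementation.

```python
from itertools import combinations
from typing import Dict, Set

# Undirected graph: every node is a key, and edges appear in both directions.
Graph = Dict[str, Set[str]]


def triangle_count(g: Graph) -> Dict[str, int]:
    """For each node, count pairs of its neighbors that are also connected."""
    return {
        node: sum(1 for a, b in combinations(neighbors, 2) if b in g[a])
        for node, neighbors in g.items()
    }


def local_clustering(g: Graph) -> Dict[str, float]:
    """Triangles through a node divided by the possible neighbor pairs."""
    triangles = triangle_count(g)
    result: Dict[str, float] = {}
    for node, neighbors in g.items():
        k = len(neighbors)
        possible = k * (k - 1) // 2
        result[node] = triangles[node] / possible if possible else 0.0
    return result
```

A node whose neighbors all know each other scores 1.0. A hub whose neighbors never talk to each other scores 0.0.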
It's with the Game of Thrones database. What am I missing that every data scientist knows about Game of Thrones?
Here's more info on what we're demoing.
neo4j.com/blog/efficient…
neo4j-contrib.github.io/neo4j-graph-al…
There are also 3 or 4 sandboxes available online to play with these!
Quote: "People said RSI stood for repetitive stress injury" 😂
- AI & Graph Analytics
- Transactional Graphs
- Discovery and Visualization (the news here is Bloom, Neo4j's new product)
Quote: "I normally tell you not to believe vendor-provided performance benchmarks, so take this with a grain of salt." 🤣
He recorded a video and he'll be talking over it. I like this a lot. Error-free, speed under speaker's control, speaker can even back it up. Particularly if there's no interactive component to a conference talk, this is the way to demo.
Here's more info from the docs (I'm seeing an assumption of Java and annotation availability in these docs, but I'll look for more on the other drivers later)
neo4j.com/developer/proc…
neo4j.com/developer/elas…
Evidently there's a great talk from @Airbnb on this, but I can't find it! Help?
This is available via graph.versioner in APOC. You don't have to manually dupe nodes as you advance in versions!
github.com/graphaware/neo…