Continuing my trend of writing about search and search engines, one question I've had is why aren't we making better use of the databases people are already storing their data in? #tweet100 #programming #search 🧵
Every search engine I know of builds its own index and requires you to take all of your data out of your existing database and move it into the search engine. 1/
Most search engines are built on inverted indexes, which allow for very efficient word lookups across indexed documents, but many databases have long since added either an inverted index of their own or similar indexes that are good enough for most use cases… 2/
…such as trigram indexes. Plus, there's a whole lot of reranking and post-processing that happens afterwards, which in my experience can contribute more to query latency than the index lookups themselves. 3/
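To make the data structure concrete, here's a toy Python sketch (not any particular engine's implementation) of what an inverted index boils down to: a map from each token to the documents that contain it, so a word lookup is a single dictionary access.

```python
# Toy inverted index: token -> set of document IDs containing that token.
from collections import defaultdict

docs = {
    1: "postgres ships a generalized inverted index",
    2: "elasticsearch builds its own inverted index",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.lower().split():
        index[token].add(doc_id)

# Query: documents containing every term (simple AND semantics).
def search(*terms):
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

print(search("inverted", "index"))  # {1, 2}
print(search("postgres"))           # {1}
```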
Looking at general-purpose relational and NoSQL databases, MongoDB's Atlas cloud offering includes an inverted index and built-in text analyzers similar to Elasticsearch's. 4/
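As a rough sketch of what that looks like in practice: an Atlas Search query runs as a $search aggregation stage. The cluster URI, database/collection names, and the "default" search index below are placeholder assumptions, not anything from this thread.

```python
# Rough sketch of an Atlas Search query via pymongo.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<cluster-uri>")  # hypothetical cluster
products = client["shop"]["products"]                # hypothetical collection

# $search is the first pipeline stage and is served by the Atlas Search inverted index.
results = products.aggregate([
    {"$search": {
        "index": "default",
        "text": {"query": "wireless headphones", "path": "description"},
    }},
    {"$limit": 10},
])
for doc in results:
    print(doc.get("description"))
```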
Atlas Search isn't as mature as Elasticsearch, but it delivers more than good enough text search for many use cases. Postgres provides trigram and GIN (Generalized Inverted Index) indexes, and MySQL's InnoDB also has an inverted index for full-text search. 5/
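And on the Postgres side, a minimal sketch of both index types, assuming a hypothetical articles(id, title, body) table and a local connection:

```python
# Minimal sketch of Postgres full-text and trigram indexing via psycopg2.
# Table/column names and the connection string are assumptions for illustration.
import psycopg2

conn = psycopg2.connect("dbname=example")  # placeholder connection string
cur = conn.cursor()

# GIN index over the parsed document; to_tsvector handles tokenization,
# stemming, and stop words, much like a search engine's analyzer.
cur.execute("""
    CREATE INDEX IF NOT EXISTS articles_body_fts
    ON articles USING GIN (to_tsvector('english', body));
""")

# Trigram index (pg_trgm) for fuzzy / substring matching on titles.
# CREATE EXTENSION may require elevated privileges.
cur.execute("CREATE EXTENSION IF NOT EXISTS pg_trgm;")
cur.execute("""
    CREATE INDEX IF NOT EXISTS articles_title_trgm
    ON articles USING GIN (title gin_trgm_ops);
""")

# Ranked full-text query served by the GIN index above.
cur.execute("""
    SELECT id, ts_rank(to_tsvector('english', body), query) AS rank
    FROM articles, to_tsquery('english', 'database & search') AS query
    WHERE to_tsvector('english', body) @@ query
    ORDER BY rank DESC
    LIMIT 10;
""")
print(cur.fetchall())
conn.commit()
```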
While I've yet to sit down and do a full benchmark comparison, my suspicion is that Lucene will be a bit faster any time it needs to hit the inverted index, by virtue of being highly optimized for exactly that use case. But the question we need to ask as engineers is by how… 6/
…large a margin, and how much of a difference it makes in the business domain we're building for. 7/
How important is that small matching difference if we can build an easier-to-use search engine that leverages existing DB technology and builds the result-processing and analysis pipelines on top? 8/