Ankur Kumar 💫
A Techie, Blogger & Mentor | Shares Cloud Native, Microservices & Leadership Learnings | Loved Husband, Proud Dad | Founder of Vedcraft for Software Architects

Sep 8, 2024, 22 tweets

Read this thread to learn about the Lakehouse architecture pioneered by Databricks (I'm keeping it as an active thread so all the key architecture knowledge stays in one simple read) 🧵
(image source: Databricks)

#Databricks #Lakehouse #DeltaLake

Databricks, along with UC Berkeley and Stanford, published this whitepaper in 2021 to revolutionize the data warehouse (DWH) with the Lakehouse architecture: "Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics"
people.eecs.berkeley.edu/~matei/papers/…

Databricks pioneered unifying the data warehouse and the data lake into a single Lakehouse platform, removing the complexity of running both, with Delta Lake as the foundation.

Image: Databricks

Delta Lake is a file-based, open-source storage format (the enabling technology behind Databricks).
Image: Databricks
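
To make "file-based" concrete: a Delta table keeps its data in Parquet files and records every change as JSON actions in an ordered transaction log. Below is a toy sketch of replaying such a log to find the files that make up the current table version. The file names are made up, and real log entries carry many more fields (sizes, statistics, schema metadata), so treat this as an illustration of the idea rather than the actual Delta Lake reader.

```python
import json


def replay_log(commits):
    """Replay commits in order and return the set of live data files.

    Each commit is a string of newline-delimited JSON actions. Only the
    "add" and "remove" actions are modeled here (a simplification).
    """
    live = set()
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


# Hypothetical two-commit history: an initial write, then a rewrite of one file.
commits = [
    '{"add": {"path": "part-0000.parquet"}}\n'
    '{"add": {"path": "part-0001.parquet"}}',
    '{"remove": {"path": "part-0000.parquet"}}\n'
    '{"add": {"path": "part-0002.parquet"}}',
]

print(sorted(replay_log(commits)))
# ['part-0001.parquet', 'part-0002.parquet']
```

Replaying only the first commit gives the earlier table version, which is the intuition behind reading older snapshots of a table.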

Another major technology behind Databricks is Photon, a vectorized query execution engine built to work well on both curated, structured data and raw data in the lake. Databricks reports substantially better performance than its previous Spark execution engine.

Image: Databricks

Image: Databricks

Unity Catalog is the unified governance solution behind the Databricks platform. It is now open source.

github.com/unitycatalog/u…

Delta Sharing is an open standard for secure data sharing, contributed to open source (part of the Linux Foundation). It provides open cross-platform sharing, sharing of live data without copying, centralized administration and governance, and data privacy.

delta.io

The Databricks security architecture follows the standard control plane and data plane model:
(Image: Databricks)

The Databricks Lakehouse architecture is divided into a control plane and a data plane, and it also offers a serverless data plane.

Databricks Unity Catalog is the core behind the centralized governance

(Image: Databricks)

The catalog is the topmost level of the data hierarchy in Unity Catalog:
(Image: Databricks)

Unity Catalog addresses data through a three-level namespace (catalog.schema.table):
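
A small sketch of how three-level names resolve, assuming the levels are catalog, schema, and table. The helper `parse_table_name` and the default values "main" and "default" are hypothetical choices for illustration, and quoted identifiers are not handled.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    catalog: str
    schema: str
    table: str


def parse_table_name(name, default_catalog="main", default_schema="default"):
    """Expand a 1-, 2-, or 3-part dotted name into catalog.schema.table."""
    parts = name.split(".")
    if not all(parts) or len(parts) > 3:
        raise ValueError(f"invalid table name: {name!r}")
    if len(parts) == 3:
        return TableName(*parts)
    if len(parts) == 2:
        return TableName(default_catalog, *parts)
    return TableName(default_catalog, default_schema, parts[0])


print(parse_table_name("sales.orders.line_items"))
print(parse_table_name("orders.line_items"))
print(parse_table_name("line_items"))
```

Access can then be granted at any of the three levels, with lower levels sitting inside the one above.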

Databricks Lakehouse Platform for Data Warehousing:
(Image: Databricks)

Databricks introduced the Medallion Architecture, a data design pattern used to logically organize data in a lakehouse, with the goal of incrementally and progressively improving the structure and quality of data as it moves through the Bronze, Silver, and Gold layers.

databricks.com/glossary/medal…
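
A minimal sketch of the pattern in plain Python, assuming the usual Bronze (raw), Silver (cleaned), and Gold (aggregated) layers. The order records and the helpers `to_silver` and `to_gold` are invented for illustration; on Databricks each layer would normally be a Delta table rather than a list.

```python
from collections import defaultdict

# Bronze: raw records exactly as they arrived (strings, duplicates, bad values).
bronze = [
    {"order_id": "1", "customer": "acme", "amount": "10.50"},
    {"order_id": "1", "customer": "acme", "amount": "10.50"},  # duplicate
    {"order_id": "2", "customer": "acme", "amount": "not-a-number"},  # invalid
    {"order_id": "3", "customer": "globex", "amount": "7.25"},
    {"order_id": "4", "customer": "acme", "amount": "4.50"},
]


def to_silver(rows):
    """Silver: drop invalid rows, de-duplicate by order_id, and cast types."""
    seen = set()
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        if row["order_id"] in seen:
            continue  # skip repeated orders
        seen.add(row["order_id"])
        clean.append(
            {
                "order_id": int(row["order_id"]),
                "customer": row["customer"],
                "amount": amount,
            }
        )
    return clean


def to_gold(rows):
    """Gold: business-level aggregate, here total amount per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(len(bronze), len(silver), gold)
# 5 3 {'acme': 15.0, 'globex': 7.25}
```

Each layer is derived only from the one before it, so quality improves step by step while the raw Bronze data stays available for reprocessing.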

Delta Live Tables (DLT) is an ETL framework for building data pipelines declaratively (the entire pipeline can be written in Python or SQL).

databricks.com/product/delta-…
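
To illustrate what "declarative" means here: you declare each table and what it reads from, and the framework works out the execution order. The sketch below is a toy stand-in written with the standard library only. It is not the DLT API; the `table` decorator, `run_pipeline`, and the table names are hypothetical.

```python
import inspect
from graphlib import TopologicalSorter  # Python 3.9+

_tables = {}


def table(fn):
    """Register a function as a table; its parameter names are its inputs."""
    _tables[fn.__name__] = fn
    return fn


# Tables are declared in an arbitrary order; no explicit sequencing is written.
@table
def event_count(even_events):
    return len(even_events)


@table
def even_events(raw_events):
    return [e for e in raw_events if e % 2 == 0]


@table
def raw_events():
    return [1, 2, 3, 4]


def run_pipeline():
    """Infer dependencies from signatures and compute tables in a valid order."""
    deps = {
        name: set(inspect.signature(fn).parameters)
        for name, fn in _tables.items()
    }
    results = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = {dep: results[dep] for dep in deps[name]}
        results[name] = _tables[name](**inputs)
    return results


print(run_pipeline())
```

The design point is that the pipeline author never writes "run A, then B"; the order falls out of the declared inputs.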

Databricks Workflows (a unified orchestration for data, analytics and AI on the Data Intelligence Platform) is a fully managed orchestration service embedded in the Lakehouse platform. It supports building orchestration flows across DLT, dbt, and other solutions.

databricks.com/product/workfl…

Reference Architecture for Streaming Use cases on Databricks:
docs.databricks.com/en/lakehouse-a…

Building Gen AI Applications on Databricks using Databricks Lakehouse AI:


(Image Source: Databricks) databricks.com/blog/lakehouse…

The Databricks Data Intelligence Platform is a unified, open analytics platform designed for building, deploying, sharing, and maintaining enterprise-grade data, analytics, and AI solutions at scale.

Databricks has a well-architected framework for the lakehouse

Source: docs.databricks.com/en/lakehouse-a…
