Read this thread to learn about the Lakehouse architecture pioneered by Databricks (keeping it an active thread so all the key architecture knowledge stays in one simple read) 🧵
(image source: Databricks)
#Databricks #Lakehouse #DeltaLake
Databricks, along with UC Berkeley & Stanford, published this whitepaper in 2021 to revolutionize the DWH with the Lakehouse architecture -> "Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics" people.eecs.berkeley.edu/~matei/papers/…
Databricks pioneered unifying the data warehouse and the data lake into a single Lakehouse platform, removing the complexity of running both, with Delta Lake as the foundation.
Image: Databricks
Delta Lake is a file-based open-source storage format (the enabling technology behind Databricks)
Image: Databricks
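To make "file-based" a bit more concrete: a Delta table is a set of Parquet data files plus a transaction log of commits that add or remove files. Below is a minimal sketch of replaying such a log to find the files in the current snapshot. The file names and the trimmed-down commit shape are assumptions for illustration; real commits carry many more fields, and this is not the Delta Lake library itself.

```python
# Toy commits, loosely shaped like entries in a Delta table's transaction log.
# Hypothetical and heavily trimmed: real commits also carry sizes, stats,
# metadata and protocol actions.
commits = [
    [{"add": {"path": "part-0000.parquet"}}, {"add": {"path": "part-0001.parquet"}}],
    [{"remove": {"path": "part-0000.parquet"}}, {"add": {"path": "part-0002.parquet"}}],
]


def live_files(commits):
    """Replay commits in order and return the data files in the latest snapshot."""
    files = set()
    for actions in commits:
        for action in actions:
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)


print(live_files(commits))      # snapshot after both commits
print(live_files(commits[:1]))  # "time travel" to the first commit
```

Replaying only a prefix of the log gives an older version of the table, which is the idea behind time travel.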
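Another major technology behind Databricks is Photon - a vectorized query execution engine, written in C++ and compatible with Apache Spark APIs, for both structured and unstructured data. It is designed to deliver much better performance than the standard Spark execution engine.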
Image: Databricks
Image: Databricks
Unity Catalog is the unified governance solution behind the Databricks platform. It is now open source.
Delta Sharing - an open standard for secure data sharing, contributed to open source (part of the Linux Foundation). It provides open cross-platform sharing, sharing of live data without copying, centralized administration & governance, and data privacy.
Databricks Security Architecture is managed using the standard control plane and data plane architecture:
(Image: Databricks)
The Databricks Lakehouse architecture is divided into a control plane and a data plane, and it also offers a serverless data plane.
Databricks Unity Catalog is the core behind the centralized governance
(Image: Databricks)
The catalog is the first level of the Unity Catalog object hierarchy (it sits inside a metastore):
(Image: Databricks)
Unity Catalog uses a three-level namespace to reference data: catalog.schema.table
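A quick sketch of what the three levels look like in practice: a fully qualified name splits into catalog, schema, and table. The helper name `parse_table_name` and the sample name `main.sales.orders` are made up for illustration, and the sketch assumes plain identifiers (no backtick-quoted names containing dots).

```python
from typing import NamedTuple


class TableName(NamedTuple):
    catalog: str
    schema: str
    table: str


def parse_table_name(full_name: str) -> TableName:
    """Split 'catalog.schema.table' into its three levels (hypothetical helper)."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return TableName(*parts)


name = parse_table_name("main.sales.orders")
print(name.catalog, name.schema, name.table)
```

Permissions can then be reasoned about per level: access to the catalog, to the schema inside it, and to the table itself.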
Databricks Lakehouse Platform for Data Warehousing:
(Image: Databricks)
Databricks introduced the Medallion Architecture (a data design pattern used to logically organize data in a lakehouse, with the goal of incrementally and progressively improving the structure and quality of data as it flows through Bronze, Silver, and Gold layers)
Databricks Workflows (a unified orchestration for data, analytics and AI on the Data Intelligence Platform) - a fully managed orchestration service embedded in the Lakehouse platform. It supports building orchestration flows with DLT, dbt, and other solutions.
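The "incrementally and progressively improving" part is easier to see with a toy example. This is a plain-Python sketch, not Spark or Delta code: raw records land in a bronze layer, get cleaned and de-duplicated into silver, and are aggregated into gold. The sample records and the function names `to_silver` / `to_gold` are assumptions for illustration.

```python
# Bronze: raw records exactly as they arrived (hypothetical sample data).
bronze = [
    {"order_id": "1", "amount": "10.50", "country": "us"},
    {"order_id": "2", "amount": "bad", "country": "US"},
    {"order_id": "1", "amount": "10.50", "country": "us"},  # duplicate
    {"order_id": "3", "amount": "4.50", "country": "de"},
]


def to_silver(rows):
    """Clean and conform: cast types, normalize values, drop bad rows and duplicates."""
    seen, out = set(), []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        if r["order_id"] in seen:
            continue  # skip duplicate orders
        seen.add(r["order_id"])
        out.append(
            {"order_id": int(r["order_id"]), "amount": amount, "country": r["country"].upper()}
        )
    return out


def to_gold(rows):
    """Aggregate for consumption: total order amount per country."""
    totals = {}
    for r in rows:
        totals[r["country"]] = totals.get(r["country"], 0.0) + r["amount"]
    return totals


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Each layer keeps its own copy of the data, so a bug in the silver logic can be fixed and replayed from bronze without re-ingesting anything.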
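At its core, orchestration means running tasks in dependency order. Here is a minimal sketch of that idea using Python's standard `graphlib` module (Python 3.9+). It is not the Databricks Workflows API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
tasks = {
    "ingest_raw": set(),
    "clean_orders": {"ingest_raw"},
    "clean_customers": {"ingest_raw"},
    "build_report": {"clean_orders", "clean_customers"},
}


def run_order(graph):
    """Return one valid execution order in which every dependency runs first."""
    return list(TopologicalSorter(graph).static_order())


order = run_order(tasks)
print(order)
```

The two cleaning tasks do not depend on each other, so an orchestrator is free to run them in parallel once ingestion finishes.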
The Databricks Data Intelligence Platform is a unified, open analytics platform designed for building, deploying, sharing, and maintaining enterprise-grade data, analytics, and AI solutions at scale.
Databricks has a well-architected framework for the lakehouse
Attending Snowflake Summit 2023 Keynote Session virtually - will be sharing notes as part of this thread as we progress 🧵
Feel free to share your learnings as a reply - will be glad to see snapshots from people who can attend in person.
#snowflake #SnowflakeSummit2023
Frank Slootman (CEO) started with AI and Financial Services - mentioning DTCC, Fiserv, and Fidelity, a clear signal that Snowflake wants to bring in more Financial Services clients by addressing their security, governance, and other challenges - liked the analogy with the Data Universe👍
Native Applications Framework - one of the key announcements by Frank - for building apps in a model similar to a mobile app store