Ankur Kumar 💫
Sep 8, 2024 · 22 tweets · 8 min read
Read this thread to learn about the Lakehouse architecture pioneered by Databricks (I am keeping it active so all the key architecture knowledge stays in one simple read) 🧵
(image source: Databricks)

#Databricks #Lakehouse #DeltaLake
Databricks, along with UC Berkeley & Stanford, published this whitepaper in 2021, proposing the Lakehouse architecture as a successor to the traditional data warehouse (DWH) -> "Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics"
people.eecs.berkeley.edu/~matei/papers/…
Databricks pioneered unifying the data warehouse and the data lake into a single Lakehouse platform, reducing complexity by using Delta Lake as the foundation.

(Image: Databricks)
Delta Lake is a file-based, open-source storage format (Parquet data files plus a transaction log) and the enabling technology behind Databricks.
(Image: Databricks)
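To make the "file-based format plus a log" idea concrete, here is a minimal sketch of log replay in plain Python. It is a toy model, not the Delta Lake implementation: the function name `replay_log` and the file names are made up for illustration, and only the `add` / `remove` actions are modeled.

```python
import json


def replay_log(commits):
    """Replay ordered commits and return the set of live data files.

    Each commit is a string of newline-delimited JSON actions.
    Toy model only: real Delta logs carry many more action types and fields.
    """
    live = set()
    for commit in commits:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


# Hypothetical commits: version 0 writes two files, version 1 rewrites one.
commit_0 = "\n".join([
    json.dumps({"add": {"path": "part-0000.parquet"}}),
    json.dumps({"add": {"path": "part-0001.parquet"}}),
])
commit_1 = "\n".join([
    json.dumps({"remove": {"path": "part-0000.parquet"}}),
    json.dumps({"add": {"path": "part-0002.parquet"}}),
])

current_files = replay_log([commit_0, commit_1])
print(sorted(current_files))  # ['part-0001.parquet', 'part-0002.parquet']
```

Replaying only a prefix of the log, for example `replay_log([commit_0])`, gives the table as of an earlier version, which is the intuition behind time travel.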
Another major technology behind Databricks is Photon, a vectorized query execution engine written in C++ that handles both structured and unstructured data. Databricks reports much better performance than its previous Spark-based engine.

(Image: Databricks)
Unity Catalog is the unified governance layer behind the Databricks platform. It is now open source.

github.com/unitycatalog/u…
Delta Sharing is an open standard for secure data sharing, contributed to open source (part of the Linux Foundation). It provides open cross-platform sharing, sharing of live data without copying, centralized administration and governance, and data privacy.

delta.io
Databricks security architecture follows the standard control plane and data plane split:
(Image: Databricks)
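The Databricks Lakehouse architecture is divided into a control plane and a data plane; Databricks also offers a serverless data plane. The control plane hosts the Databricks-managed backend services, while the data plane is where compute runs and data is processed.
Databricks Unity Catalog is the core of centralized governance.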

(Image: Databricks)
The catalog is the top level of Unity Catalog's three-level namespace (it sits under the metastore):
(Image: Databricks)
Unity Catalog uses a three-level namespace to address data: catalog.schema.table
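A rough sketch of how the three-level name resolves and how access is checked at each level. This is an illustration in plain Python, not the Unity Catalog API: `TableName`, `parse_table_name`, `can_select`, and the `grants` dictionary are hypothetical, and the check ignores privilege inheritance, ownership, and backtick-quoted identifiers.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    catalog: str
    schema: str
    table: str


def parse_table_name(full_name: str) -> TableName:
    """Split 'catalog.schema.table' into its three parts (no quoting support)."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return TableName(*parts)


def can_select(grants: dict, name: TableName) -> bool:
    """Simplified check: a privilege is needed at every level of the hierarchy."""
    return (
        "USE CATALOG" in grants.get(name.catalog, set())
        and "USE SCHEMA" in grants.get(f"{name.catalog}.{name.schema}", set())
        and "SELECT" in grants.get(".".join(name), set())
    )


# Hypothetical grants for one user, keyed by securable name.
grants = {
    "main": {"USE CATALOG"},
    "main.sales": {"USE SCHEMA"},
    "main.sales.orders": {"SELECT"},
}

orders = parse_table_name("main.sales.orders")
print(can_select(grants, orders))                                    # True
print(can_select(grants, parse_table_name("main.sales.customers")))  # False
```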
Databricks Lakehouse Platform for Data Warehousing:
(Image: Databricks)
Databricks introduced the Medallion Architecture, a data design pattern used to logically organize data in a lakehouse, with the goal of incrementally and progressively improving the structure and quality of data as it flows through the Bronze, Silver, and Gold layers.

databricks.com/glossary/medal…
Delta Live Tables (DLT) is an ETL framework for building data pipelines declaratively; entire pipelines can be written in Python or SQL.
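The layered refinement can be sketched in a few lines of plain Python. This is only an illustration of the pattern, assuming in-memory lists instead of Delta tables; the record fields and the function names `to_bronze`, `to_silver`, and `to_gold` are invented for the example.

```python
def to_bronze(records):
    """Bronze: land the raw records unchanged."""
    return [dict(r) for r in records]


def to_silver(bronze):
    """Silver: clean and validate; rows with unparseable amounts are dropped."""
    silver = []
    for row in bronze:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        silver.append({"user": row["user"].strip().lower(), "amount": amount})
    return silver


def to_gold(silver):
    """Gold: business-level aggregate (total amount per user)."""
    totals = {}
    for row in silver:
        totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals


# Hypothetical raw input with messy casing, whitespace, and one bad value.
raw_events = [
    {"user": " Alice ", "amount": "10.5"},
    {"user": "bob", "amount": "oops"},
    {"user": "alice", "amount": "4.5"},
]

bronze = to_bronze(raw_events)
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'alice': 15.0}
```

Each layer reads only from the one before it, so the raw data stays available for reprocessing if the cleaning rules change.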

databricks.com/product/delta-…
Databricks Workflows (a unified orchestration for data, analytics, and AI on the Data Intelligence Platform) is a fully managed orchestration service built into the Lakehouse platform. It can orchestrate DLT pipelines, dbt projects, and other task types.

databricks.com/product/workfl…
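At its core, an orchestrator runs tasks in dependency order. The sketch below shows that idea with Python's standard `graphlib`; it is not the Databricks Workflows API, and the task names in `job` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical job: each task maps to the set of tasks it depends on.
job = {
    "ingest": set(),
    "dlt_pipeline": {"ingest"},
    "dbt_models": {"dlt_pipeline"},
    "train_model": {"dlt_pipeline"},
    "publish_dashboard": {"dbt_models", "train_model"},
}


def run_order(tasks):
    """Return one valid execution order in which every task follows its upstreams."""
    return list(TopologicalSorter(tasks).static_order())


order = run_order(job)
print(order)
```

`dbt_models` and `train_model` do not depend on each other, so either may come first; a real orchestrator could run them in parallel.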
Reference Architecture for Streaming Use cases on Databricks:
docs.databricks.com/en/lakehouse-a…
Building Gen AI Applications on Databricks using Databricks Lakehouse AI:

(Image source: Databricks) databricks.com/blog/lakehouse…
The Databricks Data Intelligence Platform is a unified, open analytics platform designed for building, deploying, sharing, and maintaining enterprise-grade data, analytics, and AI solutions at scale.
Databricks has a well-architected framework for the lakehouse

Source: docs.databricks.com/en/lakehouse-a…