How is #DuckDB going to be used inside a company? I can't see it being used beyond local development. Can it really replace a DWH without being distributed?
If it does go beyond local, I have a few things to ponder. 🧵
1. Search and Discovery: Where will users find artifacts inside the (eventual) DuckDB cluster? We'll need a catalog interface to surface them, and the metadata of the artifacts in the data model (databases, tables) needs to be consistent across the cluster.
2. Privacy: How is privacy handled? What if someone downloads sensitive data onto their laptop and the laptop is stolen, opening up a data breach? My paranoia comes from working on data infra for GDPR and CCPA. Access control must be implemented and enforced.
3. Chargeback: How is usage tracked, and from it, costs? Assuming a cloud vendor for provisioning, strong metadata is key to keeping costs in check. Which instance type works best for deployment will also matter here.
4. Failover: How do we recover when instances fall over? The options are single-leader replication within a single DC or multi-leader replication across multiple DCs. Both need to be done carefully to avoid consistency issues. What do we sacrifice under CAP?
5. AuthN and AuthZ: How are these integrated into the cluster? Checking identity (authentication) and permissions (authorization) is essential in a distributed environment.
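To make the chargeback question (item 3) a bit more concrete, here is a minimal Python sketch of tag-based cost attribution. It assumes each DuckDB instance carries an owning-team tag and that per-instance usage is available as whole instance-hours. The `UsageRecord` type, the team names, the instance types, and the prices are all hypothetical placeholders, not real cloud-vendor figures.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in cents per instance-hour (placeholders, not real vendor prices).
# Integer cents avoid floating-point rounding when summing costs.
HOURLY_RATE_CENTS = {"small": 10, "large": 80}


@dataclass
class UsageRecord:
    instance_id: str
    team: str           # owner tag attached at provisioning time ("" if missing)
    instance_type: str  # key into HOURLY_RATE_CENTS
    hours: int          # whole instance-hours consumed in the billing period


def chargeback(records):
    """Sum cost in cents per owning team; untagged usage goes to 'unassigned'."""
    costs = defaultdict(int)
    for r in records:
        team = r.team or "unassigned"
        costs[team] += r.hours * HOURLY_RATE_CENTS[r.instance_type]
    return dict(costs)


usage = [
    UsageRecord("duck-01", "marketing", "small", 100),
    UsageRecord("duck-02", "finance", "large", 10),
    UsageRecord("duck-03", "", "small", 5),
]
print(chargeback(usage))
```

Grouping untagged usage under "unassigned" instead of dropping it is one way to surface exactly the metadata gaps that make costs hard to keep in check.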
I'm sure there's more to think about than what I have here. Do add to this thread. #datawarehouse
• • •
The last and often neglected piece of the Data Platform is the Business Intelligence layer. It's what helps the business make sense of its data.
Continuing the Data Platform building series, let's talk about Business Intelligence.
💫 Part 10: Business Intelligence:
To be transparent, my experience in this area is minimal, so I'll mostly refer to what I've seen and inferred from others. If you have insights to add, please comment below.
✴ The purpose of this:
Why have a BI layer in the first place?
- Give the end user (an analyst, an executive, or a stakeholder outside the team) a view into the data.
- Visualization puts the data in front of you and informs business decisions.
) earlier this week. Let's do @Twitter's tech and Data
This company might be popular for its platform at large, but it produced a lot of data-industry pieces worth calling out. Here is a 🧵
Stream processing: Storm! This was streaming before streaming was mainstream. It paved the way for a lot of later stream processing systems and did event processing early on. blog.twitter.com/engineering/en…