How is #DuckDB going to be used inside a company? I can't picture it being used beyond local development. Can it really replace a DWH without being distributed?
If it does go beyond local, I have a few things to ponder 🧵
1. Search and Discovery: Where are users going to find artifacts inside the (eventual) DuckDB cluster? We'll need a catalog interface to surface them. The metadata of the artifacts in the data model (database, table) needs to be consistent across the cluster.
2. Privacy: How is it handled? What if someone downloads sensitive data onto their laptop and the laptop is stolen, causing a data breach? My paranoia comes from working on data infra for GDPR and CCPA. Access control needs to be implemented and enforced.
3. Chargeback: How is usage metered and how are costs attributed in this environment? Assuming a cloud vendor does the provisioning, strong metadata is key to knowing that costs are in check. The choice of instance type for deployment is going to matter here.
4. Failover: How do we recover from instances falling over? The options are single-leader replication within a single DC, or multi-leader replication across multiple DCs. Either needs to be done carefully to avoid consistency issues. What do we sacrifice here in CAP?
5. Authz and Authn: How are these integrated into the cluster? Verifying identity (authn) and checking permissions (authz) are both important in a distributed environment.
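To make point 3 (chargeback) a bit more concrete, here is a minimal sketch of how per-team costs could be rolled up from query metadata. Everything in it is an assumption for illustration: the `QueryRecord` fields, the instance type names, and the hourly rates are hypothetical, and none of it is a DuckDB feature or real vendor pricing.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical hourly prices per instance type; not real vendor pricing.
HOURLY_RATE_USD = {"small": 0.10, "large": 0.40}


@dataclass(frozen=True)
class QueryRecord:
    """One row of (assumed) query metadata collected from the cluster."""
    team: str
    instance_type: str
    runtime_seconds: float


def chargeback(records):
    """Sum each team's cost, pricing every query by instance type and runtime."""
    totals = defaultdict(float)
    for record in records:
        hours = record.runtime_seconds / 3600
        totals[record.team] += hours * HOURLY_RATE_USD[record.instance_type]
    return dict(totals)


records = [
    QueryRecord("analytics", "small", 1800),  # 0.5 h on a small instance
    QueryRecord("analytics", "large", 900),   # 0.25 h on a large instance
    QueryRecord("ml", "large", 3600),         # 1 h on a large instance
]
costs = chargeback(records)
```

The point of the sketch is that chargeback only works if every query carries reliable metadata (who ran it, on what instance, for how long), which is why the metadata question comes first.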
I'm sure there are more things to think about than what I have here. Please add yours to this thread. #datawarehouse
