1. Strive to get from a push-based workflow (fill the lake) to a pull-based one, i.e. select use cases with business value and ingest only the data they need into the lake.
3. Implement only what the use cases need, but first paint a clear picture of the long-term goal. Each step should take you towards this goal.
5. There is a tradeoff between data speed and innovation speed. Use the slowest form of integration your use cases can tolerate. Batch >> streaming >> microservices. Gravitate towards batch.
7. The only "new" technology that you need is a workflow orchestrator. Orchestrators are simple, and glue your fragile components together into a robust system. Use #Luigi or @ApacheAirflow.
9. Make cross-functional teams, with a sufficient combination of skills to be autonomous.
11. Constantly fight against entropy and limit heterogeneity and degrees of freedom.
12. Avoid components that cannot be managed through source code.
14. Collect and store raw data first, without processing it.
15. For collected personal data, split the PII out into a separate store and keep only a link to it in the rest of the data, so that deletion is easy when it is requested.
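
The PII split in the last point can be sketched roughly as below. This is a minimal illustration, not a prescribed design: the field names in `PII_FIELDS`, the `pii_key` link column, the in-memory `pii_store` dict, and the helpers `split_pii` and `forget` are all hypothetical, and a real setup would keep the PII in a separate, access-controlled dataset.

```python
import uuid

# Assumption: which fields count as PII depends on your data; these are examples.
PII_FIELDS = {"name", "email"}


def split_pii(record, pii_store):
    """Move PII fields into pii_store; return the record with a link key instead."""
    pii = {k: v for k, v in record.items() if k in PII_FIELDS}
    rest = {k: v for k, v in record.items() if k not in PII_FIELDS}
    pii_key = str(uuid.uuid4())  # random key, carries no personal information
    pii_store[pii_key] = pii
    rest["pii_key"] = pii_key
    return rest


def forget(pii_key, pii_store):
    """Delete the personal data; non-PII records keep only a dangling link."""
    pii_store.pop(pii_key, None)


pii_store = {}  # stand-in for a separate PII dataset
event = split_pii(
    {"name": "Ada", "email": "ada@example.com", "page": "/home"}, pii_store
)
forget(event["pii_key"], pii_store)  # the event itself needs no rewrite
```

With this layout, a deletion request touches only the small PII store, while the raw, immutable event data in the lake stays untouched.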