Building a scalable Data Analytics pipeline in the cloud 👇
☁️ BigQuery - Data warehouse
☁️ Cloud Storage - Data lake
☁️ Dataflow - Data processing
☁️ Dataproc - Managed Hadoop
Here are a few click-to-deploy architectures available in our Architecture Diagramming Tool!
✅ Simple VM app
✅ 3 tier app
✅ Batch ETL pipeline
✅ Cost Management
✅ Static web hosting with domain
✅ Storage event function app
Answer these questions
❓ What's your team's ML expertise?
❓ How much control/abstraction do you need?
❓ Would you like to handle the infrastructure components?
🧵 👇
@SRobTweets created this pyramid to explain the idea.
As you move up the pyramid, less ML expertise is required, and you also don’t need to worry as much about the infrastructure behind your model.
If you're using open source ML frameworks (#TensorFlow) to build your models, you get the flexibility of moving your workloads across different development & deployment environments. But you need to manage all the infrastructure yourself for training & serving.
⚖️ How to deal with imbalanced datasets?⚖️
Most real-world datasets are not perfectly balanced. If 90% of your dataset belongs to one class, & only 10% to the other, how can you prevent your model from predicting the majority class 90% of the time?
🧵 👇
🐱🐱🐱🐱🐱🐱🐱🐱🐱🐶 (90:10)
💳 💳 💳 💳 💳 💳 💳 💳 💳 ⚠️ (90:10)
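To see why a 90:10 split is a problem, here is a minimal sketch with hypothetical toy labels (not from a real dataset). A "model" that always predicts the majority class looks accurate but never finds the rare class:

```python
# Hypothetical toy data: 90 majority examples (0) and 10 minority examples (1).
labels = [0] * 90 + [1] * 10

# A "model" that ignores its input and always predicts the majority class.
predictions = [0] * len(labels)

# Accuracy: fraction of predictions that match the labels.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall on the minority class: fraction of true minority examples found.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
minority_recall = true_positives / sum(labels)

print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
# prints: accuracy=0.90, minority recall=0.00
```

The accuracy looks fine, yet every 🐶 (or every ⚠️ fraud case) is missed, which is why accuracy alone is a weak metric here.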
There can be many reasons for imbalanced data. The first step is to see if it's possible to collect more data. If you're already working with all the data that's available, these 👇 techniques can help.
Here are 3 techniques for addressing data imbalance. You can use just one of these or all of them together:
⚖️ Downsampling
⚖️ Upsampling
⚖️ Weighted classes
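The three techniques above can be sketched in plain Python. This is a rough illustration only: the toy dataset, the variable names, and the inverse-frequency weighting formula are assumptions made for the example, and real projects usually rely on their ML framework's own resampling and class-weight options.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical toy dataset of (feature, label) pairs with a 90:10 split.
data = [(i, 0) for i in range(90)] + [(i, 1) for i in range(90, 100)]
majority = [row for row in data if row[1] == 0]
minority = [row for row in data if row[1] == 1]

# Downsampling: keep a random subset of the majority class.
downsampled = random.sample(majority, k=len(minority)) + minority

# Upsampling: draw minority examples with replacement until the classes match.
upsampled = majority + random.choices(minority, k=len(majority))

# Weighted classes: keep the data as is, but weight each class inversely
# to its frequency so that errors on the rare class cost more in the loss.
counts = Counter(label for _, label in data)
class_weights = {
    label: len(data) / (len(counts) * count) for label, count in counts.items()
}

print(Counter(label for _, label in downsampled))  # 10 examples per class
print(Counter(label for _, label in upsampled))    # 90 examples per class
print(class_weights)  # the minority class gets the larger weight
```

Downsampling throws data away, upsampling repeats minority examples, and class weights leave the data untouched, so which one fits (or which combination) depends on how much data you have to begin with.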
Since it's Day 10 of #31DaysofML, it's the perfect time to discuss 1️⃣0️⃣ things that can go wrong with #MachineLearning projects and what you can do about them!
I watched this amazing presentation by @kweinmeister that sums it all up.
A 🧵
1️⃣ You aren't solving the right problem
❓What's the goal of your ML model?
❓How do you assess if your model is "good" or "bad"?
❓What's your baseline?
👉 Focus on a long-term mission with maximum impact
👉 Ensure that your problem is a good fit for ML
2️⃣ Jumping into development without a prototype
👉 An ML project is an iterative process
👉 Start with a simple model & keep refining it until you've reached your goal
👉 A quick prototype can tell you a lot about hidden requirements, implementation challenges, scope, etc.