Many of our customers want to know how to choose a technology stack for solving problems with machine learning.

In this article, I summarize my thought process when suggesting a tech stack for ML. 🧵

A key decision you have to make for each ML problem is whether to:
(1) buy a vendor's pre-built solution
(2) build your own

Base this decision on whether you have access to more data than the vendor does: if you do, build your own; if the vendor has more, buy.

This is also a handy rule for choosing between vendors: prefer the one with access to more data.
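
The rule is simple enough to write down as code. This is a minimal sketch, assuming that "access to data" can be reduced to a single count of relevant training examples (a hypothetical simplification); the Option class and the function names are illustrative, not from any library.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """A candidate solution and how much relevant training data it can draw on.

    Reducing "data access" to one number of examples is a hypothetical
    simplification for illustration only.
    """
    name: str
    relevant_examples: int


def build_or_buy(own_examples: int, vendor: Option) -> str:
    """Build only when you have access to more relevant data than the vendor."""
    return "build" if own_examples > vendor.relevant_examples else "buy"


def pick_vendor(vendors: list[Option]) -> Option:
    """Among vendors, prefer the one with access to the most relevant data."""
    return max(vendors, key=lambda v: v.relevant_examples)


# Hypothetical numbers for two vendors.
vendors = [Option("vendor_a", 2_000_000), Option("vendor_b", 500_000)]
best = pick_vendor(vendors)
print(best.name)                      # vendor_a
print(build_or_buy(1_000_000, best))  # buy
```

A tie goes to buying here, since the rule favors building only when you have more data than the vendor.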

When building your own ML solution, avoid the temptation to build from scratch.

The best return on investment early in your projects will come from collecting more data: both more "rows" (examples) and more "columns" (features), the latter by breaking down data silos.

Use standard and/or low-code models.

The tech stack depends on the type of ML application:
1. Predictive analytics
2. Unstructured data
3. Automation
4. Recommendations

I'll quickly summarize my recommendations for each; read the article linked from the headline tweet for more details.

For predictive analytics, the key is to use a tech stack where you can keep growing your data and train ML models without moving that data.

Build an enterprise data warehouse (EDW).

Train BigQuery ML models directly on the data in the warehouse.
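
Training "without data movement" means the model is created by a SQL statement that runs inside BigQuery, next to the data. The sketch below only assembles such a statement as a string; the dataset, table, and column names (mydataset.churn_model, mydataset.customers, churned) are hypothetical, and logistic_reg is just one of the model types BigQuery ML supports.

```python
def create_model_sql(model: str, table: str, label: str, features: list[str]) -> str:
    """Assemble a BigQuery ML CREATE MODEL statement for a binary classifier.

    The statement runs inside BigQuery, so training reads the table in place
    instead of exporting rows to a separate training system.
    """
    columns = ", ".join(features + [label])
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'logistic_reg', input_label_cols = ['{label}']) AS\n"
        f"SELECT {columns}\n"
        f"FROM `{table}`"
    )


# Hypothetical dataset, table, and column names.
sql = create_model_sql(
    model="mydataset.churn_model",
    table="mydataset.customers",
    label="churned",
    features=["tenure_months", "monthly_spend"],
)
print(sql)
```

With credentials set up, the statement could be submitted using the google-cloud-bigquery client library, e.g. bigquery.Client().query(sql).result(). As the warehouse grows, rerunning the same statement retrains on the larger table without building an extra data pipeline.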

When the improvements from growing the data plateau, build TensorFlow/Keras models that read directly from BigQuery.
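
Deciding when improvements "plateau" is a judgment call. One hypothetical way to make it concrete is to record a validation metric every time the model is retrained on more data and check whether the recent gains have become negligible. The window and min_gain defaults below are illustrative assumptions, not values from this thread.

```python
def has_plateaued(scores: list[float], window: int = 3, min_gain: float = 0.005) -> bool:
    """Return True if each of the last `window` retrainings improved the
    validation metric (higher is better) by less than `min_gain`.

    `scores` holds one metric value per retraining, in chronological order,
    for example after each increase in the amount of training data.
    """
    if len(scores) < window + 1:
        return False  # not enough history to judge a trend
    recent = scores[-(window + 1):]
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(gain < min_gain for gain in gains)


# Hypothetical accuracy after each retraining on a larger BigQuery table.
history = [0.71, 0.76, 0.79, 0.792, 0.793, 0.795]
print(has_plateaued(history))  # True: the last three gains are all below 0.005
```

Once a check like this starts returning True for the BigQuery ML model, that is the point at which the thread suggests moving to custom TensorFlow/Keras models.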

For unstructured data, the ROI of AutoML is hard to beat for small to medium data sizes.

For large data sizes, use pre-built models that have already been written to use TPUs efficiently. Start with transfer learning, then fine-tune, and finally train from scratch.

For automation, you will be training several models and orchestrating them.

Some of them will be pre-built (e.g., Document AI), others low-code (e.g., BigQuery ML), and others no-code (e.g., AutoML Video Intelligence).

You need a unified AI platform to orchestrate them: use Vertex AI Pipelines.
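
Orchestrating several models means running each step only after the steps it depends on have finished, which is what a pipeline's dependency graph encodes. The sketch below uses Python's standard-library graphlib to order a hypothetical claims-processing workflow; the step names are made up, and this is not the Vertex AI Pipelines API, which is typically authored with the Kubeflow Pipelines (KFP) SDK or TFX.

```python
from graphlib import TopologicalSorter

# Hypothetical claims-automation workflow. Each step maps to the set of steps
# it depends on; the comments note which kind of model each step stands in for.
PIPELINE: dict[str, set[str]] = {
    "extract_claim_form": set(),                         # pre-built (e.g. Document AI)
    "classify_damage_video": set(),                      # no-code (e.g. AutoML)
    "score_fraud_risk": {"extract_claim_form"},          # low-code (e.g. BigQuery ML)
    "route_claim": {"score_fraud_risk", "classify_damage_video"},
}


def run_order(pipeline: dict[str, set[str]]) -> list[str]:
    """Return an execution order in which every step follows its dependencies."""
    return list(TopologicalSorter(pipeline).static_order())


order = run_order(PIPELINE)
print(order)
```

Steps that do not depend on each other (here, the form extraction and the video classification) could run in parallel; a managed pipeline service runs this kind of graph for you and records what each step produced.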

For recommendations, you again need an enterprise data warehouse (EDW).

It needs to be well integrated with your transactional databases.

Use Datastream to replicate changes from those databases into BigQuery with change data capture (CDC).
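
Change data capture streams each insert, update, and delete from the source database so that the warehouse copy stays in sync. The sketch below shows the idea by replaying a list of change events onto an in-memory replica keyed by primary key; the event format (op, key, row) is a hypothetical simplification and not Datastream's actual output schema.

```python
from typing import Any

Row = dict[str, Any]


def apply_changes(replica: dict[int, Row], events: list[dict[str, Any]]) -> dict[int, Row]:
    """Replay CDC events, in commit order, onto a replica keyed by primary key.

    The replica is modified in place and also returned for convenience.
    """
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            replica[key] = event["row"]  # upsert the latest version of the row
        elif op == "delete":
            replica.pop(key, None)       # ignore deletes of keys never seen
        else:
            raise ValueError(f"unknown operation: {op}")
    return replica


# Hypothetical change events from a transactional orders table.
events = [
    {"op": "insert", "key": 1, "row": {"user": "ana", "item": "book"}},
    {"op": "insert", "key": 2, "row": {"user": "bo", "item": "lamp"}},
    {"op": "update", "key": 1, "row": {"user": "ana", "item": "pen"}},
    {"op": "delete", "key": 2},
]
print(apply_changes({}, events))  # {1: {'user': 'ana', 'item': 'pen'}}
```

Order matters: replaying the same events out of commit order could resurrect deleted rows or keep stale values, which is why CDC tools preserve the source's change order.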

Start with BigQuery ML. Move to Recommendations AI. Once improvements plateau, train from scratch.
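
Starting with BigQuery ML for recommendations could mean a matrix factorization model trained on the user-item interactions that Datastream keeps up to date in BigQuery. A minimal sketch, assuming hypothetical table and column names; matrix_factorization, user_col, item_col, rating_col, feedback_type, and ML.RECOMMEND are part of BigQuery ML, but check its documentation for the options your data needs.

```python
def matrix_factorization_sql(model: str, table: str, user_col: str, item_col: str,
                             rating_col: str, feedback_type: str = "implicit") -> str:
    """Assemble a BigQuery ML CREATE MODEL statement for a recommender.

    With implicit feedback, `rating_col` holds an engagement signal
    (for example, purchase counts) rather than an explicit star rating.
    """
    if feedback_type not in ("implicit", "explicit"):
        raise ValueError("feedback_type must be 'implicit' or 'explicit'")
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (\n"
        "  model_type = 'matrix_factorization',\n"
        f"  user_col = '{user_col}',\n"
        f"  item_col = '{item_col}',\n"
        f"  rating_col = '{rating_col}',\n"
        f"  feedback_type = '{feedback_type}'\n"
        ") AS\n"
        f"SELECT {user_col}, {item_col}, {rating_col}\n"
        f"FROM `{table}`"
    )


def recommend_sql(model: str) -> str:
    """Assemble a query that scores user-item pairs with ML.RECOMMEND."""
    return f"SELECT * FROM ML.RECOMMEND(MODEL `{model}`)"


# Hypothetical table of purchases replicated from the transactional database.
print(matrix_factorization_sql("retail.recs_model", "retail.purchases",
                               "user_id", "product_id", "purchase_count"))
print(recommend_sql("retail.recs_model"))
```

If a model like this stops improving as data accumulates, that is the cue in the thread to move on to Recommendations AI, and later to a model trained from scratch.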

And as always, reach out to your Google Cloud account team if you want to talk through your options and brainstorm which approach to start with.

Thread by Lak Lakshmanan (@lak_gcp)