A key decision you have to make for each ML problem is whether to (1) buy a vendor's pre-built solution or (2) build your own.
Make this decision based on whether you have access to more data than the vendor.
This is also a handy rule for choosing between vendors: prefer the one with access to more data.
When building your own ML solution, avoid the temptation to build from scratch.
The best return on investment early in your projects will come from collecting more data (both more "rows" and more "columns", by breaking down data silos).
Many data engineers and CIOs tend to underestimate an ironic aspect of a dramatic increase in data volumes.
The larger the data volume gets, the more sense it makes to process the data *more* frequently!
🧵
To see why, say that a business creates a daily report based on its website traffic, and this report takes 2 hours to create.
If the website traffic grows 4x, the report will take 8 hours to create. So the tech people quadruple the number of machines.
This is wrong-headed!
2/
Instead, consider an approach that makes the reports more timely:
* Compute statistics on 6 hours of data, 4 times a day
* Aggregate these four 6-hour reports to create the daily report
* You can update your "daily" report four times a day
* The data in the report is at most 6 hours old!
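Here is a minimal sketch of the 6-hour approach, assuming (hypothetically) that traffic events arrive as `(hour_of_day, page, latency_ms)` tuples. The names `PartialStats`, `compute_window`, and `daily_report_so_far` are illustrative, not from any specific tool. The trick is to keep only *mergeable* statistics (counts and sums), so four small partial results can be combined into the same numbers you'd get from one big daily run. If processing time scales roughly linearly with volume, each 6-hour chunk at 4x traffic holds about as much data as the old full day, so each run takes about the original 2 hours.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical event schema: (hour_of_day, page, latency_ms)
WINDOW_HOURS = 6


@dataclass
class PartialStats:
    """Mergeable statistics for one window of traffic (hypothetical schema)."""
    views: Counter = field(default_factory=Counter)  # page -> view count
    total_latency_ms: float = 0.0
    num_requests: int = 0

    def add(self, page: str, latency_ms: float) -> None:
        self.views[page] += 1
        self.total_latency_ms += latency_ms
        self.num_requests += 1

    def merge(self, other: "PartialStats") -> "PartialStats":
        # Counts and sums are additive, so merging partial results is exact.
        return PartialStats(
            views=self.views + other.views,
            total_latency_ms=self.total_latency_ms + other.total_latency_ms,
            num_requests=self.num_requests + other.num_requests,
        )

    @property
    def mean_latency_ms(self) -> float:
        # Derive the average from the merged sum and count; never average averages.
        return self.total_latency_ms / self.num_requests if self.num_requests else 0.0


def compute_window(events, window: int) -> PartialStats:
    """Aggregate only the events in one 6-hour window (0..3).

    A real pipeline would read just that window's partition instead of filtering.
    """
    stats = PartialStats()
    for hour, page, latency_ms in events:
        if hour // WINDOW_HOURS == window:
            stats.add(page, latency_ms)
    return stats


def daily_report_so_far(partials) -> PartialStats:
    """Combine the 6-hour partial results computed so far into a "daily" report."""
    report = PartialStats()
    for partial in partials:
        report = report.merge(partial)
    return report


if __name__ == "__main__":
    events = [(1, "/home", 120.0), (7, "/home", 80.0),
              (8, "/pricing", 100.0), (13, "/home", 100.0)]
    partials = []
    for window in range(24 // WINDOW_HOURS):
        partials.append(compute_window(events, window))  # one run every 6 hours
        report = daily_report_so_far(partials)
        print(f"after window {window}: {report.num_requests} requests, "
              f"views={dict(report.views)}, mean latency={report.mean_latency_ms:.1f} ms")
```

The design choice that makes this work is sticking to additive statistics. Something like unique visitors is not additive across windows; you would need to keep the visitor sets themselves, or an approximate sketch such as HyperLogLog, to merge them correctly.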
3/
Five months later, our ML patterns book is #3 in AI, behind only the top ML intro book and the top research one. Very grateful for the validation ... W/ @SRobTweets amazon.com/Machine-Learni…
Like most authors, we keep hitting F5 to read the reviews 😁 My favorites 🧵👇
"When I was learning C++, I found the Gang of Four book 'Design Patterns' accomplished a similar goal to help bridge the gap between academic knowledge and practical software engineering. Much like with the GoF book I suspect I may be re-reading parts of this book in the future."
"must-read for scientists and practitioners looking to apply machine learning theory to real life problems. I foresee this book becoming a [classic] of the discipline’s literature."