The biggest change I observed between my last visit to India (Dec 2019) and now is how much an Aadhar card (national id card) is woven into everything.
🧵
The Aadhar card provides significant benefits in terms of fraud prevention esp.. with government services.
It also helps Indian businesses know their customers, while KYC is a work-in-progress for most American ones.
The Aadhar card is one reason micro payments have taken off.
But the Aadhar card success has been driven by the Indian bureaucrat's unique mixture of authoritarianism and laziness.
For example, in the course of one hour at a bank branch, I saw 3 customers unable to access their accounts.
Let's call this bank I was at as Bank B.
The 3 customers had opened their account 15+ years ago with Bank A.
Bank A got acquired by Bank B ten years ago and accounts got moved.
Bank B has mechanisms to get Aadhar card info and update their records for anyone who opened an account with them.
But they forgot about the customers who came in via their acquisition ten years ago.
These customers were now shit out of luck.
The bank personnel simply shrugged their shoulders, and asked the poor customers to write the head office with proof of some kind.
It's easy to implement a nationwide program if you don't care about the human effects of corner cases!
Another example: you need to get a Covid test to get on a plane.
To prevent fraud, the testing center has to record the Aadhar card & result of folks they test.
"No Aadhar card, no test," we were told.
So if you are a foreigner, you are SOL with no way to leave the country!
Fortunately, this is still India. A combination of begging + escalation + extra fees did the trick and we got our Covid test using our passports.
• • •
Missing some Tweet in this thread? You can try to
force a refresh
A key decision that you have to make for each ML problem is to decide whether to: (1) buy a vendor's pre-built solution (2) build your own
Make this decision based on whether you have access to more data than the vendor.
This is also a handy rule to choose between vendors.
When building your own ML solution, avoid the temptation to build from scratch.
The best return on investment early in your projects is going to come from collecting more data (both more "rows" and more "columns", by breaking down data silos)
Many data engineers and CIOs tend to underestimate an ironic aspect of a dramatic increase in data volumes.
The larger the data volume gets, it makes more and more sense to process the data *more* frequently!
🧵
To see why, say that a business is creating a daily report based on its website traffic and this report took 2 hours to create.
If the website traffic grows by 4x, the report will take 8 hours to create. So, the tech people 4x the number of machines.
This is wrong-headed!
2/
Instead, consider an approach that makes the reports more timely:
* Compute statistics on 6 hours of data 4 times a day
* Aggregate these 6 hourly reports to create daily reports
* You can update your "daily" report four times a day.
* Data in report is only 6 hrs old!
3/
Five months later, our ML patterns book is #3 in AI, behind only the top ML intro book and the top research one. Very grateful for the validation ... W/ @SRobTweets amazon.com/Machine-Learni…
Like most authors, we keep hitting F5 to read the reviews 😁 My favorites 🧵👇
"When I was learning C++, I found the Gang of Four book "Design Patterns" accomplished a similar goal to help bridge the gap between academic knowledge and practical software engineering. Much like with the GoF book I suspect I may be re-reading parts of this book in the future"
"must-read for scientists and practitioners looking to apply machine learning theory to real life problems. I foresee this book becoming a classical of the discipline’s literature."