When talking to people who haven’t deployed ML models, I keep hearing a lot of misperceptions about ML models in production. Here are a few of them.
(1/6)
1. Deploying ML models is hard
Deploying a model for friends to play with is easy. Export trained model, create an endpoint, build a simple app. 30 mins.
Deploying it reliably is hard. Serving 1000s of requests with ms latency is hard. Keeping it up all the time is hard.
(2/6)
2. You only have a few ML models in production
Booking and eBay have 100s of models in prod. Google has 10000s. An app has multiple features, and each might have one or more models for different data slices.
You can also serve combos of several models' outputs, like an ensemble.
(3/6)
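A minimal sketch of serving a combo of several models' outputs, assuming each model is a callable that returns a probability. The three stand-in models and the unweighted average are illustrative assumptions, not how any of the companies above combine their models.

```python
from statistics import fmean
from typing import Callable, Sequence

# Assumption: a "model" is any callable mapping features to a probability.
Model = Callable[[Sequence[float]], float]


def ensemble_predict(models: Sequence[Model], features: Sequence[float]) -> float:
    """Return the unweighted average of every model's predicted probability."""
    if not models:
        raise ValueError("need at least one model")
    return fmean(model(features) for model in models)


# Hypothetical stand-ins for models trained on different data slices.
def model_a(features: Sequence[float]) -> float:
    return 0.9


def model_b(features: Sequence[float]) -> float:
    return 0.6


def model_c(features: Sequence[float]) -> float:
    return 0.3


score = ensemble_predict([model_a, model_b, model_c], [1.0, 2.0])
print(f"ensemble score: {score:.2f}")
```

Averaging is only one option; weighted votes or a small model stacked on top are common alternatives.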
3. If nothing happens, model performance remains the same
ML models perform best right after training. In prod, ML systems degrade quickly bc of concept drift.
Tip: train models on data generated 6 months ago & test on current data to see how much worse they get.
(4/6)
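The tip above can be sketched as a time-based split. Everything here is an assumption for illustration: the data is synthetic, the drift is simulated by moving the true decision boundary, and the "model" is a toy threshold on one feature.

```python
import random
from datetime import date, timedelta


def make_rows(rng, start, days, per_day, cutoff):
    """Synthetic rows: the label is 1 when x exceeds that period's cutoff."""
    rows = []
    for d in range(days):
        for _ in range(per_day):
            x = rng.uniform(0, 10)
            rows.append({"day": start + timedelta(days=d), "x": x, "label": int(x > cutoff)})
    return rows


def accuracy(rows, threshold):
    hits = sum(int(r["x"] > threshold) == r["label"] for r in rows)
    return hits / len(rows)


def fit_threshold(rows):
    """Toy model: pick the observed x that best separates the training labels."""
    return max((r["x"] for r in rows), key=lambda t: accuracy(rows, t))


rng = random.Random(0)
# Simulated concept drift: the true boundary moves from 5.0 to 7.0.
rows = make_rows(rng, date(2020, 1, 1), days=180, per_day=2, cutoff=5.0)
rows += make_rows(rng, date(2020, 7, 1), days=30, per_day=5, cutoff=7.0)

# Split by time, not at random: old data for training, recent data for testing.
split_day = date(2020, 7, 1)
old = [r for r in rows if r["day"] < split_day]
current = [r for r in rows if r["day"] >= split_day]

rng.shuffle(old)
train, old_holdout = old[:260], old[260:]

threshold = fit_threshold(train)
acc_at_training_time = accuracy(old_holdout, threshold)
acc_on_current_data = accuracy(current, threshold)
print(f"holdout from training period: {acc_at_training_time:.2f}")
print(f"current data:                 {acc_on_current_data:.2f}")
```

The gap between the two numbers is a rough estimate of how much a stale model would lose; with real data, the split date and the metric would depend on the application.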
4. You won’t need to update your models as much
One mindboggling fact about DevOps: Etsy deploys 50 times/day. Netflix 1000s of times/day. AWS every 11.7 seconds.
MLOps isn’t an exception. For online ML systems, you want to update them as fast as humanly possible.
(5/6)
Deploying ML systems isn't just about getting ML systems to the end-users.
It's about building infrastructure so the team can be quickly alerted when something goes wrong, figure out what went wrong, test in production, and roll out or roll back updates.
It's fun!
(6/6)
Really enjoyed LinkedIn's report on what worked and what didn't when deploying LLM applications. 4 takeaways.
1. Structured outputs
They chose YAML over JSON as the output format because YAML uses fewer tokens. Initially, only 90% of the outputs were correctly formatted YAML. They used re-prompting (asking the model to fix its YAML responses), which increased the number of API calls significantly.
They then analyzed the common formatting errors, added those hints to the original prompt, and wrote an error fixing script. This reduced their errors to 0.01%.
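A rough sketch of the "fix locally before re-prompting" idea, under stated assumptions: the fixers below (stripping code fences, replacing tabs) are hypothetical examples of common formatting errors, not the ones in LinkedIn's script, and a toy `key: value` parser stands in for a real YAML library so the block has no dependencies.

```python
from typing import Callable, Optional, Sequence

FENCE = "`" * 3  # markdown code fence that models often wrap around output


def parse_flat_mapping(text: str) -> dict:
    """Toy stand-in for a YAML parser: accepts only flat 'key: value' lines."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, sep, value = line.partition(": ")
        if not sep:
            raise ValueError(f"bad line: {line!r}")
        result[key.strip()] = value.strip()
    return result


def strip_code_fences(text: str) -> str:
    """Hypothetical fixer: drop markdown fence lines around the payload."""
    kept = [ln for ln in text.strip().splitlines() if not ln.strip().startswith(FENCE)]
    return "\n".join(kept)


def tabs_to_spaces(text: str) -> str:
    """Hypothetical fixer: replace tabs with spaces."""
    return text.replace("\t", "  ")


def parse_with_repair(
    raw: str,
    parse: Callable[[str], dict],
    fixers: Sequence[Callable[[str], str]],
    reprompt: Optional[Callable[[str], str]] = None,
) -> dict:
    """Try cheap local fixes first; fall back to one re-prompt only if they fail."""
    candidates = [raw]
    for fix in fixers:
        candidates.append(fix(candidates[-1]))  # fixers are applied cumulatively
    for text in candidates:
        try:
            return parse(text)
        except ValueError:
            continue
    if reprompt is not None:
        return parse(reprompt(raw))  # costs an extra model call
    raise ValueError("output could not be repaired")


raw_output = f"{FENCE}yaml\nskill: python\nfit: high\n{FENCE}"
parsed = parse_with_repair(raw_output, parse_flat_mapping, [strip_code_fences, tabs_to_spaces])
print(parsed)
```

With a real YAML parser in place of `parse_flat_mapping`, the caught exception type would need to match that library's parse error.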
2. Sacrificing throughput for latency
Originally, they focused on TTFT (Time To First Token), but realized that TBT (Time Between Tokens) hurt them more, especially with Chain-of-Thought queries where users don’t see the intermediate outputs.
They found that TTFT and TBT inversely correlate with TPS (Tokens per Second). To achieve good TTFT and TBT, they had to sacrifice TPS.
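To make the three metrics concrete, here is a small sketch that computes them from token arrival timestamps. The definitions are common ones but are assumptions here (TTFT and TBT per request, TPS aggregated across concurrent requests), and the `RequestTrace` type and the timestamps are made up for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RequestTrace:
    sent_at: float                 # when the request was sent, in seconds
    token_times: Sequence[float]   # arrival time of each output token, in seconds


def ttft(trace: RequestTrace) -> float:
    """Time To First Token: delay between sending and the first token."""
    return trace.token_times[0] - trace.sent_at


def mean_tbt(trace: RequestTrace) -> float:
    """Time Between Tokens: average gap between consecutive tokens."""
    times = trace.token_times
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0


def throughput_tps(traces: Sequence[RequestTrace]) -> float:
    """Tokens per Second across all requests in the observed window."""
    start = min(t.sent_at for t in traces)
    end = max(t.token_times[-1] for t in traces)
    total_tokens = sum(len(t.token_times) for t in traces)
    return total_tokens / (end - start)


# Made-up traces for two overlapping requests.
a = RequestTrace(sent_at=0.0, token_times=[0.5, 0.6, 0.7, 0.8])
b = RequestTrace(sent_at=0.2, token_times=[0.9, 1.0, 1.1, 1.2])

print(f"TTFT(a) = {ttft(a):.2f}s, TBT(a) = {mean_tbt(a):.2f}s")
print(f"TPS = {throughput_tps([a, b]):.2f}")
```

Batching more requests together tends to raise the last number while stretching the first two, which is the tradeoff described above.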
3. Automatic evaluation is hard
One core challenge of evaluation is coming up with a guideline on what a good response is. For example, for skill fit assessment, the response: “You’re not a good fit” is correct, but not helpful.
Originally, evaluation was ad-hoc. Everyone could chime in. That didn’t work. They then had linguists build tooling and processes to standardize annotation, evaluating up to 500 daily conversations. These manual annotations guide their iteration.
Their next goal is to get automatic evaluation, but it’s not easy.
I've been talking to a lot of people looking to join/having joined startups and I'm flabbergasted by how often people think joining startups is a get rich quick scheme. Here's the math why it doesn't work and what to look for when joining startups. (1/n)
Equity: anywhere from 0.001% to 10%. A friend recently joined a 15-pax seed startup that offered 4%/4 years + a lot of $. He'd be the ML engineer. They need him to raise A. It looks good on paper but do you want a company where you're clearly the best at what you want to learn? (2/n)
For startups with product-market fit, star founders, top VCs (think Asana, Zoom), if you're the ~15th engineer, expect equity << 0.1%/4years. After subsequent rounds, it's diluted to < 0.05%. If startup is sold for $1B, which is rare, you'd make < 0.5M/4years. (3/n)
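The arithmetic behind that last number, as a quick sketch. It uses only the figures in the tweet above and ignores taxes, strike price, and liquidation preferences, all of which would lower the payout.

```python
def equity_payout(exit_value: float, ownership_fraction: float) -> float:
    """Gross payout from an exit, before taxes, strike price, or preferences."""
    return exit_value * ownership_fraction


exit_value = 1_000_000_000      # $1B sale, which is rare
diluted_ownership = 0.05 / 100  # 0.05% after subsequent rounds
vesting_years = 4

payout = equity_payout(exit_value, diluted_ownership)
per_year = payout / vesting_years

print(f"total over {vesting_years} years: ${payout:,.0f}")
print(f"per year: ${per_year:,.0f}")
```

Since the ownership figure is an upper bound (< 0.05%), roughly $500K over 4 years, or about $125K per year, is a ceiling for this scenario.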
To learn how to design machine learning systems, I find it really helpful to read case studies to see how great teams deal with different deployment requirements and constraints. Here are some of my favorite case studies.
Topics covered: lifetime value, ML project workflow, feature engineering, model selection, prototyping, moving prototypes to production. It's complete with lessons learned and looking ahead!
Netflix streams to over 117M members worldwide, half of those living outside the US. The company uses machine learning to predict network quality, detect device anomalies, and handle predictive caching. medium.com/netflix-techbl…
To better understand the technical hiring pipelines, I analyzed 15,897 interview reviews for 27 major tech companies on Glassdoor. I focused on interviews for software engineering related roles, both junior and senior levels. These are some of the main findings. (1/n)
Each review consists of:
- result (no offer/accept offer/decline offer)
- difficulty (easy/medium/hard)
- experience (positive/neutral/negative)
- review (application/process/questions)
The largest SWE employers are Google, Amazon, Facebook, and Microsoft.
Strong correlation bw onsite-to-offer rate and offer yield rate (% of candidates who accept their offers). The more selective the company is, the less likely a candidate is to accept their offer. Candidates that pass interviews at FAANG are likely to have other attractive offers.
This thread is a combination of 10 free online courses on machine learning that I find the most helpful. They should be taken in order.
1. Probability and Statistics by Stanford Online
This self-paced course covers basic concepts in probability and statistics spanning over four fundamental aspects of machine learning: exploratory data analysis, producing data, probability, and inference. online.stanford.edu/courses/gse-yp…
2. Linear Algebra by MIT
Hands down the best linear algebra course I’ve seen, taught by the legendary professor Gilbert Strang. ocw.mit.edu/courses/mathem…