Shreya Shankar
I study ML & AI engineers and try to make their lives a little better. PhD-ing in databases & HCI @Berkeley_EECS @UCBEPIC and MLOps-ing around town. She/they.
Oct 17, 2023 9 tweets 2 min read
recently been studying prompt engineering through a human-centered (developer-centered) lens. here are some fun tips i’ve learned that don’t involve acronyms or complex words.

if you don’t exactly specify the structure you want the response to take on, down to the headers or parentheses or valid attributes, the response structure may vary between LLM calls and is not amenable to production
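To make the tip concrete, here is a minimal Python sketch (the prompt wording, key names, and validator are all invented for illustration, not from the thread): pin the exact structure down in the prompt, then validate every response against it before anything downstream consumes it.

```python
import json

# Hypothetical schema: the prompt promises exactly these keys, and the
# validator rejects any response that drifts from that structure.
EXPECTED_KEYS = {"title", "summary", "tags"}

PROMPT_TEMPLATE = (
    "Summarize the document below. Respond with ONLY a JSON object "
    'with exactly these keys: "title" (string), "summary" (string), '
    '"tags" (list of strings). No prose outside the JSON.\n\n{doc}'
)

def validate_response(raw: str) -> dict:
    """Reject any LLM response that doesn't match the promised structure."""
    parsed = json.loads(raw)  # raises ValueError on non-JSON output
    if set(parsed) != EXPECTED_KEYS:
        raise ValueError(f"unexpected keys: {sorted(parsed)}")
    if not isinstance(parsed["tags"], list):
        raise ValueError("tags must be a list")
    return parsed

# A response that follows the promised structure passes; anything looser
# (missing keys, prose around the JSON) raises and can trigger a retry.
ok = validate_response('{"title": "t", "summary": "s", "tags": ["ml"]}')
```

In production, a failed validation would typically trigger a retry rather than a crash; the point is that the structure is checked, not assumed.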
Sep 12, 2023 8 tweets 2 min read
thinking about how, in the last year, > 5 ML engineers have told me, unprompted, that they want to do less ML & more software engineering. not because it’s more lucrative to build ML platforms & devtools, but because models can be too unpredictable & make for a stressful job.

imo the biggest disconnect between ML-related research & production is that researchers aren’t aware of the human-centric efforts required to sustain ML performance. It feels great to prototype a good model, but on-calls battling unexpected failures chip away at this success
Mar 29, 2023 15 tweets 3 min read
Been working on LLMs in production lately. Here is an initial thoughtdump on LLMOps trends I’ve observed, compared/contrasted with their MLOps counterparts (no, this thread was not written by ChatGPT).

1) Experimentation is tangibly more expensive (and slower) in LLMOps. These APIs are not cheap, nor is it really feasible to experiment w/ smaller/cheaper models and expect behaviors to stay consistent when calling bigger models
Dec 23, 2022 5 tweets 1 min read
IMO the chatgpt discourse exposed just how many people believe writing and communication are only about adhering to some sentence/paragraph structure.

I’ve been nervous for some time now, not because I think AI is going to automate away writing-heavy jobs, but because the act of writing has been commoditized to the point where I’m not sure whether people know how to tell good writing from bad writing. Useful from useless.
Dec 7, 2022 22 tweets 4 min read
I want to talk about my data validation for ML journey, and where I’m at now. I have been thinking about this for ~6 years. It starts with me as an intern at FB. The task was to classify FB profiles with some type (e.g., politician, celebrity). I collected training data, split it into train/val/test, iterated on the feature set a bit, and eventually got a good test accuracy.

Then I “productionized” it, i.e., put it in a Dataswarm pipeline (precursor to Airflow afaik). Then I went back to school before the pipeline ran more than once.
Sep 20, 2022 11 tweets 4 min read
Our understanding of MLOps is limited to a fragmented landscape of thought pieces, startup landing pages, & press releases. So we did an interview study of ML engineers to understand common practices & challenges across organizations & applications: arxiv.org/abs/2209.09125

The paper is a must-read for anyone trying to do ML in production. Want us to give a talk to your group/org? Email shreyashankar@berkeley.edu. You can read the paper for the war stories & insights, so I’ll do a “behind the scenes” & “fave quotes” in this thread instead.
Aug 26, 2022 13 tweets 3 min read
Unit testing for ML is a big category of questions, but here are my thoughts on the data validation piece (ensuring model inputs/outputs have good "quality" such that ML performance doesn't suffer).

Old work in defining data constraints (e.g., Postgres style) fails us now bc (1) "quality" is not easily defined by a human---are you gonna comb through every feature column and create bounds?---and (2) the distribution matters; it's hard to look at one record alone and know whether it's "broken"
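A minimal sketch of what a distribution-level check could look like, in contrast to hand-written per-record bounds (the threshold, window sizes, and data here are all invented): compare a fresh batch of a feature column against a reference window and flag it when the mean drifts by several reference standard deviations.

```python
import statistics

# Hypothetical distribution check: no single record in `broken` is obviously
# invalid on its own, but the batch as a whole is far from the reference
# distribution, which is exactly what per-record constraints miss.
def column_drifted(reference, batch, n_sigmas=3.0):
    """Return True if the batch mean is far from the reference mean."""
    mu = statistics.fmean(reference)
    sigma = statistics.stdev(reference) or 1e-9
    return abs(statistics.fmean(batch) - mu) > n_sigmas * sigma

reference = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
healthy = [10.1, 9.9, 10.4]  # each record alone looks fine; so does the batch
broken = [0.0, 0.0, 0.0]     # e.g., an upstream join started emitting zeros
```

A real system would track many such statistics (quantiles, null rates, cardinalities) per column, but the shape of the check is the same.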
Jul 29, 2022 8 tweets 2 min read
I'm excited (and nervous) to post this thread: I've always known I wanted a partner but didn't know what a supportive one looked like (esp. as an ambitious woman who wants kids someday)! Now that I know, I'm so grateful for all the ways in which @PreetumNakkiran supports me:

All tasks have conception/planning/execution phases (Rodsky et al.). Often people think they did a whole task but only did the execution part (e.g., cook meals). Someone else has to conceive & plan (e.g., regularly grocery shop & stock the fridge). This is still a lot of labor!
Jun 22, 2022 6 tweets 2 min read
Honestly: sometimes I feel defeated because ML observability is so hard. All facets are hard -- detecting, diagnosing, reacting to bugs. We don't have realtime ground truth labels (except recsys) so we don't know asap when performance goes down. Lots of $$ left on the table (1/6)

By some miracle, maybe you know when pipelines are broken. Well, they have many models and thousands of features, so diagnosing is hard. I am sitting here, probing feature columns I didn't make, trying to figure out which features are important AND most broken. Nightmare (2/6)
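One hypothetical shape the triage step could take (the importance scores, null-rate signal, and numbers below are all invented for illustration): rank feature columns by importance times breakage, so the on-call starts with the most important, most broken suspects instead of probing columns at random.

```python
# Hypothetical triage sketch: combine per-feature importance with a cheap
# breakage signal (jump in null rate vs. a reference window) and rank.
def rank_suspects(importance, null_rate_ref, null_rate_now):
    suspects = []
    for feat, imp in importance.items():
        breakage = max(0.0, null_rate_now[feat] - null_rate_ref[feat])
        suspects.append((imp * breakage, feat))
    # Highest importance-weighted breakage first; drop unbroken features.
    return [feat for score, feat in sorted(suspects, reverse=True) if score > 0]

importance = {"age": 0.9, "clicks": 0.5, "zip": 0.1}
null_rate_ref = {"age": 0.01, "clicks": 0.02, "zip": 0.30}
null_rate_now = {"age": 0.01, "clicks": 0.40, "zip": 0.95}

suspects = rank_suspects(importance, null_rate_ref, null_rate_now)
```

Null rate is just one breakage signal; the same ranking idea works with drift scores, schema violations, or staleness.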
May 4, 2022 10 tweets 2 min read
I probably should have written this years ago, but here are some MLOps principles I think every ML platform (codebase, data management platform) should have: 1/n

Beginner: use pre-commit hooks. ML code is so, so ugly. Start with the basics — black, isort — then add pydocstyle, mypy, check-ast, eof-fixer, etc. Honestly I put these in my research codebases too, lol. 2/n
Feb 17, 2022 9 tweets 2 min read
I'm currently procrastinating on writing, so I will write a thread on writing. This is mainly for STEM people who keep saying they want to write more. 1/9

Many people tell me they want to blog more, but they have several blockers (e.g., what to write about, how to find time, how to make a nice site to publish posts on). This mindset treats writing as a chore. First you need to figure out how to excite yourself to write. 2/9
Dec 7, 2021 6 tweets 1 min read
Meta-thread: a 🧵 on writing technical 🧵s

[x] is popular / taking our world by storm. However, [y] is a blocker & hard problem. We came up with [z]. Thread:

[only include "pull figure" if it makes sense with no additional context] The current state of the world looks like [blank]. This is not great for several reasons: [blank].

Ex: There are 100s of new Medium posts and Arxiv papers every day. This sucks -- we won't read them all, yet we still want people to read our work.
Nov 15, 2021 7 tweets 3 min read
IMO there's no substitute for the MLOps experience of building a pipeline that serves predictions at some endpoint (e.g., REST) and trying to sustain some performance over time. Some pointers & tutorials below:

1. Convince yourself that operationalizing ML, even as a 1-person team, is a hard problem. What are some differences between a Kaggle project and a production ML service? Do some tutorials. Here's a more-than-hello-world toy ML pipeline I've built: github.com/shreyashankar/…
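As a deliberately tiny sketch of the "serve predictions at a REST endpoint" exercise (this is not the code from the linked repo; the weights, route, and payload shape are made up), here is a stdlib-only prediction server:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

WEIGHTS = [0.5, -0.25]  # stand-in for a trained model's parameters

def predict(features):
    return sum(w * x for w, x in zip(WEIGHTS, features))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Parse {"features": [...]} and return {"prediction": ...} as JSON.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        features = json.loads(body)["features"]
        payload = json.dumps({"prediction": predict(features)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}",
    data=json.dumps({"features": [2.0, 4.0]}).encode(),
    headers={"Content-Type": "application/json"},
)
resp = json.loads(urllib.request.urlopen(req).read())
server.shutdown()
```

The hard part, as the thread says, is not this server; it's everything around it: retraining, monitoring, and keeping performance from decaying once real traffic hits it.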
Nov 1, 2021 10 tweets 3 min read
Continuous Integration (CI) & testing for ML pipelines is hard and generally unsolved. I’ve been thinking about this for a while now — why it’s important, what it means, why current solutions are suboptimal, and what we can do about it. (1/10)

ML deployment in many organizations looks like this: someone builds a prediction pipeline, instruments it with a bit of monitoring, and off it goes into the world! But ML can be a nightmare when things go wrong in production: (2/10)
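A hypothetical example of the kind of ML-specific check CI could run beyond ordinary unit tests (the model, feature names, and expectation below are all invented): assert a behavioral invariant of the trained model itself, here a directional expectation test.

```python
# Hypothetical CI check: a toy "model" scores loan applicants, and CI asserts
# that raising income never lowers the score. A retrained model that violates
# this would fail the pipeline before deployment, not after.
def score(income, debt):
    # stand-in for loading and calling a real trained model
    return 0.001 * income - 0.002 * debt

def test_monotone_in_income():
    for income, debt in [(30_000, 5_000), (60_000, 5_000), (90_000, 20_000)]:
        assert score(income + 1_000, debt) >= score(income, debt)

test_monotone_in_income()
```

The interesting part is that this test is about the *artifact* (the trained model), not the code; it has to rerun every time the model is retrained.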
Jul 9, 2021 4 tweets 2 min read
Thread on relevant reflections from Richard Gabriel on AI companies < 1996, with my thoughts:

1/ Today's AI faculty at top schools have 1+ startup. They follow the same patterns: start a co on their open source project, close it, and sell something proprietary & different.

2/ Substitute Wall Street for Big Tech here. People evangelize AI use cases & problems at Google / FB / Microsoft. I can't count the number of times I have heard "the future is in training & deploying large models." Yeah well, that future is only for co's with lots of data.
Jul 5, 2021 5 tweets 1 min read
Recently I realized that the biggest benefit of going to Stanford is not the high quality of education or the network of successful people. It is the entitlement we develop, which the industry mistakes for confidence, that allows us to aim high and actually achieve our goals.

Speaking to the undergrad CS experience: sure, the CS curriculum is top-notch, but Dijkstra's algo is the same everywhere. Most CS undergrads don't actually become close with their professors.
May 24, 2021 7 tweets 2 min read
This is quite interesting. I am a huge champion of "iterate on the data, not the model." However, after further thought, I think there is actually a greater opportunity: to create academic benchmarks that actually hit the nail on the head for industry ML problems. (1/7)

In the thread, @AndrewYNg (accurately) argues that iterating on the data is more fruitful in developing ML applications. This raises the question: what really is the biggest difference between academic ML and industry ML tasks? (2/7)
May 4, 2021 5 tweets 1 min read
Computing hardware is getting really freaking powerful! I think edge inference and ML will become more popular very quickly.

It's pretty exciting to think about what this means for industry ML development and some new cool problems we can work on: (1/5)

1. Continuous Delivery. How do you ship new releases in a way that doesn't overwhelm the end user? (Good) ML models may need to be updated pretty frequently. I already get annoyed at the number of Docker Desktop updates, lol. (2/5)
Mar 11, 2021 7 tweets 2 min read
Update: I'm now working on ML tooling!

When I did applied ML, it seemed like many tools I initially found interesting were divorced from the reality of data, ML, and systems. I don't want to follow that pattern, so I built an open toy ML pipeline: github.com/shreyashankar/… (1/7)

As a systems person, I've always been fascinated by the ways people debug ML workflows. It doesn't feel similar to debugging software, but the industry is trending towards treating ML as software from both development and deployment perspectives. So there's a gap here. (2/7)
Feb 14, 2021 9 tweets 2 min read
My thoughts on baselines, a concept that is *extremely* relevant in industry ML but does not exactly translate from academic ML: 1/9

In academic ML projects, my classmates and I would code up logistic regression or simple models as baselines before hacking our way to make whatever complicated neural network architecture work. 2/9
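For illustration, a minimal sketch of the cheapest academic-style baseline of all (the labels and "features" below are invented): always predict the majority class, so any later model has a floor it must beat.

```python
from collections import Counter

# Hypothetical example: before logistic regression or any neural network,
# score the predictor that ignores features entirely and always guesses
# the most common training label.
def majority_class_baseline(train_labels):
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: majority

train_labels = ["spam", "ham", "ham", "ham", "spam"]
baseline = majority_class_baseline(train_labels)

test_set = [({"len": 10}, "ham"), ({"len": 3}, "spam"), ({"len": 7}, "ham")]
accuracy = sum(baseline(x) == y for x, y in test_set) / len(test_set)
```

If a complicated model can't beat this number, the iteration effort is better spent elsewhere; that comparison is the whole point of a baseline.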
Dec 28, 2020 5 tweets 2 min read
The ML research ecosystem can be amazing. A few hours ago, I wondered: do pruned neural networks converge to high accuracies faster than the original networks? I'm sure I can find an answer in one of many lottery ticket hypothesis papers, but I wanted to explore myself. (1/5)

First, I thought about how to formulate the question as an experiment. I would need to train a small FC network on MNIST, prune, and retrain. I forked the original LTH paper repo, ran the experiment, and plotted the test set accuracies for epochs 1 to 15. (2/5)
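As a rough sketch of the pruning step in that experiment (pure Python for clarity, not the LTH repo's actual code; the weights are invented): zero out the smallest-magnitude fraction of a layer's weights and keep the mask, so retraining can leave pruned weights at zero.

```python
# Hypothetical magnitude pruning for one (flattened) layer: keep the largest
# |w| weights, zero the rest, and return a 0/1 mask for use during retraining.
def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with smallest |w|."""
    n_prune = int(len(weights) * sparsity)
    prune_idx = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_prune])
    mask = [0.0 if i in prune_idx else 1.0 for i in range(len(weights))]
    pruned = [w * m for w, m in zip(weights, mask)]
    return pruned, mask

weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned, mask = magnitude_prune(weights, sparsity=0.5)
```

In the lottery ticket setup, the surviving weights would then be rewound to their initial values and retrained with this mask applied after every update.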