Shreya Shankar
Jul 24, 2020
i’m willing to bet at least O(100) of you have experienced a silent failure because of numpy or broadcasting. even yesterday i found a pandas join bug because i didn’t reset the dataframe’s index. 😢
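a minimal sketch of the kind of silent misalignment i mean (hypothetical repro, not my actual bug):

```python
import pandas as pd

# sort one frame, then column-concat with a fresh series
a = pd.DataFrame({"user": ["u1", "u2", "u3"], "spend": [30, 10, 20]})
b = a.sort_values("spend")  # index is now [1, 2, 0], not [0, 1, 2]

# concat aligns on the stale index labels, silently undoing the sort:
# tiers land on the wrong users, with no error or warning
tiers = pd.Series(["low", "mid", "high"], name="tier")
out = pd.concat([b["user"], tiers], axis=1)

# resetting the index first makes the alignment positional and correct
out_ok = pd.concat([b.reset_index(drop=True)["user"], tiers], axis=1)
```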

i’m wondering how to build basic tools that “catch” such bugs before runtime.
the overkill solution may be to train a language model on “correct” numpy code to perform “autocorrect” and “autocomplete.” but the thought of collecting a dataset and paying $$ to train or fine-tune a model is meh.
you can probably approximate a good-enough solution without ML. what if you listen for every shape change in a variable, or enforce a typing scheme such that the default doesn’t allow you to change the shape of an array?
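one hypothetical way to “listen”: an ndarray subclass that flags any op whose output shape differs from its input’s (a sketch, not a robust tool):

```python
import warnings
import numpy as np

class ShapeGuard(np.ndarray):
    """hypothetical wrapper: warn whenever an operation (e.g., a
    broadcast) silently changes the array's shape."""

    def __new__(cls, arr):
        return np.asarray(arr).view(cls)

    def __array_wrap__(self, out, context=None, return_scalar=False):
        # called on every ufunc result; compare shapes before returning
        if out.shape != self.shape:
            warnings.warn(f"shape changed: {self.shape} -> {out.shape}",
                          stacklevel=2)
        return out

x = ShapeGuard(np.zeros((3, 1)))
y = x + np.zeros(4)  # broadcasts (3, 1) + (4,) -> (3, 4): warns
z = x * 2            # shape preserved: silent
```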
maybe output a warning every time you hit a shape change? or wrap lengthy numpy operations in a “numpy DAG” object in which the client could specify each input, and the DAG will compute the output shape wrt input shapes for the client to verify?
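and a toy version of the DAG idea, where shapes propagate symbolically before any data flows (all names hypothetical):

```python
class ShapeNode:
    """hypothetical 'numpy DAG' node: holds an op plus a shape rule, so
    output shapes can be checked before the pipeline ever runs."""

    def __init__(self, name, shape_fn, *parents):
        self.name, self.shape_fn, self.parents = name, shape_fn, parents

    def shape(self):
        # recursively propagate shapes from the inputs; no data needed
        return self.shape_fn(*(p.shape() for p in self.parents))

class Input(ShapeNode):
    def __init__(self, name, shape):
        super().__init__(name, lambda: shape)

def matmul_shape(a, b):
    assert a[-1] == b[0], f"inner dims differ: {a} @ {b}"
    return (*a[:-1], b[1])

X = Input("X", (32, 100))
W = Input("W", (100, 10))
logits = ShapeNode("logits", matmul_shape, X, W)
print(logits.shape())  # (32, 10); a mismatch fails here, before runtime
```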
ML attracts so many silent failures and i wonder if people will either create better Python tools or just move to a different DSL entirely for productionized ML. would love to hear others’ perspectives on this.


More from @sh_reya

Oct 17, 2023
recently been studying prompt engineering through a human-centered (developer-centered) lens. here are some fun tips i’ve learned that don’t involve acronyms or complex words
if you don’t specify exactly the structure you want the response to take, down to the headers, parentheses, or valid attributes, the structure may vary between LLM calls, which makes it hard to use in production
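e.g., a hypothetical prompt that pins the structure down:

```python
# hypothetical example: spell out the exact response skeleton
prompt = """Extract feedback from the review below.

Respond with EXACTLY this structure, nothing else:
SENTIMENT: one of [positive, negative, mixed]
ISSUES: semicolon-separated list, or NONE
QUOTE: one verbatim sentence from the review, in double quotes

Review: {review}"""
```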
play around with the simplest prompt you can think of & run it a bunch of times on different inputs to build intuition for how LLMs “behave” for your task. then start adding instructions to your prompt in the form of rules, e.g., “do not do X”
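something like this loop, where call_llm stands in for whatever client you use (all names hypothetical):

```python
# hypothetical harness: repeat the simplest prompt over varied inputs
# to see how much the responses drift before adding any rules
def probe(call_llm, prompt, inputs, n_trials=3):
    for text in inputs:
        for trial in range(n_trials):
            response = call_llm(prompt.format(review=text))
            print(f"input={text[:30]!r} trial={trial}: {response[:80]}")
```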
Read 9 tweets
Sep 12, 2023
thinking about how, in the last year, > 5 ML engineers have told me, unprompted, that they want to do less ML & more software engineering. not because it’s more lucrative to build ML platforms & devtools, but because models can be too unpredictable & make for a stressful job
imo the biggest disconnect between ML-related research & production is that researchers aren’t aware of the human-centric efforts required to sustain ML performance. It feels great to prototype a good model, but on-calls battling unexpected failures chip away at this success
imagine that your career & promos are not about demonstrating good performance for a fixed dataset, but about how quickly on average you are able to respond to every issue some stakeholder has with some prediction. it is just not a sustainable career IMO
Read 8 tweets
Mar 29, 2023
Been working on LLMs in production lately. Here is an initial thoughtdump on LLMOps trends I’ve observed, compared/contrasted with their MLOps counterparts (no, this thread was not written by ChatGPT)
1) Experimentation is tangibly more expensive (and slower) in LLMOps. These APIs are not cheap, nor is it really feasible to experiment w/ smaller/cheaper models and expect behaviors to stay consistent when calling bigger models
1.5) we know from MLOps research that high experimentation velocity is crucial for putting and keeping pipelines in prod. A fast way is to collect a few examples, load up a notebook, try out a heck of a lot of different prompts—calling for prompt versioning & management systems
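a minimal sketch of what the core of such a system could look like, assuming you just content-hash every template (hypothetical):

```python
import hashlib
import time

# hypothetical minimal prompt registry: content-hash each template so
# every LLM call can be traced back to the exact prompt that produced it
_registry: dict[str, dict] = {}

def register_prompt(template: str) -> str:
    version = hashlib.sha256(template.encode()).hexdigest()[:8]
    _registry.setdefault(version, {"template": template,
                                   "created": time.time()})
    return version

v1 = register_prompt("Summarize the doc: {doc}")
v2 = register_prompt("Summarize the doc in 3 bullets: {doc}")
assert v1 != v2  # any edit yields a new, loggable version id
```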
Read 15 tweets
Dec 23, 2022
IMO the ChatGPT discourse exposed just how many people believe writing and communication are only about adhering to some sentence/paragraph structure
I’ve been nervous for some time now, not because I think AI is going to automate away writing-heavy jobs, but because the act of writing has been increasingly commoditized, to the point where I’m not sure whether people know how to tell good writing from bad writing. Useful from useless.
In my field, sometimes it feels like blog posts (that regurgitate useless commentary or make baseless forecasts about the future) are more celebrated/impactful than tooling and thought. Often such articles are written in the vein of PR or branding
Read 5 tweets
Dec 7, 2022
I want to talk about my data validation for ML journey, and where I’m at now. I have been thinking about this for 6 ish years. It starts with me as an intern at FB. The task was to classify FB profiles with some type (e.g., politician, celebrity). I collected training data,
split it into train/val/test, iterated on the feature set a bit, and eventually got a good test accuracy. Then I “productionized” it, i.e., put it in a Dataswarm pipeline (a precursor to Airflow, afaik). Then I went back to school before the pipeline ran more than once.
Midway through my intro DB course I realized that all the pipeline was doing was generating new training data and model versions every week. No new labels. So the pipeline made no sense. But whatever, I had gotten into ML research and figured I would probably never do ML in industry again.
Read 22 tweets
Sep 20, 2022
Our understanding of MLOps is limited to a fragmented landscape of thought pieces, startup landing pages, & press releases. So we did an interview study of ML engineers to understand common practices & challenges across organizations & applications: arxiv.org/abs/2209.09125
The paper is a must-read for anyone trying to do ML in production. Want us to give a talk to your group/org? Email shreyashankar@berkeley.edu. You can read the paper for the war stories & insights, so I’ll do a “behind the scenes” & “fave quotes” in this thread instead.
Behind-the-scenes: another school invited my advisor to contribute to a repo of MLOps resources. We contributed what we could, but felt oddly disappointed by how little evidence we could point to for support.
Read 11 tweets
