Shreya Shankar
building 📜https://t.co/PmuOqAYt6q for interactive AI-powered data processing. DB & HCI & AI PhD student @Berkeley_EECS @UCBEPIC; formerly ML eng & undergrad @Stanford.
Nov 4 · 5 tweets · 2 min read
what makes LLM frameworks feel unusable is that there's still so much burden on the user to figure out the bespoke amalgamation of LLM calls needed to ensure end-to-end accuracy. in docetl.org, we've found that relying on an agent to do this requires lots of scaffolding.

first, there needs to be a way of getting theoretically valid task decompositions. simply asking an LLM to break down a complex task over lots of data may result in a logically incorrect plan. for example, the LLM might choose the wrong data operation (projection instead of aggregation), which would yield an entirely different pipeline, as in the sketch below.
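to make the projection-vs-aggregation distinction concrete, here is a minimal Python sketch (the `llm` stub and review data are hypothetical, and this is not DocETL's actual API): the two plans answer different questions even though the change looks small.

```python
def llm(prompt: str) -> str:
    # Hypothetical stub; swap in any real chat-completion client.
    return f"[LLM output for: {prompt[:40]}...]"

reviews = ["app crashes on login", "love the new dashboard"]

# Projection (map): one output PER document -- a theme for each review.
themes = [llm(f"Name the main theme of this review: {r}") for r in reviews]

# Aggregation (reduce): one output OVER ALL documents -- a cross-review summary.
summary = llm("Summarize recurring themes across these reviews:\n" + "\n".join(reviews))

print(themes)   # two outputs
print(summary)  # one output
```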
Oct 31 · 9 tweets · 2 min read
I have a lot of thoughts on this as someone who has manually combed through hundreds of humans' prompt deltas.

first, humans tend to underspecify the first version of their prompt. if they're in the right environment where they can get a near-instantaneous LLM response in the same interface (e.g., ChatGPT, Claude, OpenAI playground), they just want to see what the LLM can do.
Oct 21 · 12 tweets · 3 min read
Our (first) DocETL preprint is now on arXiv! "DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing" It has been almost 2 years in the making, so I am very happy we hit this milestone :-) arxiv.org/abs/2410.12189

DocETL is a framework for LLM-powered unstructured data processing and analysis. The big new idea in this paper is to automatically rewrite user-specified pipelines into a sequence of finer-grained and more accurate operators.
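As a rough illustration of the rewriting idea (my sketch here, not the paper's actual rewrite rules or DocETL's API): a single coarse map over a long document can be decomposed into a split, a map per chunk, and a reduce that merges the partial results.

```python
def llm(prompt: str) -> str:
    # Hypothetical stub; swap in a real client.
    return f"[LLM output for: {prompt[:40]}...]"

def split(doc: str, size: int = 2000) -> list[str]:
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def coarse_pipeline(doc: str) -> str:
    # User-specified operator: one giant call, prone to missing details.
    return llm(f"Extract every key finding from this document:\n{doc}")

def rewritten_pipeline(doc: str) -> str:
    # Finer-grained plan: extract per chunk, then merge the partial extractions.
    partials = [llm(f"Extract key findings from this excerpt:\n{c}") for c in split(doc)]
    return llm("Merge and deduplicate these findings:\n" + "\n".join(partials))
```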
Oct 7 · 11 tweets · 3 min read
DocETL is our agentic system for LLM-powered data processing pipelines. Time for this week's technical deep dive on _gleaning_, our automated technique to improve accuracy by iteratively refining outputs 🧠🔍 (using LLM-as-judge!)

2/ LLMs often don't return perfect results on the first try. Consider extracting insights from user logs with an LLM. It might miss important behaviors or include extraneous information. These issues could lead to misguided product decisions or wasted engineering effort.
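The shape of the loop is draft → judge → refine. A minimal sketch of the idea (hypothetical `llm` helper; the real implementation in DocETL is richer than this):

```python
def llm(prompt: str) -> str:
    # Hypothetical stub; swap in a real client.
    return f"[LLM output for: {prompt[:40]}...]"

def glean(task_prompt: str, max_rounds: int = 3) -> str:
    output = llm(task_prompt)
    for _ in range(max_rounds):
        # LLM-as-judge: critique the current output against the task.
        verdict = llm(
            f"Task: {task_prompt}\nOutput: {output}\n"
            "Is anything missing or extraneous? Reply DONE if not, else give feedback."
        )
        if verdict.strip().startswith("DONE"):
            break
        # Refine using the judge's feedback.
        output = llm(
            f"Task: {task_prompt}\nPrevious output: {output}\n"
            f"Feedback: {verdict}\nRevise the output accordingly."
        )
    return output
```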
Sep 24 · 9 tweets · 4 min read
LLMs have made exciting progress on hard tasks! But they still struggle to analyze complex, unstructured documents (including today's Gemini 1.5 Pro 002).

We (UC Berkeley) built 📜DocETL, an open-source, low-code system for LLM-powered data processing: data-people-group.github.io/blogs/2024/09/…

2/ Let's illustrate DocETL with an example task: analyzing presidential debates over the last 40 years to see what topics candidates discussed, & how the viewpoints of Democrats and Republicans evolved. The combined debate transcripts span ~740k words, exceeding the context limits of most LLMs.
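Back-of-the-envelope on why that doesn't fit (assuming ~1.3 tokens per word and a 128k-token context window, both rough 2024-era figures):

```python
words = 740_000
tokens = int(words * 1.3)                      # ~962,000 tokens
context_window = 128_000                       # typical frontier-model limit
chunks_needed = -(-tokens // context_window)   # ceiling division -> 8
print(f"{tokens:,} tokens -> at least {chunks_needed} chunks")
```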
Oct 17, 2023 · 9 tweets · 2 min read
recently been studying prompt engineering through a human-centered (developer-centered) lens. here are some fun tips i've learned that don't involve acronyms or complex words.

if you don't exactly specify the structure you want the response to take on, down to the headers or parentheses or valid attributes, the response structure may vary between LLM calls / it is not amenable to production.
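here's an illustrative sketch of what "exactly specify the structure" can look like in practice: the prompt pins down the output format, and the caller validates before trusting it (the schema and names below are made up for the example):

```python
import json

def build_prompt(review: str) -> str:
    return (
        "Classify the sentiment of the review below. Respond with ONLY a JSON "
        'object, exactly of the form {"sentiment": "positive"|"negative"|"neutral", '
        '"confidence": <float between 0 and 1>}. No extra prose, no markdown.\n\n'
        f"Review: {review}"
    )

def parse_strictly(raw: str) -> dict:
    # Validate structure before using it downstream; fail loudly otherwise.
    obj = json.loads(raw)
    assert obj["sentiment"] in {"positive", "negative", "neutral"}
    assert 0.0 <= obj["confidence"] <= 1.0
    return obj
```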
Sep 12, 2023 · 8 tweets · 2 min read
thinking about how, in the last year, >5 ML engineers have told me, unprompted, that they want to do less ML & more software engineering. not because it's more lucrative to build ML platforms & devtools, but because models can be too unpredictable & make for a stressful job.

imo the biggest disconnect between ML-related research & production is that researchers aren't aware of the human-centric efforts required to sustain ML performance. It feels great to prototype a good model, but on-calls battling unexpected failures chip away at this success.
Mar 29, 2023 · 15 tweets · 3 min read
Been working on LLMs in production lately. Here is an initial thoughtdump on LLMOps trends I've observed, compared/contrasted with their MLOps counterparts (no, this thread was not written by ChatGPT).

1) Experimentation is tangibly more expensive (and slower) in LLMOps. These APIs are not cheap, nor is it really feasible to experiment w/ smaller/cheaper models and expect behaviors to stay consistent when calling bigger models.
Dec 23, 2022 · 5 tweets · 1 min read
IMO the ChatGPT discourse exposed just how many people believe writing and communication are only about adhering to some sentence/paragraph structure.

I've been nervous for some time now, not because I think AI is going to automate away writing-heavy jobs, but because the act of writing has been increasingly commoditized, to the point where I'm not sure people know how to tell good writing from bad writing. Useful from useless.
Dec 7, 2022 · 22 tweets · 4 min read
I want to talk about my data validation for ML journey, and where I'm at now. I have been thinking about this for 6-ish years.

It starts with me as an intern at FB. The task was to classify FB profiles with some type (e.g., politician, celebrity). I collected training data, split it into train/val/test, iterated on the feature set a bit, and eventually got good test accuracy. Then I "productionized" it, i.e., put it in a Dataswarm pipeline (precursor to Airflow afaik). Then I went back to school before the pipeline ran more than once.
Sep 20, 2022 · 11 tweets · 4 min read
Our understanding of MLOps is limited to a fragmented landscape of thought pieces, startup landing pages, & press releases. So we did an interview study of ML engineers to understand common practices & challenges across organizations & applications: arxiv.org/abs/2209.09125

The paper is a must-read for anyone trying to do ML in production. Want us to give a talk to your group/org? Email shreyashankar@berkeley.edu. You can read the paper for the war stories & insights, so I'll do a "behind the scenes" & "fave quotes" in this thread instead.
Aug 26, 2022 · 13 tweets · 3 min read
Unit testing for ML is a big category of questions, but here are my thoughts on the data validation piece (ensuring model inputs/outputs have good "quality" such that ML performance doesn't suffer).

Old work in defining data constraints (e.g., Postgres style) fails us now bc (1) "quality" is not easily defined by a human---are you gonna comb through every feature column and create bounds?---and (2) the distribution matters; it's hard to look at one record alone and know whether it's "broken"
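One alternative to hand-written per-column bounds is a distributional check: compare a live batch of a feature against a reference window. A minimal sketch (a two-sample KS test here, but any drift statistic works; the names and threshold are illustrative):

```python
import numpy as np
from scipy.stats import ks_2samp

def drifted(reference: np.ndarray, batch: np.ndarray, alpha: float = 0.01) -> bool:
    # Small p-value => the two samples likely come from different distributions.
    _, p_value = ks_2samp(reference, batch)
    return p_value < alpha

rng = np.random.default_rng(0)
reference = rng.normal(0, 1, 10_000)
ok_batch = rng.normal(0, 1, 1_000)
broken_batch = rng.normal(3, 1, 1_000)   # e.g., an upstream unit change

print(drifted(reference, ok_batch))      # False
print(drifted(reference, broken_batch))  # True
```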
Jul 29, 2022 · 8 tweets · 2 min read
I'm excited (and nervous) to post this thread: I've always known I wanted a partner but didn't know what a supportive one looked like (esp. as an ambitious woman who wants kids someday)! Now that I know, I'm so grateful for all the ways in which @PreetumNakkiran supports me:

All tasks have conception/planning/execution phases (Rodsky et al.). Often people think they did a whole task but only did the execution part (e.g., cook meals). Someone else has to conceive & plan (e.g., regularly grocery shop & stock the fridge). This is still a lot of labor!
Jun 22, 2022 · 6 tweets · 2 min read
Honestly: sometimes I feel defeated because ML observability is so hard. All facets are hard -- detecting, diagnosing, and reacting to bugs. We don't have realtime ground truth labels (except in recsys), so we don't know asap when performance goes down. Lots of $$ left on the table (1/6)

By some miracle, maybe you know when pipelines are broken. Well, they have many models and thousands of features, so diagnosing is hard. I am sitting here, probing feature columns I didn't make, trying to figure out which features are important AND most broken. Nightmare (2/6)
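One triage heuristic for that nightmare (a sketch with made-up names, not a real system): rank columns by a cheap brokenness signal times feature importance, so you probe the most suspicious features first.

```python
import numpy as np

def null_rate_jump(ref: np.ndarray, live: np.ndarray) -> float:
    # Cheap brokenness signal: change in the fraction of NaNs.
    return abs(float(np.isnan(live).mean() - np.isnan(ref).mean()))

def triage(ref_cols: dict, live_cols: dict, importance: dict) -> list[str]:
    # Hypothetical inputs: column name -> values, column name -> importance score.
    scores = {
        name: null_rate_jump(ref_cols[name], live_cols[name]) * importance[name]
        for name in ref_cols
    }
    return sorted(scores, key=scores.get, reverse=True)  # worst suspects first
```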
May 4, 2022 · 10 tweets · 2 min read
I probably should have written this years ago, but here are some MLOps principles I think every ML platform (codebase, data management platform) should have: 1/n

Beginner: use pre-commit hooks. ML code is so, so ugly. Start with the basics — black, isort — then add pydocstyle, mypy, check-ast, eof-fixer, etc. Honestly I put these in my research codebases too, lol. 2/n
Feb 17, 2022 · 9 tweets · 2 min read
I'm currently procrastinating on writing, so I will write a thread on writing. This is mainly for STEM people who keep saying they want to write more. 1/9

Many people tell me they want to blog more, but they have several blockers (e.g., what to write about, how to find time, how to make a nice site to publish posts on). This mindset treats writing as a chore. First you need to figure out how to excite yourself to write. 2/9
Dec 7, 2021 · 6 tweets · 1 min read
Meta-thread: a 🧵 on writing technical 🧵s

[x] is popular / taking our world by storm. However, [y] is a blocker & hard problem. We came up with [z]. Thread:

[only include "pull figure" if it makes sense with no additional context] The current state of the world looks like [blank]. This is not great, for several reasons: [blank].

Ex: There are 100s of new Medium posts and arXiv papers every day. This sucks -- we won't read them all, yet we still want people to read our work.
Nov 15, 2021 · 7 tweets · 3 min read
IMO there's no substitute for the MLOps experience of building a pipeline that serves predictions at some endpoint (e.g., REST) and trying to sustain some performance over time. Some pointers & tutorials below:

1. Convince yourself that operationalizing ML, even as a 1-person team, is a hard problem. What are some differences between a Kaggle project and a production ML service? Do some tutorials -- here's a more-than-hello-world toy ML pipeline I've built: github.com/shreyashankar/…
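For a feel of the serving half, a minimal sketch of such an endpoint (FastAPI here, with a stand-in model; the linked repo's pipeline is much more complete):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Features(BaseModel):
    values: list[float]

def predict(values: list[float]) -> float:
    # Stand-in for a real trained model loaded at startup.
    return sum(values) / len(values)

@app.post("/predict")
def serve(features: Features) -> dict:
    return {"prediction": predict(features.values)}

# Run with `uvicorn main:app`, then POST {"values": [1.0, 2.0]} to /predict.
# The hard part is everything after this: retraining, monitoring, drift.
```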
Nov 1, 2021 · 10 tweets · 3 min read
Continuous Integration (CI) & testing for ML pipelines is hard and generally unsolved. I've been thinking about this for a while now — why it's important, what it means, why current solutions are suboptimal, and what we can do about it. (1/10)

ML deployment in many organizations looks like this: someone builds a prediction pipeline, instruments it with a bit of monitoring, and off it goes into the world! But ML can be a nightmare when things go wrong in production: (2/10)
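A small sketch of what a CI gate for an ML pipeline could look like (pytest-style; the data loader and pipeline below are stubs to keep it self-contained):

```python
import numpy as np

def load_pinned_eval_set():
    # Hypothetical: in CI this would load a frozen, versioned evaluation set.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = (X[:, 0] > 0).astype(int)
    return X, y

def train_pipeline():
    # Hypothetical stand-in for the real training pipeline under test.
    class ThresholdModel:
        def predict(self, X):
            return (X[:, 0] > 0).astype(int)
    return ThresholdModel()

def test_model_meets_accuracy_floor():
    X_eval, y_eval = load_pinned_eval_set()
    model = train_pipeline()
    acc = (model.predict(X_eval) == y_eval).mean()
    assert acc >= 0.90, f"accuracy {acc:.3f} fell below the 0.90 floor"
```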
Jul 9, 2021 · 4 tweets · 2 min read
Thread on relevant reflections from Richard Gabriel on pre-1996 AI companies, with my thoughts:

1/ Today's AI faculty at top schools have 1+ startups. They follow the same pattern: start a company on their open-source project, close it, and sell something proprietary & different.

2/ Substitute Wall Street for Big Tech here. People evangelize AI use cases & problems at Google / FB / Microsoft. I can't count the number of times I have heard "the future is in training & deploying large models." Yeah well, that future is only for companies with lots of data.
Jul 5, 2021 · 5 tweets · 1 min read
Recently I realized that the biggest benefit of going to Stanford is not the high quality of education or the network of successful people. It is the entitlement we develop, which the industry mistakes for confidence, that allows us to aim high and actually achieve our goals.

Speaking to the undergrad CS experience: sure, the CS curriculum is top-notch, but Dijkstra's algorithm is the same everywhere. Most CS undergrads don't actually become close with their professors.