Shreya Shankar

@sh_reya

Sep 13, 2020 • 11 tweets • 4 min read

I have been thinking about @cHHillee's article about the state of ML frameworks in @gradientpub for almost a year now, as I've transitioned from research to industry. It is a great read. Here's a thread of agreements & other perspectives:

thegradient.pub/state-of-ml-fr…

I do all my ML experimentation *on small datasets* in PyTorch, and I totally agree with the article's reasons to love PyTorch. I switched completely to PyTorch in May 2020 for my research. I disagree that TF needs to be more afraid of the future, though.

In industry, I don't work with toy datasets. I work with terabytes of data that come from Spark ETL processes. I dump my data to TFRecords and read it in tf.data pipelines. If I'm already in TF, I don't care enough to write my neural nets in PyTorch.
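
The thread doesn't show the pipeline itself. As a minimal sketch, assuming TensorFlow 2.x and a made-up schema (a 3-float feature vector `x` and an integer label `y`, standing in for whatever the Spark ETL job actually writes), reading sharded TFRecords with tf.data might look like this. `write_examples` and `make_dataset` are hypothetical helpers; `write_examples` only exists so the sketch runs without a Spark job.

```python
import tensorflow as tf

# Hypothetical schema: a 3-float feature vector and an integer label.
FEATURES = {
    "x": tf.io.FixedLenFeature([3], tf.float32),
    "y": tf.io.FixedLenFeature([], tf.int64),
}


def write_examples(path, rows):
    """Serialize (x, y) rows as tf.train.Example protos (stand-in for the ETL job)."""
    with tf.io.TFRecordWriter(path) as writer:
        for x, y in rows:
            example = tf.train.Example(features=tf.train.Features(feature={
                "x": tf.train.Feature(float_list=tf.train.FloatList(value=x)),
                "y": tf.train.Feature(int64_list=tf.train.Int64List(value=[y])),
            }))
            writer.write(example.SerializeToString())


def make_dataset(file_pattern, batch_size):
    """Read sharded TFRecords, parse each record, then batch and prefetch."""
    files = tf.data.Dataset.list_files(file_pattern, shuffle=True)
    # Read several shards concurrently.
    ds = files.interleave(tf.data.TFRecordDataset, num_parallel_calls=tf.data.AUTOTUNE)
    # Turn serialized protos back into a dict of tensors.
    ds = ds.map(lambda record: tf.io.parse_single_example(record, FEATURES),
                num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

Changing the schema in this sketch only touches `FEATURES` and the writer; the model code that consumes the batches stays the same.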

I agree that researchers care about modeling iteration time and thus maybe prefer PyTorch. But engineers also care about fast iteration time. The difference: most of my iteration happens on the data side, not the modeling side.

The state of applied deep learning in industry is so bad that performance isn't the highest priority. First, we need models to work on non-academic datasets. Performance will become a priority later, once VC funding runs out or large co's no longer have excess $$ to blow on TPU pods.

I am glad people are thinking of "productionization" for these ML frameworks. I agree with these thoughts -- multi-platform support, fp16, cloud support are all stepping stones. But most people think of "productionization" in terms of training, not inference.

Think about the industry from a long-term perspective -- training DL models requires different hardware and infra than inference. Maybe you need TPUs or 24 V100 GPUs to train, but maybe you can do inference on a K80 GPU. It's such a hassle to do CI/CD for training new models.

Then the bottleneck to applied deep learning success becomes: how often do you retrain models? Can you train a GPT-3 once and hope it suffices for the year? Then most efforts will be on optimizing inference, in which case the cloud provider or framework doesn't matter as much.

In ML research, there is a huge training : inference ratio. In industry, we want there to be a small training : inference ratio. Unfortunately, this isn't an issue frameworks can really address -- continual learning and robustness to dataset shift are still unsolved research problems.
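
The thread doesn't say how the ratio is measured. One way to make it concrete, assuming it means GPU-hours spent on one training run versus GPU-hours spent serving that model until it is retrained, is the hypothetical helper below; the parameter names and numbers are illustrative, not measurements.

```python
def training_to_inference_ratio(
    train_gpu_hours: float,
    predictions_per_day: float,
    gpu_hours_per_1k_predictions: float,
    retrain_every_days: float,
) -> float:
    """GPU-hours for one training run divided by GPU-hours spent serving
    that model until the next retrain (all inputs are hypothetical)."""
    serving_gpu_hours = (
        predictions_per_day * retrain_every_days * gpu_hours_per_1k_predictions / 1000
    )
    return train_gpu_hours / serving_gpu_hours


# Hypothetical inputs: same training run and traffic, different retrain cadence.
weekly = training_to_inference_ratio(240, 1_000_000, 0.05, retrain_every_days=7)
quarterly = training_to_inference_ratio(240, 1_000_000, 0.05, retrain_every_days=90)
print(f"retrain weekly:    {weekly:.3f}")
print(f"retrain quarterly: {quarterly:.3f}")
```

Under this reading, the ratio shrinks in direct proportion to how long a model stays in service, which is why how often you retrain matters so much.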

My conclusion is very similar to @cHHillee's: this battle may be irrelevant. I find it crazy that people in industry outside big tech co's (finance, ad companies, etc.) go to extreme lengths to write their own logistic regression algos optimized for their data & infra.

As I've begun to view this field through the lens of "what are the biggest blockers to having my ML models generate ROI for the company?", I've realized that a successful DL framework will be one that operates smoothly with the ETL, EDA, and eng ecosystem. I don't see anyone doing that yet :)

More from @sh_reya

Shreya Shankar

@sh_reya

Apr 24

⭐new MLOps preprint⭐

RAG is everywhere, but building RAG is still painful. When something breaks--the retriever? the LLM?--developers are left guessing, & iterating is often slow.

we built a better way & used it as a design probe to study expert workflows 👇

Meet raggy: an interactive debugging interface for RAG pipelines. It pairs a Python library of RAG primitives with a UI that lets devs inspect, edit, & rerun steps in real time. raggy precomputes many indexes for retrieval upfront, so you can easily swap them out when debugging!
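
raggy's actual API isn't shown in the thread. The sketch below only illustrates the general idea of precomputing several retrieval indexes up front so a debugging session can swap between them without rebuilding anything; the chunk sizes, the toy keyword-overlap scorer, and all function names are hypothetical stand-ins, not raggy's implementation.

```python
from collections import Counter
from typing import Dict, List, Tuple

Index = List[Tuple[str, Counter]]


def chunk(text: str, size: int) -> List[str]:
    """Split text into consecutive chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_index(docs: List[str], size: int) -> Index:
    """Index every chunk as (chunk text, bag of lowercase words)."""
    return [(c, Counter(c.lower().split())) for doc in docs for c in chunk(doc, size)]


def retrieve(index: Index, query: str, k: int = 1) -> List[str]:
    """Rank chunks by word overlap with the query (toy stand-in for a real retriever)."""
    q = Counter(query.lower().split())
    ranked = sorted(index, key=lambda item: sum((q & item[1]).values()), reverse=True)
    return [text for text, _ in ranked[:k]]


DOCS = [
    "patients with a fever should drink water and rest at home",
    "the clinic opens at nine and closes at five on weekdays",
]

# Pay the indexing cost once, up front: one index per chunk size.
INDEXES: Dict[int, Index] = {size: build_index(DOCS, size) for size in (4, 8, 16)}

# While debugging, swapping the retriever's index is just a dictionary lookup.
for size, index in INDEXES.items():
    print(size, retrieve(index, "when does the clinic open"))
```

The trade-off in this sketch is memory and upfront compute in exchange for instant swaps when comparing retrieval configurations.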

Then, to learn more about expert workflows, we ran a study with 12 engineers who’ve built production RAG pipelines. We simulated a question-answering application from a hospital & watched our participants use raggy to build and iterate on their pipelines. The paper reports a bunch of qualitative findings, including:

🔍 They always debug retrieval first
⚙️ Fixes to one step often break another
⚡ Fast iteration was key: raggy turned half-day experiments into seconds!!

Read 6 tweets

Shreya Shankar

@sh_reya

Jan 13

Introducing 📜DocWrangler: an open-source IDE for AI-powered data processing with built-in prompt engineering guidance and output inspection tools.

Code: github.com/ucbepic/docetl
Blog: data-people-group.github.io/blogs/2025/01/…
Free research preview: docetl.org/playground

Built @ Berkeley (1/7)

(2/7) Following the release of DocETL (our data processing framework), we observed users struggling to articulate what they want & changing their preferences based on what the LLM could or couldn't do well. The main challenge is that no one knows what outputs they want until they see them; that is, agentic workflows are inherently iterative.

(3/7) This release of DocWrangler has 3 main features. Key feature 1: spreadsheet interface with automatic summary overlays

Read 7 tweets

Shreya Shankar

@sh_reya

Dec 29, 2024

how come nobody is talking about how much shittier eng on-calls are thanks to blind integrations of AI-generated code? LLMs are great coders but horrible engineers. no, the solution is not “prompt the LLM to write more documentation and tests” (cont.)

I will take React development as an example. I use Cursor, but I think the problems are not specific to Cursor. Every time I ask for a new feature to be added to my codebase, it almost always uses at least one state variable too many. When the code is not correct (determined by my interaction with the React app) and I prompt the LLM with the bug and ask it to fix it, it will almost always add complexity rather than rewrite parts of what it already had.

so the burden is on me to exhaustively test the generated app via my interactions, and then reverse engineer the mental model of what the code should be, and then eyeball the generated code to make sure this matches my model. This is so horrible to do for multi-file edits or > 800 lines of generated code (which is super common for web dev diffs)

Read 11 tweets

Shreya Shankar

@sh_reya

Nov 4, 2024

what makes LLM frameworks feel unusable is that there's still so much burden for the user to figure out the bespoke amalgamation of LLM calls to ensure end-to-end accuracy. in , we've found that relying on an agent to do this requires lots of scaffolding docetl.org

first there needs to be a way of getting theoretically valid task decompositions. simply asking an LLM to break down a complex task over lots of data may result in a logically incorrect plan. for example, the LLM might choose the wrong data operation (projection instead of aggregation), and this would be a different pipeline entirely.
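
To see why picking the wrong operation changes the pipeline entirely, here is a toy sketch in plain Python. The review data, the `project`/`aggregate` helpers, and the keyword check standing in for an LLM call are all hypothetical; this is not DocETL's API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def project(docs: List[Doc], fn: Callable[[Doc], Any]) -> List[Doc]:
    """Projection (map): exactly one output per input document."""
    return [{**d, "out": fn(d)} for d in docs]


def aggregate(docs: List[Doc], key: str, fn: Callable[[List[Doc]], Any]) -> Dict[Any, Any]:
    """Aggregation (group + reduce): one output per group of documents."""
    groups: Dict[Any, List[Doc]] = defaultdict(list)
    for d in docs:
        groups[d[key]].append(d)
    return {k: fn(v) for k, v in groups.items()}


def mentions_battery(d: Doc) -> bool:
    """Stand-in for an LLM call that classifies a single review."""
    return "battery" in d["text"]


REVIEWS = [
    {"product": "A", "text": "battery died fast"},
    {"product": "A", "text": "battery lasts all day"},
    {"product": "B", "text": "screen cracked"},
]

# Task: "how many reviews mention the battery, per product?"
per_review = project(REVIEWS, mentions_battery)  # wrong shape: one row per review
per_product = aggregate(REVIEWS, "product", lambda ds: sum(map(mentions_battery, ds)))
print(per_review)
print(per_product)
```

Both plans call the same per-document check, but only the aggregation answers the per-product question; a planner that picks the projection has built a different pipeline, not a slightly worse one.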

to solve this problem, DocETL uses hand-defined rewrite directives that can enumerate theoretically-equivalent decompositions/pipeline rewrites. the agent is then limited to creating prompts/output schemas for newly synthesized operations, according to the rewrite rules, which bounds its errors.
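
A rough sketch of the idea, assuming a pipeline is just a list of typed operations: each directive maps one operation to equivalent replacement sub-pipelines, and the agent's job shrinks to writing prompts for the newly created LLM steps. The two directives, the `Op` type, and the helper names are hypothetical illustrations, not DocETL's actual rewrite rules or API.

```python
from dataclasses import dataclass
from typing import Callable, List

LLM_OPS = {"map", "reduce"}  # ops that need a prompt; "split" is purely mechanical here


@dataclass
class Op:
    kind: str         # e.g. "map", "split", "reduce"
    prompt: str = ""  # empty means the agent still has to write this prompt


Pipeline = List[Op]
Directive = Callable[[Op], List[Pipeline]]


def split_map_reduce(op: Op) -> List[Pipeline]:
    """Hypothetical directive: map over a long doc -> split, map each chunk, reduce."""
    return [[Op("split"), Op("map"), Op("reduce")]] if op.kind == "map" else []


def chain_maps(op: Op) -> List[Pipeline]:
    """Hypothetical directive: one complex map -> two simpler chained maps."""
    return [[Op("map"), Op("map")]] if op.kind == "map" else []


def enumerate_rewrites(pipeline: Pipeline, directives: List[Directive]) -> List[Pipeline]:
    """The original pipeline plus every single-op rewrite the directives allow."""
    candidates = [pipeline]
    for i, op in enumerate(pipeline):
        for directive in directives:
            for replacement in directive(op):
                candidates.append(pipeline[:i] + replacement + pipeline[i + 1:])
    return candidates


def holes(pipeline: Pipeline) -> List[int]:
    """Positions where the agent must synthesize a prompt; the structure is fixed."""
    return [i for i, op in enumerate(pipeline) if op.kind in LLM_OPS and not op.prompt]


user_pipeline = [Op("map", "Extract every medication and dosage from this record.")]
cands = enumerate_rewrites(user_pipeline, [split_map_reduce, chain_maps])
for cand in cands:
    print([op.kind for op in cand], "agent fills:", holes(cand))
```

Because the agent never chooses the pipeline shape, the worst it can do in this sketch is write a poor prompt for a slot, not invent a logically different plan.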

Read 5 tweets

Shreya Shankar

@sh_reya

Oct 31, 2024

https://twitter.com/simonw/status/1851771710510633081

I have a lot of thoughts on this as someone who has manually combed through hundreds of humans' prompt deltas

first, humans tend to underspecify the first version of their prompt. if they're in the right environment where they can get a near-instantaneous LLM response in the same interface (e.g., ChatGPT, Claude, OpenAI Playground), they just want to see what the LLM can do
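
One simple way to inspect a prompt delta is a line diff between two versions of a prompt. The sketch below uses Python's standard `difflib`; the helper name and example prompts are made up, and real prompt edits often rewrite lines rather than only appending them, which is why the helper returns removed lines too.

```python
import difflib
from typing import List, Tuple


def prompt_delta(old: str, new: str) -> Tuple[List[str], List[str]]:
    """Return (removed_lines, added_lines) between two prompt versions."""
    diff = list(difflib.ndiff(old.splitlines(), new.splitlines()))
    # ndiff prefixes lines with "- ", "+ ", "  ", or "? " (hint lines, skipped here).
    removed = [line[2:] for line in diff if line.startswith("- ")]
    added = [line[2:] for line in diff if line.startswith("+ ")]
    return removed, added


# An underspecified first version, then a version with constraints added.
v1 = "Summarize this patient note."
v2 = "\n".join([
    "Summarize this patient note.",
    "Use at most 3 bullet points.",
    "Do not include the patient's name.",
])
removed, added = prompt_delta(v1, v2)
print("removed:", removed)
print("added:", added)
```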

there's a lot of literature on LLM sensemaking from the HCI community here (our own "who validates the validators" paper is one of many), but I still think LLM sensemaking is woefully unexplored, especially with respect to which stage of the MLOps lifecycle it happens in

Read 9 tweets

Shreya Shankar

@sh_reya

Oct 21, 2024

Our (first) DocETL preprint is now on arXiv! "DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing" It has been almost 2 years in the making, so I am very happy we hit this milestone :-) arxiv.org/abs/2410.12189

DocETL is a framework for LLM-powered unstructured data processing and analysis. The big new idea in this paper is to automatically rewrite user-specified pipelines into a sequence of finer-grained and more accurate operators.

I'll mention two big contributions in this paper. First, we present a rich suite of operators, with three entirely new operators to deal with decomposing complex documents: the split, gather, and resolve operators.
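
The thread only names these operators. As a hedged sketch of what each might do on plain strings (my reading, not DocETL's implementation): `split` chunks a long document, `gather` attaches the preceding chunk(s) as context so each chunk can be processed with some surrounding information, and `resolve` maps variant spellings of an entity to one canonical name. The deterministic normalization in `resolve` is a stand-in for a smarter (for example, LLM-based) comparison.

```python
from typing import Dict, List


def split(doc: str, chunk_words: int) -> List[str]:
    """Break a long document into consecutive chunks of `chunk_words` words."""
    words = doc.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]


def gather(chunks: List[str], previous: int = 1) -> List[Dict[str, str]]:
    """Pair each chunk with up to `previous` preceding chunks as context."""
    return [
        {"context": " ".join(chunks[max(0, i - previous):i]), "chunk": c}
        for i, c in enumerate(chunks)
    ]


def resolve(names: List[str]) -> Dict[str, str]:
    """Map each name to the first-seen spelling with the same normalized key."""
    canonical: Dict[str, str] = {}
    mapping: Dict[str, str] = {}
    for name in names:
        key = " ".join(name.lower().replace(".", "").split())
        canonical.setdefault(key, name)
        mapping[name] = canonical[key]
    return mapping


note = "Dr. Smith prescribed ibuprofen. Follow up with dr smith in two weeks."
for item in gather(split(note, 6)):
    print(item)
print(resolve(["Dr. Smith", "dr smith", "Nurse Lee"]))
```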

Read 12 tweets
