first, humans tend to underspecify the first version of their prompt. if they're in the right environment where they can get a near-instantaneous LLM response in the same interface (e.g., ChatGPT, Claude, the OpenAI playground), they just want to see what the LLM can do
there's a lot of literature on LLM sensemaking from the HCI community here (our own "who validates the validators" paper is one of many), but I still think LLM sensemaking is woefully unexplored, especially with respect to where it sits in the MLOps lifecycle
not only do people want to just see what the LLM can do, but they also don't fully know what they are supposed to say in the prompt, or what answer they want (they won't know it until they see it).
I think of a prompt as a form you need to fill out to submit your task to the LLM, and the fields of this form are unknown and dynamic (i.e., task-specific). a prompt writing tool can make these fields more known
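roughly what I mean by the form framing, as a sketch (the task and field names here are made up for illustration):

```python
# A rough sketch of the "prompt as a form" framing (field names invented
# for illustration): the fields are task-specific and usually unknown to
# the prompt writer at first.
form_fields = {
    "task": "summarize this customer support ticket",
    "definitions": None,      # which terms need defining? unknown up front
    "output_format": None,    # headers, JSON keys, length? unknown up front
    "edge_cases": None,       # what to do with empty or off-topic tickets?
}

# A prompt-writing tool can surface the unfilled fields and nudge the human
# (or an LLM) to fill them in.
missing = [k for k, v in form_fields.items() if v is None]
print("fields still to fill:", missing)
```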
second, humans derive a lot of the prompt content based on their sensemaking. if they observe a weird output, they edit their prompt (usually by adding another instruction). many of these edits are clarifications/definitions
for example (see our DocETL paper), if you are an investigative journalist and want an LLM to find all instances of police misconduct in a report/document, you have to define misconduct. there are many types of misconduct, each of which may require its own definition
the LLM-generated prompt writer is GREAT here to relieve blank page syndrome for defining important terms in prompts. if I ask Claude to generate a prompt for "find all instances of police misconduct in this document", it makes an attempt to start definitions, which I can then refine
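to make that concrete, an LLM-drafted prompt might look something like this (illustrative; not the exact prompt Claude produced):

```python
# Illustrative draft of an LLM-generated prompt with definitions the human
# can then refine (not the exact prompt Claude produced).
draft_prompt = """
Find all instances of police misconduct in the document below.

For this task, "police misconduct" includes:
- Excessive force: force beyond what department policy permits for the situation.
- Dishonesty: false statements in reports, testimony, or to investigators.
- Unlawful search or seizure: searches without a warrant, consent, or exigency.

For each instance, return the type of misconduct, a supporting quote, and the page number.

Document:
{document}
"""
```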
third, with DocETL (our AI-powered data processing tool), some of our users don't have lots of programming experience, and as a result, I've sent them starter pipelines they can use to process their data.
surprisingly, many of them can move forward without me: tweaking pipeline prompts, editing them, adding new operations, etc. I think an AI assistant could do my job of drafting the initial pipeline
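a starter pipeline is basically just a couple of prompts wired together, roughly like this sketch (keys are illustrative, not exact DocETL syntax):

```python
# Rough sketch of what a "starter pipeline" hands a new user (keys are
# illustrative, not exact DocETL syntax): a couple of prompts they can
# tweak without writing any orchestration code themselves.
starter_pipeline = {
    "operations": [
        {
            "name": "extract_complaints",
            "type": "map",     # one LLM call per document
            "prompt": "List every complaint mentioned in this ticket: {{ input.text }}",
        },
        {
            "name": "summarize_themes",
            "type": "reduce",  # one LLM call over all extracted complaints
            "prompt": "Group these complaints into themes and summarize each: {{ inputs }}",
        },
    ],
}
```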
overall I think prompting is going to be a collaborative effort between humans and LLMs in the future. humans alone are limited; LLM-written prompts alone are limited (you need to have _some_ human input and feedback to solve the human's fuzzy or underspecified task).
Our (first) DocETL preprint is now on arXiv! "DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing" It has been almost 2 years in the making, so I am very happy we hit this milestone :-) arxiv.org/abs/2410.12189
DocETL is a framework for LLM-powered unstructured data processing and analysis. The big new idea in this paper is to automatically rewrite user-specified pipelines into a sequence of finer-grained and more accurate operators.
I'll mention two big contributions in this paper. First, we present a rich suite of operators, with three entirely new operators to deal with decomposing complex documents: the split, gather, and resolve operators.
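conceptually, here's what the three operators do (a sketch of the ideas only, not DocETL's actual implementation or API):

```python
# Conceptual sketch of the split, gather, and resolve operators
# (illustrative only; not DocETL's implementation or API).

def split(doc: str, chunk_size: int) -> list[str]:
    """Break a long document into chunks that fit in an LLM's context."""
    return [doc[i:i + chunk_size] for i in range(0, len(doc), chunk_size)]

def gather(chunks: list[str], window: int = 1) -> list[str]:
    """Attach neighboring chunks as peripheral context, so each chunk can
    be processed with enough surrounding information."""
    gathered = []
    for i, chunk in enumerate(chunks):
        before = "".join(chunks[max(0, i - window):i])
        after = "".join(chunks[i + 1:i + 1 + window])
        gathered.append(
            f"[context before]\n{before}\n[chunk]\n{chunk}\n[context after]\n{after}"
        )
    return gathered

def resolve(entities: list[str], same_entity) -> list[str]:
    """Canonicalize near-duplicate entities (e.g., 'Officer J. Smith' vs
    'John Smith') before downstream aggregation; same_entity would
    typically be an LLM-backed comparison."""
    canonical: list[str] = []
    for e in entities:
        if not any(same_entity(e, c) for c in canonical):
            canonical.append(e)
    return canonical
```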
DocETL is our agentic system for LLM-powered data processing pipelines. Time for this week’s technical deep dive on _gleaning_, our automated technique to improve accuracy by iteratively refining outputs 🧠🔍 (using LLM-as-judge!)
2/ LLMs often don't return perfect results on the first try. Consider extracting insights from user logs with an LLM. An LLM might miss important behaviors or include extraneous information. These issues could lead to misguided product decisions or wasted engineering efforts.
3/ DocETL's gleaning feature uses the power of LLMs themselves to validate and refine their own outputs, creating a self-improving loop that significantly boosts output quality.
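the core loop is simple; here's a minimal sketch (llm() is a placeholder for whatever client you use, and the prompts are illustrative):

```python
# Minimal sketch of gleaning: generate, have an LLM-as-judge critique, and
# refine until the judge is satisfied or we hit a round limit.

def llm(prompt: str) -> str:
    # placeholder: swap in a real LLM client call here
    return "<llm response>"

def glean(task_prompt: str, document: str, max_rounds: int = 3) -> str:
    output = llm(f"{task_prompt}\n\nDocument:\n{document}")
    for _ in range(max_rounds):
        # LLM-as-judge: ask for problems with the current output
        critique = llm(
            "You are validating an LLM output.\n"
            f"Task: {task_prompt}\nOutput: {output}\n"
            "List anything missing or extraneous, or reply DONE."
        )
        if "DONE" in critique:
            break
        # refine the output using the validator's feedback
        output = llm(
            f"Task: {task_prompt}\nDocument:\n{document}\n"
            f"Previous output: {output}\nValidator feedback: {critique}\n"
            "Produce an improved output that addresses the feedback."
        )
    return output
```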
LLMs have made exciting progress on hard tasks! But they still struggle to analyze complex, unstructured documents (including today's Gemini 1.5 Pro 002).
2/ Let's illustrate DocETL with an example task: analyzing presidential debates over the last 40 years to see what topics candidates discussed, & how the viewpoints of Democrats and Republicans evolved. The combined debate transcripts span ~740k words, exceeding context limits of most LLMs.
3/ But even for Gemini 1.5 Pro (2M token context limit), when given the entire dataset at once, it only reports on the evolution of 5 themes across all the debates! And, the reports get progressively worse as the output goes on. docetl.com/#demo-gemini-o…
recently been studying prompt engineering through a human-centered (developer-centered) lens. here are some fun tips i’ve learned that don’t involve acronyms or complex words
if you don’t exactly specify the structure you want the response to take on, down to the headers or parentheses or valid attributes, the response structure may vary between LLM calls, which makes it hard to parse reliably in production
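e.g., spell the structure out explicitly (the schema and wording below are just illustrative, and {ticket} is a placeholder):

```python
# Sketch of pinning down the response structure explicitly. The schema is
# illustrative; the point is to spell out every key and allowed value you
# expect rather than leaving structure to the model.
prompt = """
Summarize the support ticket below.

Respond with exactly this structure and nothing else:
{
  "summary": "<one sentence>",
  "severity": "<one of: low, medium, high>",
  "follow_up_needed": <true or false>
}

Ticket:
{ticket}
"""
```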
play around with the simplest prompt you can think of & run it a bunch of times on different inputs to build intuition for how LLMs “behave” for your task. then start adding instructions to your prompt in the form of rules, e.g., “do not do X”
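something like this sketch (llm() and the inputs are placeholders):

```python
# Sketch of the "start simple, run it a bunch" loop: see how the model
# behaves across inputs before piling on rules.

def llm(prompt: str) -> str:
    # placeholder: swap in a real LLM client call here
    return "<llm response>"

inputs = ["ticket 1 text ...", "ticket 2 text ...", "ticket 3 text ..."]
simple_prompt = "Summarize this support ticket: {text}"

for text in inputs:
    print(llm(simple_prompt.format(text=text)))

# after eyeballing the outputs, add rules one at a time, e.g.:
# "Do not include customer names." / "Do not speculate about root cause."
```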
thinking about how, in the last year, > 5 ML engineers have told me, unprompted, that they want to do less ML & more software engineering. not because it’s more lucrative to build ML platforms & devtools, but because models can be too unpredictable & make for a stressful job
imo the biggest disconnect between ML-related research & production is that researchers aren’t aware of the human-centric efforts required to sustain ML performance. It feels great to prototype a good model, but on-calls battling unexpected failures chip away at this success
imagine that your career & promos are not about demonstrating good performance for a fixed dataset, but about how quickly on average you are able to respond to every issue some stakeholder has with some prediction. it is just not a sustainable career IMO
Been working on LLMs in production lately. Here is an initial thoughtdump on LLMOps trends I’ve observed, compared/contrasted with their MLOps counterparts (no, this thread was not written by ChatGPT)
1) Experimentation is tangibly more expensive (and slower) in LLMOps. These APIs are not cheap, nor is it really feasible to experiment w/ smaller/cheaper models and expect behaviors to stay consistent when calling bigger models
1.5) we know from MLOps research that high experimentation velocity is crucial for putting and keeping pipelines in prod. A fast way is to collect a few examples, load up a notebook, and try out a heck of a lot of different prompts, which calls for prompt versioning & management systems
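even something very simple goes a long way here; a tiny sketch of prompt versioning (names invented):

```python
# Tiny sketch of prompt versioning (names invented): keep every prompt you
# try, keyed by a content hash, so notebook experiments are reproducible
# and you can diff what changed between runs.
import hashlib
import json
import time

PROMPT_LOG = "prompt_versions.jsonl"

def register_prompt(name: str, template: str) -> str:
    version = hashlib.sha256(template.encode()).hexdigest()[:8]
    with open(PROMPT_LOG, "a") as f:
        f.write(json.dumps({
            "name": name,
            "version": version,
            "template": template,
            "registered_at": time.time(),
        }) + "\n")
    return version

v = register_prompt("summarize_ticket", "Summarize this support ticket: {text}")
print("prompt version:", v)
```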