today, i painfully resigned from a machine learning educational program for high schoolers that i cofounded with 3 friends (CS students).
i'm sharing my story b/c it's okay to quit things that are bad for mental health: (1/13)
the company started in a coffee shop in Palo Alto in summer 2018. "wouldn't it be cool if we could teach ML to high school kids?" one of us mentioned. "it's fairly easy to learn now, but online curricula can be too complicated."
we decided to make it happen. (2/13)
at the time, i was seriously dating someone. i didn't know it then, but the relationship was incredibly toxic. i had horrible nightmares and cried myself to sleep most nights. i started hallucinating events that involved my s/o, family members, and people close to me. (3/13)
of course, i love ML, but on a personal level, i also helped start the org because i believed that if i spent more time with my closest friends and s/o (also studying ML), the mental problems would go away. i never told anyone anything about my mental problems. (4/13)
we were incredibly successful in our first year! we worked with over 100 students in the Bay Area and were cash flow positive. but the hallucinations and sleep problems got worse as the org grew. i began to resent everyone close to me for not being able to help. (5/13)
i wanted to leave in may 2019, but i never knew how. i was hallucinating events that caused me to resent my cofounders and closest friends. i saw psychiatrist after psychiatrist and was prescribed > 5 different medications in the past year. i felt so alone. (6/13)
whenever i did work, i couldn't sleep. i knew it wasn't my cofounders' fault, but i didn't know who to blame. i tried blaming my ex, who broke up w/ me when i started seeing a psychiatrist in Jan 2019, but it didn't help that he studied ML and reminded me of the org. (7/13)
to be honest, i couldn't blame anyone. the org reminded me of struggling through mental illness alone. but sometimes shit happens with no one to blame. once i scaled back my work for A4 in Sep 2019, things slowly got better. by Nov 2019, i could sleep for 8 hours at night. (8/13)
i thought i'd contribute to the org in a greater capacity in 2020, but i felt too much stress and dread when thinking about the person i used to be when i started the org with my friends. when i think about it, my brain wanders to places i don't want it to go. (9/13)
the reason i hadn't left until now was that the people and the work weren't bad; it was just my fault i couldn't deal with it. i wanted to deal with it. but i realized i needed to leave, not because the work itself was bad, but because it reminded me of toxic things. (10/13)
so today i sent that resignation email, and i received nothing but love and support from my cofounders. it hurts to think i left them when they collectively did nothing wrong. sometimes (i mean a lot of times) i feel like it is my fault i can't get over my past. (11/13)
healing from my experiences last year will take a long time. i've learned that no job is worth prolonging my symptoms of mental illness. it hurts to know i gave up, but i find comfort in the fact that today i chose to make room for new, positive experiences. (12/13)
if you or someone you know is leaving a job for mental health reasons, know that it is not always because of the environment or people. sometimes it is just hard to deal with mental health issues and a job. thank you for listening 💕 (13/13)
RAG is everywhere, but building RAG pipelines is still painful. When something breaks (the retriever? the LLM?), developers are left guessing, & iterating is often slow
we built a better way & used it as a design probe to study expert workflows 👇
Meet raggy: an interactive debugging interface for RAG pipelines. It pairs a Python library of RAG primitives with a UI that lets devs inspect, edit, & rerun steps in real time. raggy precomputes many indexes for retrieval upfront, so you can easily swap them out when debugging!
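The thread doesn't show raggy's API, so here is a minimal, hypothetical Python sketch of the idea of precomputing several retrieval indexes up front (here assumed to be one per chunk size) and swapping the active one while debugging. `IndexBank`, `chunk`, and the word-overlap `score` are illustrative names, not raggy's interface; a real pipeline would more likely score with embeddings or BM25.

```python
from dataclasses import dataclass, field


def chunk(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of `size` words (illustrative chunker)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def score(query: str, passage: str) -> int:
    """Toy relevance score: count of distinct query words that appear in the passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


@dataclass
class IndexBank:
    """Hypothetical holder for several precomputed indexes; one is 'active' at a time."""
    indexes: dict[str, list[str]] = field(default_factory=dict)
    active: str = ""

    def build(self, docs: list[str], chunk_sizes: list[int]) -> None:
        # Pay the indexing cost once, up front: one index per chunk size.
        for size in chunk_sizes:
            self.indexes[f"chunk{size}"] = [c for d in docs for c in chunk(d, size)]
        if not self.active and self.indexes:
            self.active = next(iter(self.indexes))

    def swap(self, name: str) -> None:
        # Swapping only changes which precomputed index is read; nothing is rebuilt.
        if name not in self.indexes:
            raise KeyError(f"no precomputed index named {name!r}")
        self.active = name

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Rank chunks in the active index; sorted() is stable, so ties keep corpus order.
        ranked = sorted(self.indexes[self.active], key=lambda c: score(query, c), reverse=True)
        return ranked[:k]


docs = [
    "patients with chest pain should receive an ecg within ten minutes of arrival",
    "visiting hours are from nine to five on weekdays",
]
bank = IndexBank()
bank.build(docs, chunk_sizes=[4, 16])
print(bank.active, bank.retrieve("ecg chest pain", k=1))
bank.swap("chunk16")
print(bank.active, bank.retrieve("ecg chest pain", k=1))
```

The design trade-off: comparing chunkings becomes instant because nothing is re-indexed, at the cost of paying memory and build time for every variant before debugging starts.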
Then, to learn more about expert workflows, we ran a study with 12 engineers who’ve built production RAG pipelines. We simulated a question-answering application from a hospital & watched our participants use raggy to build and iterate on their pipelines. The paper reports a bunch of qualitative findings, including:
🔍 They always debug retrieval first
⚙️ Fixes to one step often break another
⚡ Fast iteration was key: raggy turned half-day experiments into seconds!!
(2/7) Following the release of DocETL (our data processing framework), we observed users struggling to articulate what they want & changing their preferences based on what the LLM could or couldn't do well. The main challenge is that no one knows what outputs they want until they see them; that is, agentic workflows are inherently iterative.
(3/7) This release of DocWrangler has 3 main features. Key feature 1: spreadsheet interface with automatic summary overlays
how come nobody is talking about how much shittier eng on-calls are thanks to blind integrations of AI-generated code? LLMs are great coders but horrible engineers. no, the solution is not “prompt the LLM to write more documentation and tests” (cont.)
i will take React development as an example. I use Cursor, but I think the problems are not specific to Cursor. Every time I ask for a new feature to be added to my codebase, it almost always uses at least one state variable too many. When the code is incorrect (which I determine by interacting with the React app) and I prompt the LLM with the bug and ask it to fix it, it almost always adds complexity rather than rewriting parts of what it already had
so the burden is on me to exhaustively test the generated app via my interactions, and then reverse engineer the mental model of what the code should be, and then eyeball the generated code to make sure this matches my model. This is so horrible to do for multi-file edits or > 800 lines of generated code (which is super common for web dev diffs)
what makes LLM frameworks feel unusable is that there's still so much burden on the user to figure out the bespoke amalgamation of LLM calls needed to ensure end-to-end accuracy. in DocETL (docetl.org), we've found that relying on an agent to do this requires lots of scaffolding
first there needs to be a way of getting theoretically valid task decompositions. simply asking an LLM to break down a complex task over lots of data may result in a logically incorrect plan. for example, the LLM might choose the wrong data operation (projection instead of aggregation), and this would be a different pipeline entirely.
to solve this problem, DocETL uses hand-defined rewrite directives that can enumerate theoretically-equivalent decompositions/pipeline rewrites. the agent is then limited to creating prompts/output schemas for newly synthesized operations, according to the rewrite rules, which bounds its errors.
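To make the idea concrete, here is a hypothetical Python sketch of one such directive, not DocETL's actual code: it rewrites a `map` over long documents into `split` → per-chunk `map` → per-document `reduce`, enumerating one candidate per chunk size. The directive fixes each candidate's structure; only the new operators' prompts are left blank for an agent to write. `Op` and `chunked_map_directive` are illustrative names, and the rewrite is only equivalent when the task decomposes over chunks (e.g., extracting every mention of something).

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    kind: str                                   # operator type, e.g. "map", "split", "reduce"
    prompt: str = ""                            # empty until an agent writes it
    params: dict = field(default_factory=dict)  # operator settings


def chunked_map_directive(pipeline: list[Op], chunk_sizes: list[int]) -> list[list[Op]]:
    """Hypothetical rewrite rule: map => split -> map (per chunk) -> reduce (per document).

    Every candidate's structure is fixed by the rule; the agent's job is only to
    fill in the blank prompts of the newly synthesized map and reduce operators.
    """
    candidates = []
    for i, op in enumerate(pipeline):
        if op.kind != "map":
            continue  # this rule only applies to map operators
        for size in chunk_sizes:
            rewrite = [
                Op("split", params={"chunk_size": size}),
                Op("map", params={"original_prompt": op.prompt}),
                Op("reduce", params={"group_by": "doc_id"}),
            ]
            candidates.append(pipeline[:i] + rewrite + pipeline[i + 1:])
    return candidates


original = [Op("map", prompt="List every medication mentioned in this record")]
for cand in chunked_map_directive(original, chunk_sizes=[500, 2000]):
    print([op.kind for op in cand], cand[0].params)
```

Because the agent can only fill in prompts inside this fixed shape, it cannot turn a per-document task into a cross-document aggregation, which is the kind of logically wrong plan that free-form decomposition risks.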
first, humans tend to underspecify the first version of their prompt. if they're in the right environment, where they can get a near-instantaneous LLM response in the same interface (e.g., ChatGPT, Claude, the OpenAI Playground), they just want to see what the LLM can do
there's a lot of literature on LLM sensemaking from the HCI community (our own "who validates the validators" paper is one of many), but I still think LLM sensemaking is woefully underexplored, especially with respect to the stage in the MLOps lifecycle
Our (first) DocETL preprint is now on arXiv! "DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing" It has been almost 2 years in the making, so I am very happy we hit this milestone :-) arxiv.org/abs/2410.12189
DocETL is a framework for LLM-powered unstructured data processing and analysis. The big new idea in this paper is to automatically rewrite user-specified pipelines into a sequence of finer-grained and more accurate operators.
I'll mention two big contributions in this paper. First, we present a rich suite of operators, with three entirely new operators to deal with decomposing complex documents: the split, gather, and resolve operators.
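The thread only names the three operators, so below is a rough, hypothetical Python sketch of what each does, not DocETL's implementation: `split` cuts a document into ordered word chunks, `gather` attaches preceding chunks as context so no chunk is read in isolation, and `resolve` maps spelling variants of an entity to one canonical name. The real resolve presumably relies on LLM judgments; this sketch swaps in plain string normalization, and all function names and parameters here are illustrative assumptions.

```python
import re


def split(doc_id: str, text: str, chunk_size: int) -> list[dict]:
    """Cut one document into ordered chunks of `chunk_size` words."""
    words = text.split()
    return [
        {"doc_id": doc_id, "chunk_idx": n, "text": " ".join(words[i:i + chunk_size])}
        for n, i in enumerate(range(0, len(words), chunk_size))
    ]


def gather(chunks: list[dict], previous: int = 1) -> list[dict]:
    """Attach the text of up to `previous` preceding chunks to each chunk as context.

    Assumes `chunks` is the ordered output of `split` for a single document.
    """
    out = []
    for pos, c in enumerate(chunks):
        context = [p["text"] for p in chunks[max(0, pos - previous):pos]]
        out.append({**c, "context": context})  # new dict; input chunks are not mutated
    return out


def resolve(names: list[str]) -> dict[str, str]:
    """Map each entity name to a canonical spelling.

    Names whose lowercase alphanumeric characters match are treated as the same
    entity (a stand-in for LLM-based matching); the first spelling seen wins.
    """
    canonical: dict[str, str] = {}
    mapping: dict[str, str] = {}
    for name in names:
        key = re.sub(r"[^a-z0-9]", "", name.lower())
        canonical.setdefault(key, name)
        mapping[name] = canonical[key]
    return mapping


chunks = gather(split("note-1", "pt reports dizziness after starting metformin 500mg", 3))
for c in chunks:
    print(c["chunk_idx"], c["text"], "| context:", c["context"])
print(resolve(["Metformin", "metformin ", "METFORMIN.", "Lisinopril"]))
```

In this sketch, gather is what lets a per-chunk prompt see that "after starting metformin" follows "pt reports dizziness", and resolve collapses the variant spellings that per-chunk extraction tends to produce.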