Okay, I did it. Threw Deep Research at the medical questions I tackled for ~months in 2020 when battling my wife's cancer
Based on my test case, this iteration of Deep Research can tell you what the current literature on a topic would advise, but not make novel deductions to improve upon where the human experts are at
I think it might have sped up my cancer research in 2020 but not replaced it. That guy saying it's better than his $150k/year team...maybe needs to get better at hiring, idk
🧵Thread with more details 0/n
Tbc, it's still a great tool even in its current state, and I expect to use it. Just hunting around for papers on a topic and finding the relevant ones can take hours. Useful even if I have to read and critically judge the papers myself 1/n
Ok, so the test case: 1. we know if you have a malignant tumor growing on your bone, you want to surgically cut it out 2. we know that if you cut very narrowly around the tumor, with little margin, you get worse outcomes than if you remove it with a wider margin (taking out more healthy tissue with it); there's a straightforward monotonic curve here 2/n
The existing literature acknowledges this straightforward "more margin" -> "better outcomes" relationship up until such a time as you consider amputation, i.e. the widest margin of all. At this point, the literature is adamant that amputation offers no marginal benefit. Not, "no marginal benefit worth the marginal cost", just "no marginal benefit" 3/n
The literature cites observational studies showing that patients receiving amputations do no better than patients receiving "limb-sparing" surgery. Ofc, no one does RCTs for amputation, and amputations were reserved for patients with the most severe disease 4/n
In other words, the correct inference you should make is that amputation is so effective that, even when you select for patients with more severe disease, you get the same outcomes as for patients with much milder disease 5/n
So both straightforward extrapolation and further empirical observation suggest that if you really want good survival outcomes, amputation is better, not "no additional benefit". I don't think it's a hard inference 6/n
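To make the selection-effect argument concrete, here is a toy calculation, not real data. Every number below is a made-up assumption chosen for illustration, as are the names `Arm` and `survivalUnderMix`. It shows how two surgeries can have identical crude survival rates while one is better within every severity stratum, simply because the sicker patients were routed to it.

```typescript
// Hypothetical numbers for illustration only; these are NOT clinical data.
type Severity = "mild" | "severe";

interface Arm {
  // Fraction of this arm's patients in each severity stratum (sums to 1).
  mix: Record<Severity, number>;
  // Assumed survival probability within each stratum.
  survival: Record<Severity, number>;
}

// Assumption: limb-sparing surgery is mostly given to milder cases.
const limbSparing: Arm = {
  mix: { mild: 0.8, severe: 0.2 },
  survival: { mild: 0.8, severe: 0.5 },
};

// Assumption: amputation is mostly reserved for severe cases,
// and is somewhat better within each stratum.
const amputation: Arm = {
  mix: { mild: 0.2, severe: 0.8 },
  survival: { mild: 0.9, severe: 0.7 },
};

const strata: Severity[] = ["mild", "severe"];

// Overall survival for an arm if its patients had the given severity mix.
function survivalUnderMix(arm: Arm, mix: Record<Severity, number>): number {
  return strata.reduce((sum, s) => sum + mix[s] * arm.survival[s], 0);
}

// Crude comparison: each arm evaluated on its own (confounded) patient mix.
const crudeLimbSparing = survivalUnderMix(limbSparing, limbSparing.mix);
const crudeAmputation = survivalUnderMix(amputation, amputation.mix);

// Adjusted comparison: both arms evaluated on the same 50/50 severity mix.
const commonMix: Record<Severity, number> = { mild: 0.5, severe: 0.5 };
const adjustedLimbSparing = survivalUnderMix(limbSparing, commonMix);
const adjustedAmputation = survivalUnderMix(amputation, commonMix);

console.log("crude:", crudeLimbSparing.toFixed(2), crudeAmputation.toFixed(2));
console.log("adjusted:", adjustedLimbSparing.toFixed(2), adjustedAmputation.toFixed(2));
```

With these assumed inputs the crude rates match, and the gap only shows up once both arms are compared on the same severity mix. Whether the real data look like this is exactly the question an observational study has to answer by adjusting for severity.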
I really feel like "controlling for the confound of selection effects" should not be beyond current medical researchers. smh
What's the deal? My guess is patients are horrified at the thought of amputation more so than death, and oncologists want to cater to that 7/n
Once you're set on no amputation, you'd also like to believe this isn't costing you anything. Or that you're not hurting the patient's survival chances. Plus limb sparing is a very fancy surgery compared to butcherous amputation, much more fun 8/n
The human bias here makes sense. Sad, but it makes sense. The way people were talking about Deep Research, I thought perhaps if I told it "prioritize survival above all else", it would see through the human bias and make correct inferences from the more robust underlying data 9/n
Sadly, I think we'll get there before too long and when the model can start doing better than the inputs (garbage in, diamonds out), we will be in trouble. There's a lot of low hanging fruit for machines to optimize better and harder than we humans typically do 10/n
I agree with many that this could usher in utopia. But not by default, and not with the level of caution I think humanity is bringing to the challenge 11/11
• • •
I was one of the developers in the @METR_Evals study. Thoughts:
1. This is much less true of my participation in the study where I was more conscientious, but I feel like historically a lot of my AI speed-up gains were eaten by the fact that while a prompt was running, I'd look at something else (FB, X, etc) and continue to do so for much longer than it took the prompt to run
I discovered two days ago that Cursor has (or now has) a feature you can enable to ring a bell when the prompt is done. I expect to reclaim a lot of the AI gains this way (1/N)
2. Historically I've lost some of my AI speed-ups to cleaning up the same issues LLM code would introduce, often relatively simple violations of code conventions like using || instead of ??
A bunch of this is avoidable with stored system prompts which I was lazy about writing. Cursor has now made this easier and even attempts to learn repeatable rules "The user prefers X" that will get re-used, saving time here. (2/N)
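For anyone unfamiliar with why the `||` vs `??` convention matters, here is a minimal sketch. The function names and the default of 20 are hypothetical, not taken from the LessWrong codebase. `||` falls back on any falsy value, including `0` and `""`, while `??` falls back only on `null` or `undefined`.

```typescript
// Hypothetical helpers for illustration; not from any real codebase.

// Falls back to 20 for ANY falsy value, so an explicit 0 is silently replaced.
function pageSizeWithOr(limit?: number | null): number {
  return limit || 20;
}

// Falls back to 20 only when limit is null or undefined, so 0 is respected.
function pageSizeWithNullish(limit?: number | null): number {
  return limit ?? 20;
}

console.log(pageSizeWithOr(0), pageSizeWithNullish(0));
console.log(pageSizeWithOr(undefined), pageSizeWithNullish(undefined));
```

The two versions agree whenever the value is missing and differ only when a legitimate falsy value like `0` is passed, which is why the mistake is easy to miss in review.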
3. Regarding me specifically, I work on the LessWrong codebase which is technically open-source. I feel like calling myself an "open-source developer" has the wrong connotations, and makes it sound more like I contribute to a highly-used Python library or something as an upper-tier developer, which I'm not (3/N)
Lightcone/@lesswrong (where I work) is concluding the first month of our fundraiser. We’ve raised 1.3M out of 3M we need to make it through 2025. Habryka has a 12,000 word post making the case for us.
I’m here to tell you what Habryka cannot easily say himself: why he as a specific human is worth funding for his projects. (Thread below.)
0. I’ve known @ohabryka since 2013 when he was ~19. I’ve worked with him at LessWrong/Lightcone since 2019 (six years).
If you factor in foregone income and less portable career capital, I’m a major donor to Habryka’s projects myself. Here’s why I do it:
1. he is v smart (obvs) and one of the ways that comes through is that he basically never offloads thinking about a domain to others. He believes hard in being a "generalist" and that means he can and does perform every task/role in the company, and for most tasks, does it better than anyone else. I'm talking coding, UI design, interior design, construction design, legal, fundraising, customer support, analytics, pest eradication, you name it.
He expects the same of the core team of "generalists" (it's a bad name, should be more like "specialists in everything"). The standard rule is we're only allowed to outsource stuff that we've done at least once ourselves.
For some domains, we'll consult or employ experts, e.g. lawyers or contractors, but Habryka will be building expertise in that topic too so he can scrutinize what the supposed experts say.
And you might think the CEO is too busy and too important for everyday stuff, but he’s in the trenches as much as the rest of us. Over the vacation period (or any time), he’s the one picking up the slack in responding to support queries. During events at Lighthaven, he’s the one running around lighting the outdoor heaters.