1/ Recently got access to Google's Illuminate, which AI-generates a ~6min "podcast"-format summary of a research paper. To test it, I gave it 5 of my own recent papers. This thread summarizes the evaluation: ↵
2/ I gave it my papers because I could most definitively evaluate how well it's doing. I gave it recent papers so it's likely there's very little data about them on the Web, simulating the "this paper just came out, help me grok it" mode. ↵
3/ I also chose papers across a variety of areas. The topics range across formal/theory, programming languages, systems, human-factors, software engineering, and education (and most are a mix of these). This is to see how it fares across CS topics. ↵
4/ I picked 5 because you can do 5/day. After generating them, I sent them to my *co-authors* to see how *they* felt about the generated summary. Their scores are also below. Before giving scores, let me say two nice things about Illuminate: ↵
5/ a. The two voices sound very natural. The conversation actually flows (sort of, it's slightly artificial, but have you heard some of the *podcasts* out there?). That actually may make it sound more informed than it might be! ↵
6/ b. People have asked, "why the q&a format?" My answer: When you're reading, you want section headers or other markers of transition. That's hard to do in audio. The switch to another voice asking a question essentially serves as a human-friendly section header. ↵
7/ Now, one thing I didn't do well is pin down exactly what the grading criterion would be. My own view was: «Imagine that this is the summary of the paper presented by a student taking a papers-based grad seminar. What score would I give that student?» ↵
8/ But some people viewed it as, "Imagine an *undergrad* were asked to summarize the paper" (implicitly, an undergrad w/out much research experience, not the ones that function like grad students). That led to some grade "inflation" (due to different expectations). ↵
9/ Basta! Enough! This is like a recipe that first spends three paragraphs on the author's childhood and travels. I know why you're really here: you just want to see the grades! ↵
Conceptual Mutation Testing | CS Ed, SE | C+ | B+ | - | - | B | B | - | -
Little Tricky Logics | Logic, human factors | D | C- | - | - | - | F | F | -
Rust ownership models | PL, viz, human factors | C- | - | D | B+ / C | - | - | - | -
R types | PL | C- | - | - | - | - | - | - | B-
Gradual Soundness | PL, SE | C | - | - | - | B | C | - | -
10/ Common issues: It makes some mistakes, but most of all it misses a lot of things, including some of the most important ideas. Often the specific parts were (effectively) about context and background, the parts that may be in LLMs, not so much about THIS paper. ↵
11/ It is just like a novice researcher: it gets a general sense of what's going on, doesn't always know what to focus on, sometimes gets a fairly good idea of the gist (especially for "shallower" papers), but routinely misses some or ALL of what makes THIS paper valuable. ↵
12/ Overall, anything that can help more people learn more is good. But there are currently far better ways to achieve that. (That's why we blog!) Illuminate will improve! But I regret to say that for now, you're going to have to actually read papers. (-: blog.brownplt.org
