Talia Ringer
May 18, 7 tweets
Currently watching the conference talk for CCTEST: Testing and Repairing Code Completion Systems.

Paper: arxiv.org/pdf/2208.08289…
The motivation is how we can actually find and fix buggy LLM-generated code completions.
Important problem. I'm not yet following the approach. It seems to involve some kind of mutation testing. I will need to read the paper. They are talking about why unit tests aren't good oracles, which I buy. But what is their oracle? Still lost on that.
It's cool that they're actually catching bugs in LLM-generated code, though. That's important. I think I am still unclear about the repair approach and whether I'd trust it, though, especially since they seem to use metrics like BLEU, which I don't trust.
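To make the BLEU worry concrete, here is a minimal sketch. It is not from the paper and it is not full BLEU, just clipped n-gram precision, the building block BLEU is computed from. The names `ngram_precision`, `mean_ok`, and `mean_bad` are made up for illustration. It shows two snippets that differ by one token: the overlap score stays high while the behavior changes completely.

```python
from collections import Counter


def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of candidate tokens against reference tokens."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Hypothetical "reference" and "repaired" snippets, pre-tokenized on whitespace.
reference = "def mean ( xs ) : return sum ( xs ) / len ( xs )".split()
candidate = "def mean ( xs ) : return sum ( xs ) * len ( xs )".split()

unigram = ngram_precision(candidate, reference, 1)  # 15 of 16 tokens match
bigram = ngram_precision(candidate, reference, 2)   # 13 of 15 bigrams match


# The same two snippets as real functions: one token apart, very different results.
def mean_ok(xs):
    return sum(xs) / len(xs)


def mean_bad(xs):
    return sum(xs) * len(xs)


print(f"unigram precision: {unigram:.4f}")
print(f"bigram precision:  {bigram:.4f}")
print(mean_ok([1, 2, 3]), mean_bad([1, 2, 3]))
```

A token-overlap score has no notion of what the code computes, so a high score on a repaired completion says little about whether the repair is correct.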
They did some human evaluation of the bugs they found, which is definitely good.
It'd be good to, at the very least, set a higher standard of testing for these systems, whether or not I buy the repair part. I think I am too scared that automatic repair could introduce bugs into the LLM-generated code that the testing process does not catch.
Anyway, it's good that people are thinking about this problem seriously.

