Several people asked me about this: what's the rigorous research showing that code review helps?

Most of our data comes from case studies, where we follow a single group and see how code review affected their existing system. These studies are incredibly useful for real-world data. t.co/bzubo2yGhb
One of our best sources here is the Smartbear analysis of their clients. They estimate that code review, done right, catches 70-90% of bugs. They also found that in at least one case, code review cost half as much as letting bugs reach production. smartbear.com/SmartBear/medi…
The Smartbear stuff also found that many bugs would not have been caught by regular QA and testing, and that self-analysis also helps a ton:

While very positive, Smartbear shouldn't be our only source, as they're selling a product. ibm.com/developerworks…
This group did case studies on open source projects, and found a strong correlation between review discussion and code quality:

And this study found students could catch almost 70% of defects in "design" reviews: smartbear.com/SmartBear/medi…
pitt.edu/~ckemerer/PSP_…
Here's a large-scale GitHub analysis finding positive impacts for security:

This article about how code review compares to pair programming seems to suggest it's just as good, but I have significant doubts about their methods: www2.eecs.berkeley.edu/Pubs/TechRpts/…
link.springer.com/article/10.102…
On the social side, about 97% of Google engineers were positive on code review. It also apparently pushed them to write smaller, more contained commits:

One of the most interesting results, though, comes from Microsoft. sback.it/publications/i…
The majority of their code review comms were about code quality, communication, and understanding:

This suggests defect benefits are secondary to social benefits. We know how effective CR is for finding bugs, so the social benefit could be significant. microsoft.com/en-us/research…
Now this is all interesting evidence, but it isn't a slam-dunk. On the other hand... with pretty much every other technique we have conflicting or inconclusive evidence. CR is the only thing for which we have tons of significant favourable evidence showing strong benefits.
That's basically unheard of for a programming technique, and the reason I believe code review is the one technical practice we definitely, absolutely know improves our software.
Some updates:

The Smartbear paper defines defects as "anything that requires a change to be acceptable." This tricked me before: their 70-90% is for all defects, both bugs and code improvement suggestions, not just bugs. So they aren't likely to catch 90% of bugs :(
Most studies find that about a quarter of issues found in code reviews are about bugs; the rest are quality improvement issues (ieeexplore.ieee.org/document/46046…) and (testroots.org/assets/papers/…). These are also valuable! But what does it mean for _bug detection_?
Most studies still show a strong effect: code review is very effective at finding bugs. For example, in this case study (dl.acm.org/citation.cfm?i…) they find one of the biggest indicators of post-release bugs is "poor code review". Most case studies and trials find similar.
This is one of the reasons we've gotta be careful and read closely!
