Several people asked me about this: what's the rigorous research showing that code review helps?
Most of our data comes from case studies, where we follow a single group and see how code review affected their existing system. These studies are incredibly useful for real-world data. t.co/bzubo2yGhb
One of our best sources here is the SmartBear analysis of their clients: they estimate that code review, done right, catches 70-90% of bugs. They also found that in at least one case, code review cost half as much as letting bugs reach production. smartbear.com/SmartBear/medi…
The SmartBear analysis also found that many bugs would not have been caught by regular QA and testing, and that self-analysis also helps a ton:
While very positive, SmartBear shouldn't be our only source, as they're selling a product. ibm.com/developerworks…
This group did case studies on open source projects, and found a strong correlation between review discussion and code quality:
On the social side, about 97% of Google engineers were positive on code review. It also apparently pushed them to write smaller, more contained commits:
The majority of their code review comments were about code quality, communication, and understanding:
This suggests defect benefits are secondary to social benefits. We know how effective CR is at finding bugs, so the social benefits could be even more significant. microsoft.com/en-us/research…
Now this is all interesting evidence, but it isn't a slam-dunk. On the other hand... with pretty much every other technique we have conflicting or inconclusive evidence. CR is the only technique where we have tons of significant, favourable evidence showing strong benefits.
That's basically unheard of for a programming technique, and the reason I believe code review is the one technical practice we definitely, absolutely know improves our software.
Some updates:
The SmartBear paper defines defects as "anything that requires a change to be acceptable." This tricked me before: their 70-90% figure is for all defects, both bugs and code improvement suggestions, not just bugs. So they aren't likely to catch 90% of bugs :(
Most studies find that about a quarter of the issues found in code reviews are bugs; the rest are quality-improvement issues (ieeexplore.ieee.org/document/46046…) and (testroots.org/assets/papers/…). These are also valuable! But what does that mean for _bug detection_?
Most studies still show a strong effect: code review is very effective at finding bugs. For example, this case study (dl.acm.org/citation.cfm?i…) finds that one of the biggest indicators of post-release bugs is "poor code review". Most case studies and trials find similar results.
This is one of the reasons we've gotta be careful and read closely!
I was bitten by the knowledge management bug in 2020 but didn't like any of the apps I tried, including ones I made for myself. I recently tried a new approach: everything's on the filesystem, all relationships are represented with symlinks.
It's working really well!
Take tagging. All "tags" are subfolders of the Tags/ folder. If I want to tag `xyz.txt` as "TDD", I just add a symlink to it inside "Tags/TDD". Now I can get everything tagged "TDD" with `ls Tags/TDD`.
Getting all of xyz's tags? `gci -R Tags | ? Target -eq xyz`
(NB: I use PowerShell)
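Spelled out a bit more, here's a rough sketch of the basic operations. `xyz.txt` is just an example file, and on Windows creating symlinks may need developer mode or admin rights:

```powershell
# Tag xyz.txt as "TDD" by dropping a symlink to it in Tags/TDD:
New-Item -ItemType Directory -Force -Path Tags/TDD | Out-Null
New-Item -ItemType SymbolicLink -Path Tags/TDD/xyz.txt -Target (Resolve-Path xyz.txt)

# Everything tagged "TDD":
ls Tags/TDD

# All of xyz.txt's tags: every symlink under Tags/ that points back at it
gci -Recurse Tags | Where-Object Target -eq (Resolve-Path xyz.txt).Path
```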
But wait, there's more! I can get everything that *shares* a tag with xyz by piping that to `ls`.
Now what if I want hierarchical tags, like "TDD is a subtag of Testing"? Easy: just symlink Tags/TDD inside Tags/Testing and use `ls -R` instead of `ls` for lookups.
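Sketches of those two tricks, under the same assumptions as above:

```powershell
# Everything that *shares* a tag with xyz.txt: find the tag folders containing
# a link to it, then list everything else in those folders.
gci -Recurse Tags | ? Target -eq (Resolve-Path xyz.txt).Path |
    % { ls $_.Directory } | ? Name -ne xyz.txt

# "TDD is a subtag of Testing": symlink the TDD folder inside Tags/Testing...
New-Item -ItemType SymbolicLink -Path Tags/Testing/TDD -Target (Resolve-Path Tags/TDD)

# ...then recursive listings of Tags/Testing pick up TDD's entries too
# (PowerShell 6+ needs -FollowSymlink to recurse through directory links).
ls -Recurse -FollowSymlink Tags/Testing
```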
Since Twitter had to go through with the sale out of fiduciary duty to the shareholders, I tried to figure out what that meant for me. AFAICT, based on this Vanguard Semiannual report, for every $1,000 in an S&P 500 index fund, I made approx 45 cents.
Is that worth it? Probably not for me, because I'm internet-poisoned, but the average American is blissfully free of Twitter. Hard to figure out how much they made. The conditional median retirement account in 2019 was $65k, so… 'bout 30ish bucks per family?
I dunno, I guess if you went to 63 millionish families and said "a service you've never ever cared about is going to explode, here's 30 bucks", most would take the 30
Obv this is WILDLY Fermi-estimate territory, just trying to get a sense of what "duty to the shareholders" meant.
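For anyone who wants to check the arithmetic, the whole estimate is just:

```powershell
# Fermi arithmetic, using the numbers from the tweets above:
$gainPerThousand = 0.45      # ~$0.45 gain per $1,000 in an S&P 500 index fund
$medianAccount   = 65000     # conditional median retirement account, 2019
$medianAccount / 1000 * $gainPerThousand   # => 29.25, i.e. 'bout 30 bucks
```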
Someone brought up a potential issue with my theory: a legal source that used "boilerplate"… from 1865! That would throw my entire chain of events out the window.
I looked into it though and concluded it's not sufficient evidence. Here's my thinking: 🧵
First, that got me looking for the *earliest* use of boilerplate. Google Books helpfully gave me this source from 1540: google.com/books/edition/…
Wait, that's before *boilers*. Did Google just record the wrong date?
Seems so! "Acts of Malice" is actually from 1999.
So now we know that some texts are incorrectly dated. Maybe "Advisory Opinions" is also misdated? The typeface looks anachronistic, but I know nothing about typography, so I can't use that as a dating mechanism. Other historians could, though!
Why don't developers write more personal GUI tooling? I mean, besides the obvious reason that GUI libraries kinda suck and are much more oriented towards making consumer apps than personal tooling, and also because there are no good GUI tooling exemplars, and...
By "GUI tooling", I mean like `.\script` into the terminal and it pops open a lil window you can interact with.
The usual response is "CLI is better", but it's not better 100% of the time, and there are lots of cases where GUIs are real helpful!
The problem is easiness
If it's really easy to whip up a small GUI, then you'll use it for the 10% of cases where a GUI really helps. But it's really hard, so people never bother to learn. Then they don't use it even for the 2% of cases where it's the best possible tool for the job.
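To make "easy" concrete: here's roughly the minimal personal GUI tool I have in mind, sketched in PowerShell with WinForms (Windows-only, and the note-picking task is just an example):

```powershell
# .\pick-note.ps1 — pops open a lil window listing notes; double-click to open one.
Add-Type -AssemblyName System.Windows.Forms

$form = New-Object System.Windows.Forms.Form
$form.Text = 'Pick a note'

$list = New-Object System.Windows.Forms.ListBox
$list.Dock = 'Fill'
Get-ChildItem *.txt | ForEach-Object { [void]$list.Items.Add($_.FullName) }

# Double-clicking an item opens it with the default app and closes the window.
$list.Add_DoubleClick({ Invoke-Item $list.SelectedItem; $form.Close() })

$form.Controls.Add($list)
[void]$form.ShowDialog()
```

A dozen-ish lines, no project scaffolding: that's the level of easiness that would make people reach for a GUI when it's the right tool.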
While I generally think that software mocks are a Bad Idea, I also think that letting go of e2e testing is giving up a really powerful testing technique. e2e tests exercise feature interaction in a way that unit tests don't. The trick is they're not at all "unit tests but bigger".
Unit tests can be written like scripts; e2e tests need to be "treated as an artifact": you write supporting infra, you create domain objects, you document, etc. You have to be intentional about it. It's more expensive, but in return you get a lot more coverage of interacting parts.
At a previous job we got a provider to give us a test account and wrote e2e tests that made changes to that account's data. It took time to set up and effort to maintain, but it found a lot of really subtle issues that unit tests couldn't.
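A hypothetical sketch of the shape of that, using Pester. `Get-TestAccount`, `New-ProviderOrder`, and friends are made-up domain helpers standing in for the supporting infra you'd write and maintain yourself:

```powershell
# The domain helpers here are hypothetical — they're the "artifact" part:
# auth, fixtures, and cleanup logic you build alongside the tests.
BeforeAll {
    . $PSScriptRoot/TestAccountHelpers.ps1
    $account = Get-TestAccount
}

Describe 'Order flow, end to end' {
    It 'reflects a new order in the account balance' {
        $before = Get-AccountBalance $account
        New-ProviderOrder -Account $account -Amount 10
        Get-AccountBalance $account | Should -Be ($before - 10)
    }
}

AfterAll {
    Reset-TestAccount $account   # leave the shared account clean for the next run
}
```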
Ever since Strangeloop I've been thinking about end-user programming: the idea that people should write their own software, not just consume it from professionals. While I strongly believe this too, I never mesh with the advocates, and I wanted to figure out why. 🧵
I feel like I'm the perfect audience for this: I'm an expert AutoHotKey programmer, I write tons of vim plugins and PowerShell scripts, and I just started making my own browser extensions. But at the same time, I don't care about the "model" end-user proglangs: Smalltalk & Lisp.
Listening to the end-user programming people, I always feel like I'm coming from a different world. I'm not convinced that REPLs and fully introspectable systems, a la Pharo, are necessary for end-user programming. The most successful examples, VB6 and Excel, have neither, right?