P
Mar 4, 14 tweets, 2 min read
So there are more and more arguments for "no human code review", provided that the agent(s) build the code together with a suite of tests.

I see multiple problems with this approach
1/
- Even assuming the agent(s) are fairly reliable (say 99.9%), there will be cases of serious failure where people need to jump in, and good luck finding the problem with zero knowledge of the codebase.

2/
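A quick back-of-the-envelope check on the 99.9% figure, under a deliberately simplified assumption of my own (not a measured number): each agent task fails independently with probability 0.001. The chance of at least one failure over n tasks is then 1 - 0.999^n. The function names below are hypothetical, just for illustration.

```python
import math


def p_at_least_one_failure(reliability: float, n_tasks: int) -> float:
    """P(at least one failure) over n independent tasks.

    Hypothetical model: every task succeeds independently with the
    same probability `reliability`.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be in [0, 1]")
    if n_tasks < 0:
        raise ValueError("n_tasks must be non-negative")
    # All n tasks succeed with probability r^n; the complement is "some failure".
    return 1.0 - reliability ** n_tasks


def tasks_until_likely_failure(reliability: float, threshold: float = 0.5) -> int:
    """Smallest n such that P(at least one failure) >= threshold."""
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must be strictly between 0 and 1")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    # 1 - r^n >= t  <=>  n >= log(1 - t) / log(r)   (log(r) < 0 flips the sign)
    return math.ceil(math.log(1.0 - threshold) / math.log(reliability))


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        p = p_at_least_one_failure(0.999, n)
        print(f"{n:>6} tasks: {p:.1%} chance of at least one failure")
    print("50% mark reached after", tasks_until_likely_failure(0.999), "tasks")
```

Even in this toy model, 99.9% per task means a failure somewhere is more likely than not after about 700 tasks, and roughly 63% likely over 1,000. Real failures are not independent, so this only illustrates why "rare" does not mean "never".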
- Given such a failure, it is also hard to just tell the agent "fix it", because the agent may need additional details (otherwise it would have fixed it already), and those details can only come from someone involved in the codebase.

3/
- Then there is the problem of maintainability. What if the agent is doing a good job but is creating a spaghetti codebase that, due to its growing complexity, the agent itself soon can no longer develop?

4/
We have a "taste" for complexity, but we cannot reliably write tests for it. So how do you catch it if you aren't involved in the codebase?

5/
- Then there is the problem (already partially addressed) that tests only catch the problems they were written for. What about the rest? As an analogy: what if the agent builds a beautiful aircraft that works great but falls apart after 500 hours of flight?

6/
All the usual, logical tests can pass, but there is unlikely to be a test for "500 hours of flight", especially if the agent writes its own tests (since it learned from similar human tests and combinations of them).

7/
And then there is the problem of deskilling. When we stop using our brain for something because there is a tool for it (GPS, calculator, whatnot), we tend to lose that skill unless we exercise it in other ways.

Without the skill, how do we notice problems?

8/
How do we notice that we gave the wrong inputs or requirements and garbage came out? A calculator is almost always correct, but if you mistype, a wrong answer comes out. How do you notice that if you have no sense for numbers?

9/
If you have no sense of orientation, you can type Denver instead of Detroit as the destination in a GPS navigator. How do you notice that it is wrong if you never review anything?

10/
And finally, I think accountability is also important. If the tool is "self-built", do you want to take responsibility for its failures? Will there be an industry of "professional scapegoats"?

11/
This holds at least until agents become so good that their reliability (and skill) is incredible: Stockfish level (far beyond superhuman in chess), but in every field.

In that case, we will likely use agents the way we use Stockfish.

12/
We use Stockfish as an oracle for possible answers (or confirmations/refutations) while still wanting to understand why the answer is what it is. Hence we will still want to be involved in the solution, even when the solution is handed to us.

13/13
