The most common feature-flagging pain I hear of is "feature flag debt" - stale flags clogging up your codebase with conditionals and dead code.

Uber just open-sourced Piranha, an internal tool for detecting and cleaning up stale feature flags.

Let's talk about it a bit...
Piranha is a tool specifically for managing feature flag debt in Uber's mobile apps: eng.uber.com/piranha/

They also have a really interesting academic paper describing it, along with lots of interesting details on feature flagging @ Uber in general: manu.sridharan.net/files/ICSE20-S…
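To make "stale flag" concrete, here's a tiny before/after sketch in Python. The flag name, the `is_enabled` helper and both functions are made up for illustration; this is not Piranha's actual input or output, just the general shape of the cleanup.

```python
# Hypothetical in-memory flag store; a real flagging client would back this.
FLAGS = {"new_fare_screen": True}


def is_enabled(flag_name: str) -> bool:
    """Return the current state of a flag, defaulting to off."""
    return FLAGS.get(flag_name, False)


# Before cleanup: the flag is fully rolled out, so the else branch is dead code.
def fare_screen_before() -> str:
    if is_enabled("new_fare_screen"):
        return "new fare screen"
    else:
        return "old fare screen"


# After cleanup: the conditional and the dead branch are both gone.
def fare_screen_after() -> str:
    return "new fare screen"
```

Once the flag is permanently on, both versions behave the same, but only the second is free of flag debt.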
Besides being an interesting approach to a very common problem, their discussion of Piranha also gives a rare look inside an organization that's *heavily* invested in feature flagging...
And when I say heavily invested, I mean it. The paper describes 6601 feature flags, just in their mobile codebases. Now apparently those codebases add up to 7.7 *million* lines of code, but regardless that's still a lot of feature flags.
As one would expect given that count, Uber use flags for a lot of purposes. Experimentation, controlled rollout, safety, and sometimes for long-term operational things like kill switches and testing in prod (more on that in a moment).
They seem to be big fans of managing features by geo/market - turning on a feature for a specific market before rolling it out more broadly.

I've seen this approach at other orgs that are naturally segmented by market. Makes it easier to understand the scope of a rollout.
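A minimal sketch of what market-scoped flagging can look like, assuming a flag simply carries the set of markets it's turned on for. `MarketFlag`, the flag name and the market names are all hypothetical, not anything from Uber's system.

```python
from dataclasses import dataclass, field


@dataclass
class MarketFlag:
    """A flag that is on only for an explicit set of markets (hypothetical)."""
    name: str
    enabled_markets: set[str] = field(default_factory=set)

    def is_enabled(self, market: str) -> bool:
        return market in self.enabled_markets

    def roll_out_to(self, market: str) -> None:
        # Widen the rollout one market at a time.
        self.enabled_markets.add(market)


flag = MarketFlag("new_pickup_flow", enabled_markets={"amsterdam"})
```

The scope of the rollout is just the contents of `enabled_markets`, which is what makes this style easy to reason about.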
Uber's flagging system has the ability for flags to be more than just on or off - a flag state can be an integer or a double, for example. However, these "parameter flags" comprise only a "very small set of flags" - the rest are simple on/off booleans.
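Roughly, the distinction looks like this. It's a sketch under my own assumptions: the flag names, the `FLAG_VALUES` store and the `typed_flag` helper are invented, and the paper doesn't say how Uber's client exposes these values.

```python
from typing import Union

FlagValue = Union[bool, int, float]

# Hypothetical flag store mixing plain booleans with "parameter flags".
FLAG_VALUES: dict[str, FlagValue] = {
    "new_fare_screen": True,       # simple on/off flag
    "max_pickup_radius_m": 250,    # integer parameter flag
    "eta_padding_factor": 1.5,     # double parameter flag
}


def typed_flag(name: str, expected_type: type) -> FlagValue:
    """Look up a flag and insist that it holds the expected kind of value."""
    value = FLAG_VALUES[name]
    # bool is a subclass of int in Python, so compare exact types.
    if type(value) is not expected_type:
        raise TypeError(
            f"flag {name!r} is {type(value).__name__}, not {expected_type.__name__}"
        )
    return value
```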
All these flags come with a cost. They make Uber's code harder to understand, and slow down build+test.

They also introduce the risk of accidentally flipping a stale flag in prod, with untested and scary consequences (cf. the Knight Capital Knightmare: bugsnag.com/blog/bug-day-4…)
Interestingly, the paper *doesn't* mention a cost that I often hear people concerned about - the combinatorial challenge of testing a large number of flags. I suspect this is because Uber have been feature flagging long enough to have developed good practices to mitigate this.
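For anyone who hasn't run into that concern, the arithmetic behind it is simple: n independent on/off flags have 2^n possible combinations of states. A quick sketch (the function name is mine):

```python
def flag_state_combinations(num_boolean_flags: int) -> int:
    """Number of distinct on/off assignments for independent boolean flags."""
    if num_boolean_flags < 0:
        raise ValueError("flag count cannot be negative")
    return 2 ** num_boolean_flags
```

Ten flags already give 1,024 combinations and twenty give over a million, so exhaustive testing stops being an option almost immediately.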
So, too many flags are bad - Uber should remove stale flags. However, identifying inactive flags is a surprisingly tricky problem. Some are "kill switches" which are almost never used, but need to be left available in prod. Others are in place for debugging or testing in prod.
As an aside, I think that a flag categorization scheme would help here, making it easier to distinguish between long-lived feature flags like these, vs short-lived Release or Experiment Toggles.

See martinfowler.com/articles/featu… for more discussion of this idea.
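Here's a minimal sketch of how a categorization scheme could feed into cleanup. The category names are borrowed from that article; the `Flag` type, the `SHORT_LIVED` set and `cleanup_candidates` are my own assumptions, not something the paper describes.

```python
from dataclasses import dataclass
from enum import Enum


class ToggleCategory(Enum):
    RELEASE = "release"
    EXPERIMENT = "experiment"
    OPS = "ops"                      # long-lived, e.g. kill switches
    PERMISSIONING = "permissioning"  # long-lived


# Only these categories are expected to be removed once they've done their job.
SHORT_LIVED = {ToggleCategory.RELEASE, ToggleCategory.EXPERIMENT}


@dataclass(frozen=True)
class Flag:
    name: str
    category: ToggleCategory


def cleanup_candidates(flags: list[Flag]) -> list[Flag]:
    """Keep only flags whose category makes them eligible for staleness checks."""
    return [f for f in flags if f.category in SHORT_LIVED]
```

A rarely-flipped kill switch tagged as `OPS` never shows up as a cleanup candidate, no matter how quiet it has been.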
The paper also discusses the idea of requiring an expiration date when a flag is initially defined. This is another idea I've advocated for, and seen a few teams have some success with. Not a panacea though.
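A rough sketch of the expiration-date idea, assuming the date is a required field at definition time. `FlagDefinition`, `expired_flags` and the example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FlagDefinition:
    """A flag can't be defined without saying when it should be gone."""
    name: str
    owner: str
    expires_on: date


def expired_flags(flags: list[FlagDefinition], today: date) -> list[FlagDefinition]:
    """Return the flags whose expiration date has already passed."""
    return [f for f in flags if f.expires_on < today]
```

What happens with that list (a nag, a ticket, a failing build) is the hard part, which is why I say it's not a panacea.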
Lack of flag ownership looks to be a considerable problem at Uber. I'd think in part due to the nature of their org - hyper-growth, lots of churn and movement of folks. They also focus on ownership of flags by an individual dev, rather than a team. That seems questionable to me.
They also discuss a lack of aligned incentives when it comes to cleaning up feature flag debt - working on new features is rewarded more. I was very interested to hear that iOS devs have a more concrete incentive to remove stale flags - a limit from Apple on app download size.
Uber have used focused debt paydown efforts (e.g. "fixit weeks") to try and keep flag debt in check. I've heard of other orgs doing this too. It's been described to me as "the best tool we've found, but not a great tool" (paraphrasing a little here).
The stats in the paper about cleaned-up flags are interesting. It's not uncommon for a flag to be associated with over 100 lines of code, but flag-associated code tends to be restricted to a small number of files - 80% of flag-removal diffs involving 5 files or fewer.
This runs counter to the concerns I've heard from some folks getting started with feature flags that each new feature flag could lead to a bunch of conditionals strewn around a codebase.
Uber have added custom extensions to their automated testing frameworks to make it easier to declaratively manage flag state in tests which are sensitive to that state - something I often see at orgs that are using flags heavily, but not something I've seen discussed much. 🤔
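To give a flavor of what "declarative" means here, this is a minimal Python sketch using a decorator. It is not Uber's extension; `FLAGS`, `is_enabled`, `with_flags` and the flag name are all invented for illustration.

```python
import functools

# Hypothetical global flag state that production code would read from.
FLAGS = {"new_checkout": False}


def is_enabled(name: str) -> bool:
    return FLAGS.get(name, False)


def with_flags(**overrides: bool):
    """Declare the flag state a test needs; restore the old state afterwards."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            saved = dict(FLAGS)
            FLAGS.update(overrides)
            try:
                return test_fn(*args, **kwargs)
            finally:
                # Restore even if the test fails, so tests don't leak state.
                FLAGS.clear()
                FLAGS.update(saved)
        return wrapper
    return decorator


@with_flags(new_checkout=True)
def test_new_checkout_is_shown():
    assert is_enabled("new_checkout")
```

The test states which flags it depends on right at the top, and nothing else in the suite sees the override.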
That wraps up what I found most interesting in this paper. Kudos to Uber for discussing this stuff so openly! 🙌

What other nuggets did *you* get from the paper?

Anyone have any other interesting reports from orgs talking about their usage of feature flagging in the wild?
Keep Current with Pete Hodgson