
Infinite State Machine

@tPl0ch

May 24 • 26 tweets • 6 min read

https://twitter.com/mathiasverraes/status/1528756273012957184

I thought quite a bit about this extension of Conway's Law, which takes the flexibility of the system under change into account. Here are some unsorted and maybe random thoughts from my experience of doing an inverse Conway maneuver on a larger scale at Flix. #SoftwareDesign


Some context: in 2018 we were taking extreme measures to get a grip on our monolithic software system, which had been growing dramatically over 6 years, going from 8 to ~170 devs and several million LOC. And yes, an inverse Conway maneuver (ICM) is an extreme social intervention.

A lot of planning and preparation went into the execution: strategic system boundary design, stream alignment and platform capabilities around internal and customer facing products. The product owners (in the true sense of the word) prepared meaningful vision and mission docs.

We then facilitated what we called a 2-day self-design event, in which we dissolved all existing team structures and let people figure out for themselves how best to distribute across the various products and subdomains. POs gave in-depth presentations about the new products.

At the end we allowed the participants to assign themselves to the product teams, within constraints on min and max staffing and on required roles and skills. Some areas were just not attractive enough for people to join, and some people left because their teams had been dissolved.

Now we had a completely new social architecture with an old (bad) legacy software system, both completely unaligned with each other. The first thing we had to do was mark the new virtual boundaries within the code. We did this using CODEOWNERS files in our VCS.

But we not only added the newly responsible people, we also added people who had experience with these parts of the code, identified via git log and blame, to bring new and old maintainers together and increase the pace of learning. Don't expect an ICM to come without cost: it takes a lot of un- and re-learning.
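A minimal sketch of how one such ownership entry could be assembled, not the tooling we actually used. It assumes the author emails for a path were collected with `git log --format=%ae -- <path>`; the function names, the `/booking/` path and the `@org/team-booking` handle are hypothetical.

```python
from collections import Counter


def top_contributors(log_authors, limit=2):
    """Most frequent author emails in the output of
    `git log --format=%ae -- <path>` (one email per line)."""
    counts = Counter(
        line.strip() for line in log_authors.splitlines() if line.strip()
    )
    return [author for author, _ in counts.most_common(limit)]


def codeowners_line(pattern, new_owners, log_authors, limit=2):
    """Build one CODEOWNERS entry: the newly responsible owners first,
    then up to `limit` past contributors who are not already listed."""
    candidates = top_contributors(log_authors, limit + len(new_owners))
    veterans = [a for a in candidates if a not in new_owners][:limit]
    return " ".join([pattern, *new_owners, *veterans])


# Hypothetical history for one directory of the monolith.
history = (
    "ana@example.com\nben@example.com\nana@example.com\n"
    "cy@example.com\nben@example.com\nana@example.com\n"
)
entry = codeowners_line("/booking/", ["@org/team-booking"], history)
```

Here `entry` pairs the new team with the two most active past committers on that path, which is the "bring new and old maintainers together" idea in file form.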

Additionally we started to annotate all the database queries with the responsible team names in order to build a db table and column ownership map. Sometimes using an ORM gives you actual advantages, so we were able to pull this off without a huge performance overhead and effort.
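To illustrate the idea, here is a rough sketch that tags each query with a SQL comment and derives a table ownership map from logged queries. The `/* team=... */` comment format, the function names and the regex-based table extraction are all assumptions made for this example (a real setup would hook into the ORM and would also track columns).

```python
import re
from collections import defaultdict

# Assumed annotation format: a leading SQL comment such as /* team=booking */
ANNOTATION = re.compile(r"/\*\s*team=([\w-]+)\s*\*/")
# Deliberately naive table extraction; good enough for a sketch only.
TABLE = re.compile(
    r"\b(?:FROM|JOIN|UPDATE|INTO)\s+([A-Za-z_][\w.]*)", re.IGNORECASE
)


def annotate(sql, team):
    """Prefix a query with the name of the team responsible for it."""
    return f"/* team={team} */ {sql}"


def ownership_map(logged_queries):
    """Map each table name to the set of teams whose queries touch it.
    Queries without an annotation are skipped."""
    owners = defaultdict(set)
    for query in logged_queries:
        match = ANNOTATION.search(query)
        if match is None:
            continue
        for table in TABLE.findall(query):
            owners[table.lower()].add(match.group(1))
    return dict(owners)


queries = [
    annotate("SELECT b.id FROM bookings b JOIN customers c ON c.id = b.customer_id", "booking"),
    annotate("UPDATE customers SET name = 'x' WHERE id = 1", "crm"),
    "SELECT 1 FROM orphan_table",  # not annotated, so not attributed
]
table_owners = ownership_map(queries)
```

Tables that end up with more than one team in their set are the places where the shared database still couples teams together.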


So now we had the old architecture marked with new boundaries, fresh ownership and old knowledge holders. And this is the point at which I pick up @mathiasverraes' opposing forces of rigidity / flexibility. The question after an ICM is: how can we give the new structures autonomy?

How can we enable the new teams to make independent changes to code & data without affecting the other teams? The answers to this question define what "flexibility" means. The challenges on the way towards it define "rigidity". And as always that way is paved with trade-offs.

So what were some useful patterns to increase the teams' autonomy? And what were the trade-offs?

When fast team autonomy is the highest priority, a "Copy & Change" (C&C) approach works very well. Copy the monolith and cut away everything not needed. Replicate the shared DB.

The trade-off with C&C is that the new system is still based on the legacy model, distributed by replication. This might be desirable in situations that require independent scaling very fast, since teams can now deploy and operate the new system independently.

This doesn't work when you need to change the business processes and the underlying model.

What worked well in our scenario was using scaffolding structures within the shared context, i.e. using the "Strangler Fig" pattern.

The trade-off here is time. A lot of it actually.

Adding scaffolding structures to a running monolith with dissolving boundaries is risky, so you need to be deliberate and go a bit slower. You'll also need time to discover and design the new models that drive the APIs - these will affect how the Strangler Facades are shaped.
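A stripped-down sketch of what a Strangler facade does: callers go through one entry point, and use cases are switched over to a team-owned implementation one at a time while everything else still falls through to the monolith. All class, function and use case names below are hypothetical and only illustrate the pattern.

```python
from typing import Callable, Dict


class StranglerFacade:
    """Routes each use case to its new owner if migrated, else to legacy."""

    def __init__(self, legacy: Callable[[str, dict], dict]):
        self._legacy = legacy
        self._migrated: Dict[str, Callable[[dict], dict]] = {}

    def migrate(self, use_case: str, handler: Callable[[dict], dict]) -> None:
        """Switch a single use case over to a team-owned implementation."""
        self._migrated[use_case] = handler

    def handle(self, use_case: str, payload: dict) -> dict:
        handler = self._migrated.get(use_case)
        if handler is not None:
            return handler(payload)
        # Anything not yet migrated keeps running in the monolith.
        return self._legacy(use_case, payload)


def legacy_monolith(use_case: str, payload: dict) -> dict:
    return {"served_by": "monolith", "use_case": use_case}


def new_pricing_service(payload: dict) -> dict:
    return {"served_by": "pricing-team", "use_case": "quote_price"}


facade = StranglerFacade(legacy_monolith)
before = facade.handle("quote_price", {})
facade.migrate("quote_price", new_pricing_service)
after = facade.handle("quote_price", {})
untouched = facade.handle("issue_refund", {})
```

The shape of `handle` is where the new models matter: the facade's interface has to be designed around them, which is part of why this approach takes time.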

Another pattern that deserves an honorable mention is "Letting It Die". Often we realized that some features were used maybe once every three months by a handful of people, and the value they generated was almost non-existent. It's hard, but you should let go of these things.

Now that I am almost at the end of this thread I want to summarize some of the points:

1. ICMs are extreme social interventions that need a lot of strategic planning and preparation, otherwise you'll lose the support of the affected people.

2. The real work starts after the ICM with marking the new boundaries within the old system. Think of it as creating cut-marks, just like in a children's book.

3. Define the "flexibility" required for the new systems. Is it deployment independence or the ability to remodel?

4. Apply suitable patterns to overcome the "rigidity" within the existing systems. Be aware that some of these patterns take quite some time to introduce, so make sure the expected returns justify the endeavor.

And that's it folks. I hope you enjoyed this thread.

P.S. there is probably a lot more that can be said about all these topics. What I wanted to stress is that just doing an ICM and expecting the software systems to magically transform themselves without putting in all the work I described will most probably result in failure.

Addendum: The Copy & Change pattern, while giving teams fast autonomy, is not a real shortcut, but rather a way of taking on debt against future flexibility. Most likely you'll still want to move away from the old model after some time, because it will constrain you in the mid and long term.

I can't stress enough: you should expect to lose some of your employees during this social earthquake. Many were happy in the teams that existed before the ICM, and won't find the energy to go through another forming phase. That's OK! Be transparent about this in 1:1s beforehand!

Re: https://twitter.com/tPl0ch/status/1528924678127075333

Another point that has been oversimplified:

Often, ambiguity in the models required multiple teams to move in lock-step with each other, at least until they had discovered models that disambiguated the existing structures and had agreed on a transition.

https://twitter.com/tPl0ch/status/1528924666332647425

I also made it sound as if the new boundaries were fixed and immutable. Rather, you should build evolvability in - by defining lightweight processes that easily allow the formation of new structures bottom-up using a funding approach: 3 years / 1.8 mil €


Of course the funding will need to go through an approval process, and you'll need to provide an actual business case that you need to defend in front of senior leadership and C-level. But once approved, the new team can leverage support from generic functions like HR & Finance.

https://twitter.com/tPl0ch/status/1528924664109596672

WARNING! You'll need PROPER observability. If you don't have proper introspection, the technical challenges will be almost impossible to overcome. In the mentioned example we were able to trace the annotated db queries through the complete code artifact.


These traces helped the teams enormously in identifying parts of the code that were triggering multiple independent use cases now owned by different teams. I'd even go as far as saying you shouldn't even try this maneuver without a proper integration of an observability platform.
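As a rough illustration of that kind of analysis, the sketch below groups team-annotated query spans by the code entry point of their trace and flags entry points that touch more than one team. The record shape (`entry_point`, `team`) and all names are assumptions for this example, not the export format of any particular observability platform.

```python
from collections import defaultdict


def shared_entry_points(query_spans):
    """Return entry points whose traces contain queries owned by
    more than one team, mapped to the sorted list of those teams."""
    teams = defaultdict(set)
    for span in query_spans:
        teams[span["entry_point"]].add(span["team"])
    return {
        entry: sorted(owners)
        for entry, owners in teams.items()
        if len(owners) > 1
    }


# Hypothetical spans, as they might be exported from traced, annotated queries.
spans = [
    {"entry_point": "CheckoutController.submit", "team": "booking"},
    {"entry_point": "CheckoutController.submit", "team": "payments"},
    {"entry_point": "ProfileController.show", "team": "crm"},
]
hotspots = shared_entry_points(spans)
```

Each flagged entry point is a candidate for the lock-step coordination described above, since a change there affects several owners at once.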
