Steve Smith (@SteveSmithCD), 31 tweets

A deep dive into @clare_liguori's great article on #ContinuousDelivery at @awscloud... 1/n

https://twitter.com/clare_liguori/status/1273695334066057216

Firstly, thanks to @clare_liguori for doing this. As someone with deep knowledge of deployment pipeline design and shallow knowledge of AWS, this is really interesting. I wish more companies did this 🙇‍♂️ 2/n

As someone who's been thinking about deployment pipeline designs for over a decade e.g. continuousdeliveryconsulting.com/blog/deploymen… and vimeo.com/370035221, it's always interesting to hear how orgs do deployment pipelines a) at scale and b) in unusual market conditions. AWS is a) and b) 3/n

"How often do you deploy to production" is an excellent interview question. The answer will tell you a lot

Other questions I like to ask: where does work come from, who monitors and acts upon unexpected unavailability, and how does the organisation learn from incidents 4/n

"How much time do you usually spend shepherding deployments" is pretty good too. An answer of "none" is exciting. "Lots" is OK, compared to "Dunno" 5/n

"I’m able to trust that the pipeline will cautiously and safely deploy my change" is really powerful. If you can't trust an automated process, it needs more work. Otherwise you haven't really freed up people for higher-value activities 6/n

Storing app code, operator code, infra code, library versions etc. in separate version controlled repositories would be my choice too. That has interesting dependency mgmt trade-offs, which have been solved by an auto weekly update - wow 7/n

Version controlling feature flag config and using auto rollbacks on deploy incompatibilities is a good idea. It's a big investment, that will pay off at scale 8/n

More could be said on the benefits of separate pipelines for separate aspects of a service. Being able to independently deploy code, config, infra can massively accelerate deployment frequency 9/n

It's interesting to learn AWS is using GitHub Flow (bad name for short lived branches) with mandatory code review pre-merge. I'd personally advocate TBD with pair programming and post-commit review if necessary, but I lack a lot of context 10/n

GitHub Flow can work well. It can also go horribly wrong infoq.com/presentations/…. It's interesting that AWS use it even with a high standard of deployment pipeline and auto rollbacks

If I was coding an AWS service and I was on-call, I'd want a pair or a code review for sure 11/n

It's awesome to see a code review checklist focussed on testing, monitoring, and incremental rollout. Code review shouldn't be about curly braces. It should be about a high cross-team standard of deployability, testability, and operability 12/n

There is little or no mention of XP practices such as TDD and CI. I assume AWS allows teams to choose for themselves - as happens with tool selections. That dovetails with the Accelerate book describing team empowerment as a leading indicator of #ContinuousDelivery 13/n

"All builds run without network access to isolate the builds and encourage build reproducibility" is one of those little touches I always recommend 👍. It's a great way to ensure unit tests are actually unit tests. A pipeline should be opinionated and an enabling constraint 14/n
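
To make the "no network" constraint concrete, here is a minimal sketch of an in-process guard for a unit test suite. It is not how AWS isolates its builds (the article does not say); `no_network` and `NetworkBlockedError` are hypothetical names, and the guard only covers code that connects through Python's `socket.socket`.

```python
import socket
from contextlib import contextmanager


class NetworkBlockedError(RuntimeError):
    """Raised when code under test tries to open a network connection."""


@contextmanager
def no_network():
    """Temporarily make every socket.socket.connect call fail (hypothetical helper)."""
    original_connect = socket.socket.connect

    def guarded_connect(self, address):
        # Fail loudly so the offending test is easy to find.
        raise NetworkBlockedError(f"network access attempted: {address!r}")

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        # Always restore the real connect, even if the test body raised.
        socket.socket.connect = original_connect
```

Wrapping a test body in `with no_network():` turns any accidental outbound call into an immediate failure, which is the enabling constraint in miniature.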

The emphasis on dependency unit testing with mocks prior to dependency integration testing is excellent. An integration test is an end-to-end test. It should be used as a happy path tracer bullet, and should not include corner cases. Otherwise pipeline lead times will suck 15/n
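
A rough sketch of what "corner cases in unit tests with mocks" can look like, using Python's standard `unittest.mock`. The `price_with_tax` function, the `tax_client` dependency, and the choice to return `None` on timeout are all invented for illustration; none of it comes from the AWS article.

```python
from unittest.mock import Mock


def price_with_tax(order_total, country, tax_client):
    """Return the total including tax, or None if the tax service times out."""
    try:
        rate = tax_client.get_rate(country)
    except TimeoutError:
        return None
    return round(order_total * (1 + rate), 2)


def test_happy_path():
    # The dependency is replaced by a mock, so no real service is called.
    tax_client = Mock()
    tax_client.get_rate.return_value = 0.2
    assert price_with_tax(100.0, "GB", tax_client) == 120.0
    tax_client.get_rate.assert_called_once_with("GB")


def test_timeout_corner_case():
    # The corner case lives here, in a fast unit test, not in an integration test.
    tax_client = Mock()
    tax_client.get_rate.side_effect = TimeoutError
    assert price_with_tax(100.0, "GB", tax_client) is None
```

The integration test would then only need to cover the happy path against the real dependency.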

Mid-thread update! @clare_liguori has kindly pointed out I can't competently read at 0200. AWS does TBD at scale, with VCS magic to mandate pre-mainline code reviews. That's awesome, particularly with the operability emphasis on code review 👍 16/n

https://twitter.com/clare_liguori/status/1275954684331569153

I'm surprised there are as many as 3 test environments, and delighted the Dev/Test/Staging antipattern isn't mentioned. It's not clear if all teams use the same test envs, or if Alpha/Beta/Gamma are the actual env names 🤔 17/n

It would be interesting to see activity timings. If a pipeline is fully automated there will be no queue times between activities 👍. There will be one task that takes longer than desired - perhaps integration testing - but if it's not the constraint, it's fine 18/n

The integration testing approach sounds excellent. I'd be curious to learn how often those tests catch errors that would not be caught by dependency mocks in the unit tests... and how often integration tests are deleted 19/n

I'd call the "one box" approach in the Gamma test env a variant of Canary Deployments martinfowler.com/bliki/CanaryRe…, and it sounds like it's done very well

I'd refer to the (very good) continuous canary testing as Smoke Testing, but the lines are blurred these days 20/n

I was wondering why integration testing is in 2 test envs, and then:

"Microservices in pre-production environments typically call the production endpoint of any services owned by another team"

If you have to do some end-to-end testing, do it for real wherever possible 👍 21/n

Another nugget:

"Gamma is also deployed in multiple AWS Regions to catch any potential impact from regional differences"

A substantial engineering effort that's well worth it. If your services are multi-region, test in multi-region too 22/n

Staging production deployments as per-AZ, per-region is a sophisticated method of Canary Deployments. There will be some tricky trade-offs on smoke testing and queue times between AZs and regions 23/n

Deployment waves are an interesting way to tackle Canary Deployment trade-offs at scale. A visualisation would help. I'm not sure if regions are grouped into waves for 1 deploy from 1 team, or if N deploys from N teams go together 24/n
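
In lieu of a visualisation, a minimal sketch of one way to group regions into waves that grow in size. The growth rule, the function name `plan_waves`, and the placeholder region names are assumptions for illustration; the thread does not state how AWS actually sizes or orders its waves.

```python
def plan_waves(regions, first_wave_size=1, growth=2):
    """Split an ordered list of regions into waves of increasing size.

    Hypothetical rule: the first wave has `first_wave_size` regions and
    each later wave is `growth` times larger than the one before it.
    """
    if first_wave_size < 1 or growth < 1:
        raise ValueError("first_wave_size and growth must be at least 1")
    waves = []
    size = first_wave_size
    index = 0
    while index < len(regions):
        # The last wave may be smaller if the regions run out.
        waves.append(regions[index:index + size])
        index += size
        size *= growth
    return waves


# Placeholder region names, not real AWS regions.
waves = plan_waves([f"region-{n}" for n in range(1, 8)])
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Starting small and widening limits the blast radius of a bad change early on, while keeping the total number of waves manageable.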

Validating deployments with ongoing prod telemetry rather than eyeballs is a good idea, that is woefully under-utilised in other orgs 25/n

Using a bake time per canary, and modifying bake time per deployment wave, are really good ideas

*If* a pipeline becomes a constraint, bake time could be the limiting factor. But it's about the audacity of resilience... prepping for unknown unknowns 26/n
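
A minimal sketch of telemetry-driven validation with a bake time, assuming a single error-rate metric sampled after each deployment. `Sample`, `bake_decision`, and the threshold values are hypothetical; the real pipeline presumably watches many more signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    seconds_since_deploy: float
    error_rate: float


def bake_decision(samples, bake_seconds, max_error_rate):
    """Return "rollback", "promote", or "wait" for one deployment stage."""
    # Any unhealthy sample triggers a rollback, no eyeballs required.
    for sample in samples:
        if sample.error_rate > max_error_rate:
            return "rollback"
    # Promote only once healthy telemetry covers the whole bake time.
    if samples and max(s.seconds_since_deploy for s in samples) >= bake_seconds:
        return "promote"
    return "wait"
```

Varying `bake_seconds` per wave is then just a matter of passing a different value for each stage, which is where the lead time trade-off shows up.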

The deployment blockers on local alerts, deployment time windows, and org-wide alerts sound fantastic. That level of automation will make it much easier to have focussed conversations, experimentation with time windows, etc. 27/n
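
A sketch of how the three kinds of blocker could combine into a single automated check. The function name, the weekday 09:00 to 17:00 window, and the reason strings are assumptions made for illustration, not AWS's actual rules.

```python
from datetime import datetime


def deployment_blockers(now, local_alarm_firing, org_wide_alert_active,
                        window_start_hour=9, window_end_hour=17):
    """Return the list of reasons a deployment is blocked (empty means go)."""
    reasons = []
    if local_alarm_firing:
        reasons.append("local alarm firing")
    if org_wide_alert_active:
        reasons.append("org-wide alert active")
    # Hypothetical window: Monday to Friday, within the given hours.
    in_window = now.weekday() < 5 and window_start_hour <= now.hour < window_end_hour
    if not in_window:
        reasons.append("outside deployment time window")
    return reasons


# Example: a Wednesday morning with nothing firing is clear to deploy.
print(deployment_blockers(datetime(2020, 6, 24, 10, 0), False, False))
```

Returning the reasons rather than a bare boolean is what makes the focussed conversations easier: the pipeline can say exactly why it stopped.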

Also, @clare_liguori kindly answered some questions... like manual approval gates can be added for UIs in Gamma. That's a sensible, pragmatic choice. I wonder if it can be configured per-UI per-feature not just per-UI 28/n

https://twitter.com/clare_liguori/status/1274106358036496385

https://twitter.com/clare_liguori/status/1273721588387471361

There's a big emphasis on engineering APIs for backwards compatibility. That's a key tenet of Trunk Based Development, adaptive architecture, minimal blast radius on failure... it's super important 29/n

https://twitter.com/clare_liguori/status/1273721588387471361

https://twitter.com/clare_liguori/status/1274345468890189824

There's an emphasis on Stop The Line, which is a key #ContinuousDelivery practice that is often overlooked. If there's a failure in a test env or prod, new feature work and commits *stop* and the team swarms to fix the problem together 👏 30/n

https://twitter.com/clare_liguori/status/1274345468890189824

Thanks for taking the time to do this @clare_liguori, it must have been a big effort and I hope it was in office hours! So much to learn from here, and share with other orgs 🙇‍♂️ /end
