So much of my timeline talking about SolarWinds and so little of my timeline talking about how to properly harden your CI infra to make that kind of attack more difficult to pull off.
I'll put some things that I consider good ideas in this thread. I don't know your environment, so I can't tell you what will work best for you.
Build systems that hash their inputs to derive the name of the resulting output and cache results in content-addressable storage (CAS) take a little effort to understand, but the idea is so powerful for security:

geekinasuit.com/2019/02/bazely…
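To make the hash-your-inputs idea concrete, here is a minimal sketch in Python. It is not how Bazel or any particular build system does it; the names `action_key` and `ContentAddressableStore` are made up for illustration, and the "store" is just a local directory.

```python
import hashlib
import os
import tempfile


def action_key(command, input_blobs):
    """Derive a cache key from the build command and the content of every input.

    input_blobs maps a (hypothetical) input name to its bytes.
    """
    h = hashlib.sha256()
    h.update(command.encode())
    # Sort by name so the key does not depend on the order inputs were listed in.
    for name in sorted(input_blobs):
        h.update(b"\0" + name.encode() + b"\0")
        h.update(hashlib.sha256(input_blobs[name]).digest())
    return h.hexdigest()


class ContentAddressableStore:
    """Toy CAS on the local filesystem: a blob's file name is its SHA-256."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        with open(os.path.join(self.root, digest), "wb") as f:
            f.write(data)
        return digest

    def get(self, digest):
        with open(os.path.join(self.root, digest), "rb") as f:
            data = f.read()
        # Verify on read: a tampered blob no longer matches its own name.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("blob does not match its digest: " + digest)
        return data
```

The security win is in `get`: anything pulled from the cache is checked against the name it was requested by, so an attacker who can write to the cache still can't swap a blob without the mismatch being detected.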
Multi-tenant CI systems are the devil. We have better orchestration systems now, so spinning up fresh VMs, micro-VMs, cloud instances, even containers is so much easier than it used to be. Embrace this and externalize anything cached between builds (ideally in a CAS if you can).
Start with understanding that CI is RCE-as-a-service by design. That principle will help you understand its security and lack thereof. The closer you get to a 1:1 mapping between a hash of all inputs (maybe a git SHA) and resulting artifact hash, the better security will be.
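One way to use that 1:1 mapping, sketched in Python: remember which artifact hash each input hash produced and flag any rebuild that disagrees. `BuildLedger` is a hypothetical name, the ledger is in memory only, and the check is only meaningful if your builds are reproducible in the first place.

```python
import hashlib


class BuildLedger:
    """Records which artifact digest each input digest (e.g. a git SHA) produced."""

    def __init__(self):
        self._seen = {}

    def record(self, input_digest, artifact):
        """Return True if this artifact matches earlier builds of the same inputs."""
        artifact_digest = hashlib.sha256(artifact).hexdigest()
        # The first build of an input sets the expectation; later builds must match it.
        previous = self._seen.setdefault(input_digest, artifact_digest)
        return previous == artifact_digest
```

If two builds of the same input digest on different nodes yield different artifacts, either the build isn't deterministic or something on one of the nodes changed the output. Both are worth knowing about.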
Here is an interesting tidbit from CrowdStrike's blog post (crowdstrike.com/blog/sunspot-m…) on the SUNSPOT malware that impacted SolarWinds:

"Persistence using scheduled tasks, triggered at boot time"

Why can build nodes reboot without being destroyed? Or did the attackers pwn the OS image?
So *absolutely* before any of the fancy CAS stuff I mentioned, start with enforcing a maximum time-to-live for build nodes. Have new nodes created from a known-good immutable image (e.g. the latest AMI) and install the latest OS updates at boot (the default for Amazon Linux AMIs).
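The TTL rule itself is tiny. A sketch in Python, assuming you can already list your nodes with their launch times (the 4-hour limit and the `expired_nodes` name are placeholders, not a recommendation):

```python
from datetime import datetime, timedelta, timezone

MAX_NODE_AGE = timedelta(hours=4)  # hypothetical policy value, tune for your builds


def expired_nodes(launch_times, now, max_age=MAX_NODE_AGE):
    """Return the IDs of nodes that have outlived max_age, oldest first.

    launch_times maps a node ID to its timezone-aware launch datetime.
    """
    overdue = [
        (launched, node_id)
        for node_id, launched in launch_times.items()
        if now - launched > max_age
    ]
    return [node_id for _, node_id in sorted(overdue)]
```

Whatever runs this on a schedule would drain and terminate the returned nodes (never reboot them), so the replacements come up from the known-good image.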
Here's the absolutely easy and lazy way: configure an EC2 Auto Scaling group (docs.aws.amazon.com/autoscaling/ec…) for your build nodes that scales down to zero. There ya go, you don't even need to do anything to kill your nodes now.
In case anyone was wondering, yes I did set all of this up years ago. It took me a few hours on a Saturday afternoon (when there was less demand for CI) and I even had time to look at spot price history to tweak the spot price and machine type to save us mad $$$.
Oh yeah, you also need to define a Lambda function that is called by your build orchestrator to trigger that EC2 ASG to scale up from zero.

This Buildkite AWS CloudFormation stack does it very well:

github.com/buildkite/elas…
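For a rough idea of what such a scale-up Lambda might look like, here is a sketch in Python. It is not the Buildkite stack's implementation. The event fields `group_name` and `queued_jobs` are assumptions about what your orchestrator would send, and one job per node is assumed. The boto3 calls used are `describe_auto_scaling_groups` and `set_desired_capacity` on the `autoscaling` client.

```python
def scale_up(autoscaling, group_name, current, queued_jobs, max_nodes):
    """Raise the group's desired capacity to cover queued jobs; never lower it."""
    target = min(current + max(queued_jobs, 0), max_nodes)
    if target > current:
        autoscaling.set_desired_capacity(
            AutoScalingGroupName=group_name,
            DesiredCapacity=target,
            HonorCooldown=False,  # don't make queued builds wait out a cooldown
        )
    return target


def lambda_handler(event, context):
    # boto3 is imported here so scale_up can be exercised without AWS access.
    import boto3

    client = boto3.client("autoscaling")
    name = event["group_name"]  # hypothetical event shape
    group = client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[name]
    )["AutoScalingGroups"][0]
    return scale_up(
        client,
        name,
        current=group["DesiredCapacity"],
        queued_jobs=int(event["queued_jobs"]),
        max_nodes=group["MaxSize"],
    )
```

Scaling down is deliberately left out: the function only ever raises capacity, so it can't terminate a node in the middle of a build. Getting back to zero is the job of the group's own scale-in rules.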


Keep Current with Dino A. Dai Zovi

