What does a #SOC tour look like when the team is remote?
TL;DR - Not a trip to a room with blinky lights - but instead a discussion about mission, mindset, ops mgmt, results and a demo of the tech and process that make our SOC “Go”.
SOC tour in the 🧵...
Our SOC tour starts with a discussion about mission. I believe a key ingredient to high performing teams is a clear purpose and “Why”.
What’s our mission? It's to protect our customers and help them improve.
Our mission is deliberately centered around problem solving and being a strategic partner for our customers. Notice that there are zero mentions of looking at as many security blinky lights as possible. That’s intentional.
Next, we talk about culture and guiding principles - key ingredients for any #SOC. I think about culture as the behaviors and beliefs that exist when management isn’t in the room.
Culture isn't memes on a slide - it's behavior and mindset.
Next, with a clear mission and mindset - how are we organized as a team to get there? Less experienced analysts are backed by seasoned responders. If there's a runaway alert (it happens), there's a team of D&R Engineers monitoring the situation, ready to respond.
The tour then focuses on operations management and how we do this for a living. You have to have intimate knowledge of what your system looks like so you know when something requires attention. Is it a rattle in the system (transient issue) or a big shift in work volume?
With solid ops management we’re able to constantly learn from our analysts and optimize for the decision moment. We watch patterns and make changes to reduce manual effort. We hand off repetitive tasks to bots because automation unlocks fast and accurate decisions.
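To make “rattle vs. shift” concrete, here’s a minimal sketch in Python. It assumes a plain list of daily alert counts; the 30-day baseline, 5-day persistence window, and 3-sigma band are illustrative knobs, not our production values.

```python
# Sketch: classify a change in alert volume as a transient "rattle"
# or a sustained "shift". Thresholds are illustrative assumptions.
from statistics import mean, stdev

def classify_volume_change(daily_counts, window=30, persist_days=5, z_thresh=3.0):
    """Compare the most recent days against a trailing baseline.

    daily_counts: list of daily alert counts, oldest first
    (needs at least window + persist_days entries).
    """
    baseline = daily_counts[-(window + persist_days):-persist_days]
    recent = daily_counts[-persist_days:]
    mu, sigma = mean(baseline), stdev(baseline)
    # Count recent days sitting outside the z_thresh band.
    outliers = sum(1 for c in recent if abs(c - mu) > z_thresh * sigma)
    if outliers == 0:
        return "normal"
    return "shift" if outliers == persist_days else "rattle"
```

A single day outside the band is a rattle worth a look; every recent day outside it means the work volume itself has moved, and staffing or tuning needs attention.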
Next, our tour focuses on how we think about investigations. Great investigations are stories (based on evidence of course).
When we identify an incident we investigate to determine what happened, when, how it got there, and what we need to do about it. Stories.
Next, how we think about quality control in our #SOC. We make a couple key points:
1. We don’t trade quality for efficiency
2. You can measure quality in a SOC
3. QC checks run daily, based on a set of manufacturing ISOs, to spot failures and drive improvements
- We’re going to use industry standards to sample
- The sample has to be representative of the population and done daily
- Measurements of the sample need to be accurate and precise
- Metrics we produce need to be digestible
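Here’s a minimal sketch of what the daily draw can look like, assuming a list of closed alerts. The 5% rate and 20-alert floor are illustrative stand-ins, not the ISO-derived sample sizes we actually use.

```python
# Sketch: pull a daily random QC sample from closed alerts.
import random

def daily_qc_sample(closed_alerts, rate=0.05, minimum=20, seed=None):
    """Sample closed alerts for quality review.

    A fixed floor keeps small days representative; random.sample keeps
    the draw unbiased across analysts and alert types.
    """
    n = min(max(minimum, int(len(closed_alerts) * rate)), len(closed_alerts))
    return random.Random(seed).sample(closed_alerts, n)
```

Each sampled alert gets graded against a rubric, and the pass/fail rates roll up into the daily quality metric.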
What about #SOC results? Let's talk about it. Yes, alert-to-fix in <30 minutes is quite good. But a high degree of automation and SOC analyst retention are equally important.
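For anyone who wants the arithmetic, alert-to-fix is just a duration statistic over closed incidents. A minimal sketch, assuming each record carries alert and fix timestamps (the field names are hypothetical):

```python
# Sketch: median minutes from alert firing to remediation.
from statistics import median

def alert_to_fix_minutes(records):
    """records: dicts with datetime values under hypothetical
    "alert_time" and "fix_time" keys."""
    return median(
        (r["fix_time"] - r["alert_time"]).total_seconds() / 60
        for r in records
    )
```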
Before the tour ends we share insights. The security incidents we detect become insights for every customer.
“Identity is the new endpoint”: we’re seeing a lot of BEC in M365, and MFA fatigue attacks are up.
Then we jump into our platform and provide a demo of the tech and process that enable the #SOC to complete their mission. Here's a video capturing some of the items we cover: expel.com/managed-securi…
Finally, we stop by the #SOC. Most of our analysts will be remote - but a tour is about so much more than seeing a room with monitors and blinky lights. I believe a great SOC tour highlights the people, culture and mindset behind tech and process.
Top 3 #M365 Account Takeover (ATO) actions spotted by our SOC in Q1:
1. New-inbox rule creation to hide attacker emails
2. Register new MFA device for persistence
3. Create mailbox forwarding rules to monitor victim comms and intercept sensitive info
More details in 🧵...
50% of the ATO activity we identified in M365 involved new-inbox rules created by an attacker to automatically delete certain emails from a compromised account. By deleting specific emails, an attacker can reduce the chance of the victim or email admins spotting unusual activity.
25% of the ATO activity we identified involved registering a new MFA device in Azure AD. Registering a new MFA device allows an attacker to maintain persistence.
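Here’s a minimal sketch of how you might spot the top two ATO actions in unified audit log exports. The Operation values follow the commonly documented M365/Azure AD audit schema, but treat every field name as an assumption to verify against your own tenant’s logs.

```python
# Sketch: flag inbox rules that hide mail and new MFA registrations.
SUSPICIOUS_RULE_ACTIONS = {"DeleteMessage", "MoveToFolder", "MarkAsRead"}

def flag_ato_events(audit_events):
    findings = []
    for e in audit_events:
        op = e.get("Operation", "")
        if op in ("New-InboxRule", "Set-InboxRule"):
            # Exchange audit records carry rule settings as Name/Value pairs.
            params = {p["Name"]: p["Value"] for p in e.get("Parameters", [])}
            if SUSPICIOUS_RULE_ACTIONS & set(params):
                findings.append(("suspicious-inbox-rule", e.get("UserId")))
        elif op == "User registered security info":
            # New MFA method added: persistence if the user didn't do it.
            findings.append(("new-mfa-registration", e.get("UserId")))
    return findings
```

Enrich each hit with sign-in context (new device, unfamiliar location) before paging an analyst.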
We're seeing more and more M365 session cookie theft for initial access....
You can build high quality detections to spot this activity. A 🧵 with real-world examples...
A #SOC analyst picks up an alert and decides not to work it.
In queuing theory, this is called “work rejection”–and it’s quite common in a SOC.
TL;DR - “Work rejection” is not always bad, but it can be measured and the data can help improve performance. More details in the 🧵...
A couple of work rejection scenarios play out in the SOC. The most common:
An analyst picks up an alert that detects a *ton* of benign activity. Around the same time, an alert enters the queue that almost *always* finds evil. A decision is made...
The analyst rejects the alert that is likely benign for an alert that is more likely to find evil.
Let’s assume the analyst made the right call. They rejected an alert that was likely benign to work an alert that was evil. Work rejection resulted in effective #SOC performance.
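A minimal sketch of that triage decision as a priority queue. The probabilities and the rejection rule are illustrative assumptions, not our production logic.

```python
# Sketch: work the highest-expected-value alerts first; the rest
# are "rejected" and can be counted.
import heapq

def triage(queue_snapshot, capacity):
    """queue_snapshot: list of (evil_probability, alert_id) tuples.
    Returns (worked, rejected) so the rejection rate is measurable."""
    heap = [(-p, alert_id) for p, alert_id in queue_snapshot]  # max-heap
    heapq.heapify(heap)
    worked = [heapq.heappop(heap)[1] for _ in range(min(capacity, len(heap)))]
    rejected = [alert_id for _, alert_id in heap]
    return worked, rejected
```

Tracking the rejected list over time tells you whether rejection is healthy triage (low-value alerts being skipped) or a capacity problem.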
A good detection includes:
- Clear aim (e.g., remote process exec on DC)
- Unlocks end-to-end workflow (not just alert)
- Automation to improve decision quality
- Response (hint: not always contain host)
- Volume/work time calcs
- Able to answer, “where does efficacy need to be?”
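One way to make that checklist real is to carry it as first-class fields on every detection. A minimal sketch; the field names are illustrative, not our schema.

```python
# Sketch: a detection definition that encodes the checklist above.
from dataclasses import dataclass, field

@dataclass
class Detection:
    aim: str                       # e.g., "remote process exec on DC"
    workflow: str                  # end-to-end runbook, not just the alert
    enrichments: list[str] = field(default_factory=list)  # automation hooks
    response: str = ""             # default action (not always contain host)
    expected_daily_volume: float = 0.0
    avg_work_minutes: float = 0.0
    tpr_target: float = 0.0        # "where does efficacy need to be?"
```

Expected volume times average work minutes gives analyst load before the detection ships, and the TPR target makes the efficacy question explicit.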
On detection efficacy:
- As your True Positive Rate (TPR) moves higher, your False Positive Rate moves with it
- Our overarching detection efficacy goal will never be 100% TPR (facts)
- However, TPR targets differ based on classes of detections and alert severities
Math tells us there is a sweet spot between combating alert fatigue and controlling false negatives. Ours kind of looks like a ROC curve.
This measure becomes the overarching target for detection efficacy.
“Detection efficacy is an algebra problem not calculus.” - Matt B.
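To show the algebra, here’s a minimal sketch of picking that sweet spot by weighting false negatives against false positives. The scores, labels, and costs are all hypothetical inputs.

```python
# Sketch: choose the alert threshold minimizing weighted FN + FP.
def sweet_spot(scores, labels, fn_cost=5.0, fp_cost=1.0):
    """scores: detection confidence per event; labels: 1 = evil.
    A higher fn_cost tolerates more alert fatigue to miss less evil."""
    best = None
    for t in sorted(set(scores)):
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        cost = fn_cost * fn + fp_cost * fp
        if best is None or cost < best[0]:
            best = (cost, t)
    return best[1]
```

Sweeping fn_cost and fp_cost per detection class is how different TPR targets fall out of the same math.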
Before we hired our first #SOC analyst or triaged our first alert, we defined where we wanted to get to; what great looked like.
Here’s [some] of what we wrote:
We believe that a highly effective SOC:
1. leads with tech; doesn’t solve issues w/ sticky notes
2. automates repetitive tasks
3. responds and contains incidents before damage
4. has a firm handle on capacity v. loading
5. is able to answer, “are we getting better, or worse?”
How to think about presenting good security metrics:
- Anchor your audience (why are these metrics important?)
- Make multiple passes with increasing detail
- Focus on structures and functions
- Ensure your audience leaves w/ meaning
Don’t read a graph, tell a story
Ex ⬇️
*Anchor your audience 1/4*
Effective leaders have a firm handle on SOC analyst capacity vs. how much work shows up. To stay ahead, one measurement we analyze is a time series of alerts sent to our SOC.
*Anchor your audience 2/4*
This is a graph of the raw trend of unique alerts sent to our SOC for review between Nov 1, 2021 and Jan 2, 2022. This time period includes two major holidays so we’ll expect some seasonality to show up around these dates.
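A minimal sketch of that view, assuming a pandas Series of daily alert counts indexed by date; the 7-day window is an illustrative choice.

```python
# Sketch: raw counts plus a 7-day rolling mean to separate trend
# from day-of-week seasonality (and holiday dips).
import pandas as pd

def trend_view(daily_alerts: pd.Series) -> pd.DataFrame:
    return pd.DataFrame({
        "raw": daily_alerts,
        "rolling_7d": daily_alerts.rolling(7, min_periods=1).mean(),
    })
```

Plotting raw against the rolling mean makes the holiday dips read as seasonality rather than a real drop in activity.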