A #SOC analyst picks up an alert and decides not to work it.
In queuing theory, this is called “work rejection,” and it’s quite common in a SOC.
TL;DR - “Work rejection” is not always bad, but it can be measured and the data can help improve performance. More details in the 🧵..
A couple of scenarios show how work rejection plays out in the SOC. The most common:
An analyst picks up an alert that detects a *ton* of benign activity. Around the same time, an alert enters the queue that almost *always* finds evil. A decision is made...
The analyst rejects the alert that is likely benign for an alert that is more likely to find evil.
Let’s assume the analyst made the right call. They rejected an alert that was likely benign to work an alert that was evil. Work rejection resulted in effective #SOC performance.
Another common scenario:
An analyst picks up an alert and doesn’t know what the alert means or what to do. Perhaps the alert doesn’t contain enough evidence or decision support. A decision is made...
The analyst “rejects the work” and moves on to the next alert, hoping the conditions of the next alert are more favorable (e.g., “I can handle the *next* one”).
By looking at the patterns and trends of “work rejection” in the SOC you can spot ways to improve performance.
One way we do this @ExpelSecurity: for every alert sent to the SOC, we record how many analysts picked up the alert before a decision was made. This tells us how often the alert, aka the “work,” is rejected before completion.
There are two main benefits:
1. You’ll see which classes of alerts newer analysts “reject” most often. Armed with that context, you can update training material or deploy more decision support to improve performance and reduce work rejection.
2. You’ll see which classes of alerts are “rejected” by everyone. Armed with that context, you can make adjustments to reduce work rejection (demote severity, add evidence/enrichment/orchestration).
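As a sketch of the measurement described above: if each alert record tracks how many analysts picked it up before a triage decision, a rejection rate per alert class falls out directly. The data, field names, and the `rejection_rate_by_class` helper here are hypothetical, not Expel’s actual schema.

```python
from collections import defaultdict

# Hypothetical alert records: "pickups" counts how many analysts picked up
# the alert before a triage decision was made (pickups > 1 => it was
# rejected at least once before completion).
alerts = [
    {"class": "suspicious_login", "pickups": 1},
    {"class": "suspicious_login", "pickups": 3},
    {"class": "rare_parent_child", "pickups": 2},
    {"class": "rare_parent_child", "pickups": 2},
    {"class": "known_bad_hash", "pickups": 1},
]

def rejection_rate_by_class(alerts):
    """Share of alerts in each class rejected at least once before a decision."""
    totals, rejected = defaultdict(int), defaultdict(int)
    for a in alerts:
        totals[a["class"]] += 1
        if a["pickups"] > 1:
            rejected[a["class"]] += 1
    return {c: rejected[c] / totals[c] for c in totals}

print(rejection_rate_by_class(alerts))
# e.g. {'suspicious_login': 0.5, 'rare_parent_child': 1.0, 'known_bad_hash': 0.0}
```

Slicing the same counts by analyst tenure would surface the two patterns above: classes new analysts reject vs. classes everyone rejects.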
Recap:
- “Work rejection” in a SOC isn’t always bad. There are cases where rejecting one alert for another results in effective SOC performance.
- “Work rejection” can be measured and the data can help shine a light on areas where you can improve performance.
Recap (cont'd):
- Remember, “work rejection” happens before the triage decision. So, an alert with a high benign rate + a high “work rejection” rate is likely ripe for optimization.
- “Work rejection” rates help you understand where analysts are confused or need a bit more help
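The “high benign rate + high work rejection rate” test above is easy to automate. A minimal sketch, with hypothetical per-class stats and cutoff values (the 0.9/0.5 thresholds are illustrative, not recommendations):

```python
def optimization_candidates(stats, benign_cut=0.9, rejection_cut=0.5):
    """Flag alert classes whose benign rate AND work-rejection rate are
    both high -- likely candidates for tuning, demotion, or enrichment.

    stats: {alert_class: (benign_rate, rejection_rate)}
    """
    return [c for c, (b, r) in stats.items()
            if b >= benign_cut and r >= rejection_cut]

stats = {
    "suspicious_login": (0.95, 0.60),   # mostly benign AND often rejected
    "known_bad_hash":   (0.05, 0.00),   # almost always evil, never rejected
    "rare_parent_child": (0.92, 0.30),  # benign-heavy but analysts still work it
}
print(optimization_candidates(stats))  # -> ['suspicious_login']
```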
What does a #SOC tour look like when the team is remote?
TL;DR - Not a trip to a room with blinky lights - but instead a discussion about mission, mindset, ops mgmt, results and a demo of the tech and process that make our SOC “Go”.
SOC tour in the 🧵...
Our SOC tour starts with a discussion about mission. I believe a key ingredient to high performing teams is a clear purpose and “Why”.
What’s our mission? It's to protect our customers and help them improve.
Our mission is deliberately centered around problem solving and being a strategic partner for our customers. Notice that there are zero mentions of looking at as many security blinky lights as possible. That’s intentional.
A good detection includes:
- Clear aim (e.g., remote process exec on a DC)
- Unlocks end-to-end workflow (not just alert)
- Automation to improve decision quality
- Response (hint: not always contain host)
- Volume/work time calcs
- Able to answer, “where does efficacy need to be?”
On detection efficacy:
- As your True Positive Rate (TPR) moves higher, your False Positive Rate moves with it
- Our overarching detection efficacy goal will never be 100% TPR (facts)
- However, TPR targets differ across classes of detections and alert severities
Math tells us there is a sweet spot between combating alert fatigue and controlling false negatives. Ours kind of looks like a ROC curve.
This measure becomes the overarching target for detection efficacy.
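One common way to pick a “sweet spot” on a ROC-style curve is Youden’s J statistic (TPR − FPR): it rewards catching evil while penalizing the false positives that drive alert fatigue. The ROC points below are made up for illustration, and this is just one possible target, not necessarily the measure the thread refers to.

```python
def sweet_spot(points):
    """points: list of (threshold, tpr, fpr) tuples along a ROC curve.
    Return the point maximizing Youden's J = TPR - FPR, a common way to
    balance alert fatigue (FPR) against false negatives (1 - TPR)."""
    return max(points, key=lambda p: p[1] - p[2])

# Hypothetical operating points for one detection class
roc = [
    (0.9, 0.40, 0.02),
    (0.7, 0.75, 0.10),   # J = 0.65 -- best of these four
    (0.5, 0.90, 0.35),
    (0.3, 0.98, 0.70),
]
thr, tpr, fpr = sweet_spot(roc)
print(thr, tpr, fpr)  # -> 0.7 0.75 0.1
```

Different detection classes and severities would get different targets, per the point above.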
“Detection efficacy is an algebra problem not calculus.” - Matt B.
Before we hired our first #SOC analyst or triaged our first alert, we defined where we wanted to get to; what great looked like.
Here’s [some] of what we wrote:
We believe that a highly effective SOC:
1. leads with tech; doesn’t solve issues w/ sticky notes
2. automates repetitive tasks
3. responds and contains incidents before damage
4. has a firm handle on capacity vs. loading
5. is able to answer, “are we getting better, or worse?”
How to think about presenting good security metrics:
- Anchor your audience (why are these metrics important?)
- Make multiple passes with increasing detail
- Focus on structures and functions
- Ensure your audience leaves w/ meaning
Don’t read a graph; tell a story
Ex ⬇️
*Anchor your audience 1/4*
Effective leaders have a firm handle on SOC analyst capacity vs. how much work shows up. To stay ahead, one measurement we analyze is a time series of alerts sent to our SOC.
*Anchor your audience 2/4*
This is a graph of the raw trend of unique alerts sent to our SOC for review between Nov 1, 2021 and Jan 2, 2022. This time period includes two major holidays so we’ll expect some seasonality to show up around these dates.
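For a daily alert-count series like the one described, a trailing rolling average is a simple way to smooth day-of-week effects so holiday seasonality stands out. A minimal sketch with hypothetical counts (a 7-day window is an assumption, not the thread’s stated method):

```python
def rolling_mean(counts, window=7):
    """Trailing rolling average of a daily alert-count series.
    Early entries average over the shorter available history."""
    out = []
    for i in range(len(counts)):
        lo = max(0, i - window + 1)
        out.append(sum(counts[lo:i + 1]) / (i - lo + 1))
    return out

# Hypothetical daily unique-alert counts; the weekend dip is visible raw,
# the smoothed series shows the underlying trend.
daily_alerts = [120, 130, 125, 90, 60, 58, 115, 128, 122, 119]
smoothed = rolling_mean(daily_alerts)
```

Around a holiday you’d expect the raw series to dip while the smoothed trend stays comparatively flat.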
Once a month we get in front of our exec/senior leadership team and talk about #SOC performance relative to our business goals (grow ARR, retain customers, improve gross margin).
A 🧵on how we translate business objectives to SOC metrics.
As a business we want to grow Annual Recurring Revenue (ARR), retain and grow our customers (Net Revenue Retention - NRR) and improve gross margin (net sales minus the cost of services sold). There are others but for this thread we'll focus on ARR, NRR, and gross margin.
/1
I think about growing ARR as the ability to process more work. It's more inputs. Do we have #SOC capacity available backed by the right combo of tech/people/process to service more work?
Things that feed more work: new customers, cross selling, new product launches.
/2
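The “capacity available vs. work showing up” question can be reduced to a back-of-the-envelope utilization ratio. All numbers and the `utilization` helper below are hypothetical, a sketch of the idea rather than how Expel actually models capacity:

```python
def utilization(alerts_per_day, avg_minutes_per_alert, analysts,
                hours_per_analyst=8):
    """Expected daily alert work-hours divided by available analyst-hours.
    A ratio near (or above) 1.0 means new work from ARR growth needs more
    capacity: tech, people, or process."""
    work_hours = alerts_per_day * avg_minutes_per_alert / 60
    capacity_hours = analysts * hours_per_analyst
    return work_hours / capacity_hours

# e.g. 600 alerts/day at ~5 min each, 10 analysts on 8-hour shifts
print(utilization(600, 5, 10))  # -> 0.625
```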
Purpose: Be clear with your team about what success looks like - and create a team and culture that guides you there. Go through the exercise of articulating your team's purpose.
The "purpose" we've aligned on at Expel in our SOC: protect our customers and help them improve.
People: To get to where you want to go, what are the traits, skills, and experiences you need to be successful?
Traits (who you are)
Skills (what you know)
Experiences (what you've encountered/accomplished)