
Jon Hencinski (@jhencinski), 10 tweets, 5 min read


How do you measure #SOC quality? 🤔

1. ISO 2859-1 (#AQL) to determine sample size
2. #Python #Jupyter notebook to perform random selection
3. Check sheet to spot defects
4. Process runs every 24 hrs
5. (Digestible) #Metrics to improve

How'd we get there? Story in /thread

I'll break the thread down into four key points:

1. What we're solving for
2. Guiding principles
3. Our (current) solution
4. Quick recap

My goal is to share what's working for us and how we got there. But I'd love to hear from others. What's working for you?

What we're solving for: All work is high quality, not just incidents.

On a typical day in our #SOC we'll:
- Process Ms of alerts w/ detection engine
- Send 100s to analysts for human judgement

Those 100s of alerts result in:
- Tens of investigations
- Handful of incidents

My mental model for #SOC quality is two key activities:

1. QA | Focus: *Prevent* defects | Ex: Email notifications for those really spooky alerts

2. QC | Focus: *Find* defects | Ex: Let's review closed alerts

You likely already have a *ton* of QA built in.

But is there any QC?

OK, we understand the problem.

What are the #SOC QC guiding principles?

1. We'll use industry standards to sample
2. The sample has to be representative of the total population
3. Measurements must be accurate & precise
4. Metrics we produce are digestible
5. Performed daily

What next?

We went out and researched QC in manufacturing and landed on ISO 2859-1.

TL;DR ➡️ You make things (your lot), AQL tells you how many you should inspect.

Let's say your team handles 600 alerts per day (lot size).

You should inspect 32 (sample size).
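To make the 600 ➡️ 32 example concrete, here's a minimal Python sketch of the lookup. It assumes general inspection Level I, since this thread doesn't state a level and Level I is the one that gives 32 for a lot of 600. The band values are transcribed from the standard's sample size code letter table, so verify them against your copy of ISO 2859-1. The names `LEVEL_I_BANDS` and `sample_size` are made up for illustration, and the full standard also uses your chosen AQL value to pick the final plan, which this sketch skips.

```python
import bisect

# Assumption: general inspection Level I of ISO 2859-1.
# Each pair is (largest lot size in the band, sample size for that band).
# Values transcribed from the standard's table; verify before relying on them.
LEVEL_I_BANDS = [
    (8, 2), (15, 2), (25, 3), (50, 5), (90, 5), (150, 8), (280, 13),
    (500, 20), (1200, 32), (3200, 50), (10000, 80), (35000, 125),
    (150000, 200), (500000, 315),
]
LARGEST_SAMPLE = 500  # lots of 500,001 items and over

_UPPER_BOUNDS = [upper for upper, _ in LEVEL_I_BANDS]


def sample_size(lot_size: int) -> int:
    """Return how many items to inspect for a lot of the given size."""
    if lot_size < 2:
        raise ValueError("the table starts at a lot size of 2")
    # Find the first band whose upper bound is >= lot_size.
    band = bisect.bisect_left(_UPPER_BOUNDS, lot_size)
    if band == len(LEVEL_I_BANDS):
        return LARGEST_SAMPLE
    return LEVEL_I_BANDS[band][1]


print(sample_size(600))
```

Swapping in the Level II or III column of the table is just a different list of pairs.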

Next, we broke our production/work into three lots:

1. Alerts
2. Investigations
3. Incidents

We used change point analysis to determine the mean of each and then used AQL tables to tell us how many we should inspect each day.
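The thread doesn't say which change point method was used, so treat this as one simple stand-in: a single least-squares split of the daily counts, keeping the mean of the most recent segment as the lot size. `current_level`, `min_segment` and `min_gain` are hypothetical names and thresholds picked for the sketch, not anything from our actual notebook.

```python
from statistics import fmean


def _sse(values):
    """Sum of squared deviations from the mean of the values."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def current_level(daily_counts, min_segment=5, min_gain=0.5):
    """Return (split_index, mean of the counts from split_index onward).

    Tries every split that leaves at least min_segment days on each side
    and keeps the one with the lowest combined squared error. The split is
    accepted only if it removes at least min_gain of the total variation;
    otherwise split_index is 0 and the mean covers the whole series.
    """
    counts = list(daily_counts)
    if not counts:
        raise ValueError("need at least one daily count")
    total = _sse(counts)
    best_split, best_cost = 0, total
    for split in range(min_segment, len(counts) - min_segment + 1):
        cost = _sse(counts[:split]) + _sse(counts[split:])
        if cost < best_cost:
            best_split, best_cost = split, cost
    if total == 0 or (total - best_cost) / total < min_gain:
        best_split = 0
    return best_split, fmean(counts[best_split:])


# Made-up data: alert volume steps up from 400/day to 600/day.
history = [400] * 10 + [600] * 10
print(current_level(history))
```

The mean of the latest segment is the number you'd feed into the AQL table as the lot size.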

Cool, cue the #Jupyter Notebook.
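The random selection step can be as small as the sketch below. It assumes you already have a list of IDs for the day's closed work; the `alert-N` IDs and the `pick_sample` helper are hypothetical. Simple random sampling without replacement gives every item the same chance of being picked, which is what keeps the sample representative.

```python
import random


def pick_sample(item_ids, sample_size, seed=None):
    """Pick sample_size distinct items uniformly at random.

    Passing a seed makes the draw reproducible, which helps if someone
    asks later why a given item was reviewed.
    """
    ids = list(item_ids)
    rng = random.Random(seed)
    # Never ask for more items than the lot contains.
    k = min(sample_size, len(ids))
    return rng.sample(ids, k)


# Hypothetical lot: 600 closed alerts, inspect 32 of them.
closed_alerts = [f"alert-{n}" for n in range(600)]
todays_sample = pick_sample(closed_alerts, 32, seed=20240101)
print(len(todays_sample))
```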

Now what?

We take each item through a check sheet and look for defects. Did we take the right action? Did we zig when we should have zagged? That type of thing.

We record the number of defects by type each day, trend them and then provide feedback to the team via #Slack workflow.
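Here's a rough sketch of the daily tally, assuming each inspected item comes back from the check sheet as a list of defect types (empty if it was clean). The defect names and the `summarize_defects` helper are placeholders, not our real check sheet.

```python
from collections import Counter


def summarize_defects(reviews):
    """Summarize one day's check sheet results.

    reviews: one list of defect-type strings per inspected item.
    Returns (count of defects by type, share of items with any defect).
    """
    by_type = Counter(defect for item in reviews for defect in item)
    defective_items = sum(1 for item in reviews if item)
    rate = defective_items / len(reviews) if reviews else 0.0
    return by_type, rate


# Hypothetical day: four items inspected, two of them had defects.
today = [[], ["wrong_action"], [], ["missing_notes", "wrong_action"]]
by_type, rate = summarize_defects(today)
print(by_type.most_common(), rate)
```

Store one row per day and the trend line falls out of whatever plotting you already do in the notebook.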

The whole point is to understand how you're doing and spot ways to improve.

SOC #QC wins:
- Spotted issues using a class of tech ➡️ held training
- Variance wrt how we investigated auth alerts ➡️ built orchestration
- Wobble w/ reporting quality ➡️ built tech

How do others measure #SOC quality?

I'd love to hear about your quality program. What works? What didn't? Success stories? We're always on the lookout for ways to improve.

Also, if you've made it this far in the thread, thanks for taking the time!
