SwiftOnSecurity
Dec 13, 2023 · 28 tweets · 6 min read
Today at work there was an 11hr outage bridge call over a small but important area of the Windows network having application errors. Not debilitating but would be eventually.

Here is how I helped solve it after being asked to take a look, and how I approach problems like this. 🧵
I’m an IT Generalist who started in Helpdesk and system engineering. I now work in Security, using those skills to push initiatives forward. Often, to troubleshoot complaints and impediments these projects encounter. I talk with lots of teams, so I’m periodically asked to consult.
Today’s issue was Microsoft Edge launching but all functionality not working except the basic chrome. The settings tab would open but also show an error. F12 dev tools would not open. Strangely, an IE Mode tab would work.

I am presented this problem. I start to dig in.
This problem spread progressively to 100 machines, seemingly all in the same OU. According to site staff it was not strictly Edge version-dependent. Interesting data-point. Heterogeneity in a homogeneous managed network. Take note.

No changes happened. Moving machines to new OU no help.
Site staff remote into an impacted computer to share screen with me. Something I always do is get hands-on to get a feeling for the machine. Vibes are important.

Start Menu doesn’t work? Staff says it’s restricted. I right-click to get admin PowerShell prompt. Auth with LAPS.
(They showed me issue, described earlier)
I launch compmgmt.msc. Say I’m just looking around. Weird crashes. ShellExperienceHost.exe.
Start Menu is not restricted. It’s broken somehow. “How long like this?” Years.
But that’s not the problem I’m called for.

Maybe. I’m suspicious.
Edge is failing in a way so unusual it’s not even triggering its own error handling. Nothing other staff had tried or guessed at, with multiple departments giving input, has helped.

So this has to be really freaking weird. Maybe.

I try DISM/SFC. Nothing, as expected.
I look at the Group Policy applied to this OU. Lots of super old stuff obviously ported-forward across maybe 20 years. Nothing immediately obviously linked, but the problem seems to be OU-dependent.

No GPO edits last year in structure. Dates not absolute but reliable for now.
Site staff can replicate issue. Image machine, install LOB apps, works fine. Put it in OU. It breaks. But nothing has changed, probably?

I ask SCCM/others etc about pushes. Absolutely nothing. This network area is tagged as a critical change control zone.

Broken anyway. Exotic.
Seeking variables I disable all non-Microsoft services. Inspect all startup/login scripts. No changes or help.

Start Menu not working bugs me. Say it’s normal but it’s not. Could this be a clue I can key off of? Normal diagnostics to fix it don’t help at all. I explain to call:
You have two very unusual issues on a machine. I do not care one of them is old. Something with a Windows subsystem is potentially a fault as a root cause here. We are going to address Start Menu. Not Edge. There is far more advice out there on a broken Start Menu as a symptom, so we can find resolutions anyway.
I ask tech to transfer ProcMon and AutoRuns to system via USB, which they can access logged in with personal service account bypassing USB control. These machines cannot access most of network like SMB shares. Very unusual for changes to penetrate them. They are really obtuse.
Go through AutoRuns, nothing notable.
I launch ProcMon and immediately try Start Menu, then pause capture. I look for what happens to ShellExperienceHost RIGHT before WerFault.exe is called. This is not reliable for issue identification but can give hints.
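The "what happened right before WerFault.exe" check can also be done on a saved capture. Below is a minimal sketch, assuming the capture was exported from ProcMon as CSV with its default column names ("Process Name", "Operation", "Path", "Result"). The sample rows are invented for illustration and are not the real capture; `events_before_crash` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical capture, invented for illustration. Column names assume
# ProcMon's default CSV export layout.
SAMPLE = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"10:01:00.001","explorer.exe","3000","RegQueryValue","HKCU\Software\Example","SUCCESS",""
"10:01:00.100","ShellExperienceHost.exe","4120","RegOpenKey","HKLM\Software\Microsoft\Windows","SUCCESS",""
"10:01:00.200","ShellExperienceHost.exe","4120","RegOpenKey","HKLM\Software\Microsoft\OLE","ACCESS DENIED",""
"10:01:00.300","svchost.exe","900","Process Create","C:\Windows\System32\WerFault.exe","SUCCESS",""
"10:01:00.400","WerFault.exe","5000","Process Start","","SUCCESS",""
"10:01:00.500","ShellExperienceHost.exe","4120","Thread Exit","","SUCCESS",""
'''


def is_crash_marker(row, crash_handler):
    """True if this event is the crash handler running or being launched."""
    launched = (row["Operation"] == "Process Create"
                and row["Path"].lower().endswith(crash_handler.lower()))
    return row["Process Name"] == crash_handler or launched


def events_before_crash(csv_text, process="ShellExperienceHost.exe",
                        crash_handler="WerFault.exe", count=5):
    """Return the last `count` events from `process` logged before the
    crash handler first appears in the capture."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    crash_at = next(
        (i for i, row in enumerate(rows) if is_crash_marker(row, crash_handler)),
        len(rows),  # no crash marker found: consider the whole capture
    )
    before = [row for row in rows[:crash_at] if row["Process Name"] == process]
    return before[-count:]


for event in events_before_crash(SAMPLE):
    print(event["Operation"], event["Path"], event["Result"])
```

As in the live session, the last failed operation is only a hint, not proof of cause.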

Failed registry reads?
The process is unable to read (for example) HKLM\Software\Microsoft\OLE. Really basic eternal low-level stuff literally -anyone- should be able to access.

ProcMon results have lots of nuance in what you think they say, but I check regedit. Fine. Check effective access. Fine.
Is this red herring? Literally the _last_ thing that occurs before process crash, not a cleanup routine. This could be important.

I go back to check my assumptions and paths of causality. It _has_ to be a GPO. Got more troubleshooting results that confirm it’s OU. But what?
This stage of troubleshooting requires synthesizing several areas of windows internals knowledge.

1.) The GPO DOES manage Registry permissions to HKLM\Software. But it has admins, users, system, etc all as expected. Should be zero limitations
2.) Managing registry ACLs really sucks, and most do it through a crappy ancient editor. The goal of some IT tech 15+ years ago was to grant FULL CONTROL rights to the registry keys of an LOB app in HKLM

3.) This app is likely from XP era when apps assumed they were always admin
4.) Registry ACL edits in Group Policy are “tattoo” operations. They do not get reverted when you move the machine out of an OU. This would explain issue persistence even after moving out of the OU. It also aligns with the fact that being in the OU even once breaks Start Menu forever.
5.) The fact this tattoo operation is happening in the same OU as Start Menu and Edge breaking forever is likely not coincidence.

6.) Windows 8 introduced the AppX subsystem to isolate “modern” Windows applications from running literally as the user principal.
7.) Start Menu is NOT accessing the Registry as the user! It accesses it under the “ALL APPLICATION PACKAGES” principal.

8.) The GPO is overwriting the HKLM\Software ACL with principals from years ago, probably Win7, BEFORE this existed!
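The gap in points 6 to 8 can be checked mechanically: does the ACL name the ALL APPLICATION PACKAGES principal at all? Here is a rough sketch that inspects an ACL written as an SDDL string. The principal's well-known SID is S-1-15-2-1 and its SDDL alias is `AC`. Both SDDL strings below are made-up examples, not the ACL from this incident, and the function names are hypothetical. On a real machine the string could come from something like `(Get-Acl 'HKLM:\SOFTWARE').Sddl` in PowerShell.

```python
import re

# ALL APPLICATION PACKAGES: SDDL alias and well-known SID.
ALL_APP_PACKAGES = {"AC", "S-1-15-2-1"}

# Hypothetical ACLs for illustration only.
# LEGACY: Administrators (BA), SYSTEM (SY) and Users (BU), nothing else.
LEGACY = "D:PAI(A;CI;KA;;;BA)(A;CI;KA;;;SY)(A;CI;KR;;;BU)"
# MODERN: the same ACL plus a read ACE for ALL APPLICATION PACKAGES.
MODERN = LEGACY + "(A;CI;KR;;;AC)"


def allowed_trustees(sddl):
    """Collect the trustee field of every access-allowed ACE in the string."""
    trustees = set()
    for ace in re.findall(r"\(([^()]*)\)", sddl):
        # ACE layout: type;flags;rights;object_guid;inherit_guid;trustee
        fields = ace.split(";")
        if len(fields) >= 6 and fields[0] == "A":
            trustees.add(fields[5])
    return trustees


def grants_app_packages(sddl):
    """True if any allow ACE names ALL APPLICATION PACKAGES."""
    return bool(allowed_trustees(sddl) & ALL_APP_PACKAGES)


print(grants_app_packages(LEGACY))
print(grants_app_packages(MODERN))
```

This only tests whether the principal is present. It does not evaluate which rights are granted, deny ACEs, or inheritance from parent keys.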
So we have theory on Start Menu:

IT tech 15 years ago hard-coded an ACL in Group Policy which does not include modern Windows principals.

This breaks Start Menu. Site staff thought this was a security limitation. It’s not. It was a technical error. Was I the first to question it?

But..
The problem is with Edge. This cannot be related. It’s been like this for years. It’s not related to Edge version.

BUT there is ONE variable you are not considering.

Some modern software is not controlled by version. It gets feature testing flags iteratively rolled out. How?
This is controlled in Edge by the Experimentation Service policy. I presume this is the cause, but didn’t prove it tonight.
Microsoft is probably testing some security hardening setting that leverages AppX isolation?



admx.help/?Category=Edge…

Back to thread and chosen resolution.
Again, managing registry ACLs sucks. It’s possible but for critical devices and without 3rd party tools like SetACL I needed a proven solution.

So use Group Policy to set “HKLM\Software” ACL to grant “ALL APPLICATION PACKAGES” query, enumerate, notify, and read ACL inheritably.
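For reference, those four rights together are what the Windows headers call KEY_READ. A small sketch of the arithmetic, plus what the equivalent ACE would look like in SDDL notation. The constants are the standard winnt.h values. The helper `read_ace` is a hypothetical name, and the ACE string is an illustration of the intent, not the exact entry produced by the Group Policy editor.

```python
# Registry access rights as defined in the Windows headers.
KEY_QUERY_VALUE = 0x0001         # "query"
KEY_ENUMERATE_SUB_KEYS = 0x0008  # "enumerate"
KEY_NOTIFY = 0x0010              # "notify"
READ_CONTROL = 0x00020000        # "read ACL"
KEY_READ = 0x00020019            # the combined read-only right


def read_mask():
    """Combine the four rights named in the fix into one access mask."""
    return KEY_QUERY_VALUE | KEY_ENUMERATE_SUB_KEYS | KEY_NOTIFY | READ_CONTROL


def read_ace(trustee="AC", inherit=True):
    """Build an SDDL allow ACE granting KEY_READ ("KR") to `trustee`.

    "AC" is the SDDL alias for ALL APPLICATION PACKAGES; "CI" marks the
    ACE as inherited by subkeys.
    """
    flags = "CI" if inherit else ""
    return f"(A;{flags};KR;;;{trustee})"


print(hex(read_mask()))
print(read_ace())
```

Read-only is the point: the packaged apps need to see HKLM\Software, not change it.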
We do this in a test OU. Move broken machine in. gpupdate /force and reboot.

It comes up.

⚠️🚨⚠️🚨⚠️

The Start Menu works.

SO DOES EDGE. The line-of-business application server page is loaded instantly for <redacted>.

Holy shit. Found the cause and a countermeasure. ~45min.
This is NOT end. It has to be rolled out under change control iteratively in batches.

AND it’s not clean. This is a countermeasure. ANY hardcoding of ACLs in a machine is a terrible fucking idea. As we have proven today. Machines are stained.

When done this GPO gets replaced.
Aftermath:

This is not an unknown problem. In further research, Microsoft is aware of the legacy that a bad Registry management wizard from 20 years ago has left on the machines of today.

Permissions hardcoded in Group Policy in a different era have ongoing detriment.

learn.microsoft.com/en-us/troubles…
NOTE: On consult with Microsoft staff about the experiments policy, that may not be a factor here. It was NOT proven. Tech was already on a 12hr day.
We know of this policy and do NOT use it on employee machines on purpose, so Microsoft can see small impacts. Do not use blindly.
