Nick Craver @Nick_Craver
Well hello there memory leak...let's see what you are.
It's times when I type !dumpheap without an argument that call for a Snickers.
Alrighty, let's see what these little guys are:
Quite a bit of repetition in here - let's root some of these char[] and see:
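(The dump screenshots aren't preserved in this unroll. For anyone following along, this is WinDbg with the SOS extension loaded; the commands behind those screenshots are roughly the ones below - !dumpheap -stat summarizes the heap by type, !dumpheap -type lists the individual instances, and !gcroot walks the GC roots keeping one of them alive.)

    !dumpheap -stat
    !dumpheap -type System.Char[]
    !gcroot <char[] address>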
Anyone else noticing a pattern?

But this may not be our issue. Note the size (136,688 bytes): that's above the 85K large object heap threshold, so these may simply be awaiting a Gen 2/LOH collection rather than leaking indefinitely.
Alrighty I've got a comparison dump. While dump #1 has some things to track down, that's not the goal. The delta and *what's growing* is what we're after. In this case, it's pretty apparent: EF Core bits somehow.

Let's get to the root of those, shall we?
Had to break for kiddo bedtime, sorry! Okay, if we track these back, we see they're rooted to the context on our hourly scheduled route (a hefty set of operations).
Now we can combine what we know here to narrow it down quickly without digging further in the dump.
A) We have lots of user objects
B) We have lots of change tracking
C) We have a memory leak specifically on ny-web10 and 11 servers (where meta.stackoverflow.com runs):
I know (just because) we sync some things with meta sites. I know Stack Overflow is our largest site. So in hourly, we find...oh hey, things that only run on metas! I converted reputation sync to Dapper days ago, so let's look at profile syncing:
So this is getting *all* users from the database, which was fine in Linq2Sql for some reason, but EF Core is choking on it. What does that api/users/syncable route look like? Pretty simple:
If we zoom out, that .ToDictionary() (I'm not sure how this wasn't a problem before today...) is loading 1,148,071 User objects on meta.stackoverflow.com, when we only need the few thousand-ish (max) that changed in the last 6 hours on Stack Overflow.
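(The code screenshots aren't in the unroll, but the pattern being described boils down to a single line like the sketch below - the names here are assumptions, not the actual Stack Overflow code.)

    // Hypothetical sketch of the problem: materialize *every* local user into a dictionary,
    // just to match against the few thousand returned by api/users/syncable.
    var localUsers = db.Users.ToDictionary(u => u.Id);   // 1,148,071 User objects on meta.stackoverflow.com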
A naive fix would be "just load the users as you need them", which would look like this.
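(That screenshot is also missing from the unroll; a hedged sketch of the idea - one lookup per synced user, with names assumed - would be something like this.)

    // Naive "load users as you need them" sketch - method and property names are assumptions.
    foreach (var parentUser in syncableUsers)   // the users returned by api/users/syncable
    {
        // one SQL roundtrip per user, whether it's a hit or a miss
        var localUser = await db.Users.FirstOrDefaultAsync(u => u.Id == parentUser.Id);
        if (localUser != null)
        {
            parentUser.CopyPropertiesTo(localUser);
        }
    }
    await db.SaveChangesAsync();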

And it'd probably work, but it means a query for every hit *and* every miss (a miss still eats a SQL roundtrip), and with thousands of queries every run, it's not awesome.
In the .CopyPropertiesTo() method, we're dealing with UserMetadata, so that's a lazy load (bad!). It looks like this in MiniProfiler:
So let's form our dictionary in batches EF Core will load, like this.
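(Again hedging, since the actual diff lives in the screenshot/PR rather than the unroll, but the batching pattern looks roughly like this - the 1,000 batch size matches the description below, everything else is an assumption.)

    // Sketch: load only the users we actually need, 1,000 ids at a time.
    // (assumes System.Linq + Microsoft.EntityFrameworkCore usings; names are assumptions)
    var ids = syncableUsers.Select(u => u.Id).ToList();
    var localUsers = new Dictionary<int, User>();

    for (var i = 0; i < ids.Count; i += 1000)
    {
        var batch = ids.Skip(i).Take(1000).ToList();
        var users = await db.Users
            .Include(u => u.UserMetadata)        // eager load, so CopyPropertiesTo() doesn't lazy load
            .Where(u => batch.Contains(u.Id))    // translates to WHERE Id IN (...)
            .ToListAsync();
        foreach (var u in users)
        {
            localUsers[u.Id] = u;
        }
    }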

We can lower our SQL roundtrips to n / 1000 + relevant changes. It's more SQL trips than our original 1, but we aren't needlessly loading a million users into memory. We load about 0.3% of that instead.
If you're curious why I'm not using Span<T> here, it's because you can't do so in an async method. Even though it's safe here (it's 1 execution block), it's not yet allowed. It would be far cleaner, but c'est la vie.
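(If you want to see the restriction for yourself: Span<T> is a ref struct, and the compiler won't let a ref struct local live in an async method's state machine, so a minimal repro like this doesn't compile with the C# version in use here.)

    // Minimal repro of the limitation - this does not compile.
    async Task ParseAsync()
    {
        Span<char> buffer = stackalloc char[256];   // error: ref struct locals aren't allowed in async methods
        await Task.Delay(1);
    }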
Here's our before and after *with users updated* (more on that shortly). Okay, that's better and we're not using 1000x more memory at scale.
Here is syncing no users - note the run time, it's *insanely* lower: about 400 milliseconds vs. our 120+ second runs with just a few hundred users. I'm assuming EF Core's change tracking is incredibly expensive, but let's find out.
Let's tweak the code and disable the automatic .DetectChanges() calls, doing it ourselves, and look at the timings. So yes, change tracking isn't just expensive here, it's *insanely* expensive. Making our route take 150x longer.
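(For reference, the tweak is roughly this pattern in EF Core - the surrounding loop is a placeholder, not the real sync code.)

    // Turn off automatic change detection and run one DetectChanges() pass ourselves.
    db.ChangeTracker.AutoDetectChangesEnabled = false;
    try
    {
        foreach (var parentUser in syncableUsers)   // placeholder for the real sync loop
        {
            // ...copy properties onto the tracked local user...
        }
        db.ChangeTracker.DetectChanges();   // one pass at the end, instead of EF Core running it repeatedly
        await db.SaveChangesAsync();
    }
    finally
    {
        db.ChangeTracker.AutoDetectChangesEnabled = true;
    }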
Alrighty, so anyway: we found the cause, and spotted more we could optimize there to make it way more efficient. That's a typical day, but I thought I'd share a dig - hope you enjoyed!

I'll leave you with the internal PR created:
I've deployed the fix to our meta environment to sit for the weekend. Before doing so, I pulled out one more tool: our stack trace dumper. This confirms our problem: 14 concurrent hourly jobs inside SyncProfilesWithParentSiteAsync, all deep in here:
A few items here:
1) Always measure or check *before deploying the fix* - otherwise, what are you comparing against to see if it's fixed?
2) We've got the expensive stack we're spending lots of time in. That's our starting point for EF Core change tracking optimizations next.
This is our before picture of the last 3 days with the fix just now deployed. Remind me tomorrow and we'll see what happened post-fix.

Note: see the spikes? About 24 of them in a day? That's our hourly job going off the rails!
To wrap this up, here are the satisfying after graphs of our memory leak fix.

Metrics are awesome. Graphs are awesome. You can stare at and crunch numbers all day to see if something made a difference, but a readily accessible graph of a metric will save lots of time, every time.
Now we were focused on memory (our main symptom), but remember that call stack? That was causing load.

Take a look at the CPU impact the same fix had. Here are 3- and 12-day views of CPU on the same web tier. We are now back at pre-EF Core utilization levels. Awesome.
Also note the hourly spikes: while they were most memory-impacting on ny-web10 and 11 (due to meta.stackoverflow.com), they were CPU-impacting everywhere the scheduler ran (hundreds of sites adding up).

That's all for this round - I'm very pleased with the result and I hope you enjoyed.