Ramon Fritsch
Building @letsavee, an inspiration platform with 1.3M+ creatives.

Jan 14, 2024, 14 tweets

Let me tell you how I took the CPU usage of an 8-core server down from 800% to 140%.

A while ago I got an email from @linode saying the server was peaking at 770% for a few hours. It was just a heads up since dedicated servers are designed to sustain such load for a long time without any issues.

First I suspected a DDoS attack. But looking at the network traffic chart, I noticed most of the traffic was happening between servers. (@letsavee has 2 main servers, www and database.)

Ran `nethogs` on the www machine to inspect this substantial traffic. All of it was being sent/received by the www NodeJS processes.

Ran `nethogs` on the database machine as well. All of it was being sent/received by the MongoDB process.

Ran `mongostat` to take a look at query load on mongo. Ouch, ~15000 queries in the queue. I don't remember ever seeing that much before.

Decided to take a closer look at the queries. Running `db.setProfilingLevel(2, 0);` on Mongo makes it log EVERY SINGLE QUERY into a file for later inspection. I let it run for a few minutes and then switched back to the previous setting.

Now it's time to download that log file and write a custom script to count how many queries each collection received during that period. Found out it was mostly the teamusers collection.
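A rough sketch of what such a counting script could look like. This is not the script used here; it assumes a MongoDB version (4.4 or later) that writes structured logs with one JSON object per line, where query entries carry the namespace (`database.collection`) under `attr.ns`. The function names are made up for illustration.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def count_queries_by_namespace(lines: Iterable[str]) -> Counter:
    """Count log entries per namespace ("database.collection").

    Assumption: each line is one JSON object and query entries store
    the namespace under attr.ns. Anything else is skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a structured log line
        if not isinstance(entry, dict):
            continue
        attr = entry.get("attr")
        if not isinstance(attr, dict):
            continue
        namespace = attr.get("ns")
        if isinstance(namespace, str):
            counts[namespace] += 1
    return counts


def top_namespaces(path: str, limit: int = 10) -> List[Tuple[str, int]]:
    """Read a downloaded log file and return the busiest namespaces."""
    with open(path, encoding="utf-8") as log_file:
        return count_queries_by_namespace(log_file).most_common(limit)
```

Calling `top_namespaces` with the path of the downloaded log returns (namespace, count) pairs, busiest first, which is enough to spot one collection dominating the load.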

Then I inspected the code to look for bad smells. We're low-key rolling out the Teams plan to test for adoption. This made our check for whether a user has valid Pro privileges more complex, as it now has to run an aggregation to see if the user is a member of a valid paying team.

Found the query and introduced a cache layer for it, invalidating it every time something related to this query changes (user is added to a team, team admin's subscription expires…). This is super easy to get out of sync, so I use this technique only in a few strategic places.
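To make the idea concrete, here is a minimal in-memory sketch of a cache with explicit invalidation. It is written in Python for illustration rather than the NodeJS the site runs on, and every name in it (`PrivilegeCache`, `has_pro`, the `loader` callback) is hypothetical. The expiry time is an extra assumption, not something described in this thread: it only limits how long a missed invalidation can serve stale data.

```python
import time
from typing import Callable, Dict, Hashable, Iterable, Tuple


class PrivilegeCache:
    """Caches the result of an expensive "is this user Pro?" lookup."""

    def __init__(
        self,
        loader: Callable[[Hashable], bool],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader      # hypothetical: runs the real aggregation
        self._ttl = ttl_seconds    # assumption: safety net against stale entries
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, bool]] = {}

    def has_pro(self, user_id: Hashable) -> bool:
        now = self._clock()
        cached = self._entries.get(user_id)
        if cached is not None and cached[0] > now:
            return cached[1]       # cache hit: no database query
        value = self._loader(user_id)
        self._entries[user_id] = (now + self._ttl, value)
        return value

    def invalidate(self, user_id: Hashable) -> None:
        """Call whenever something that affects this user's status changes."""
        self._entries.pop(user_id, None)

    def invalidate_many(self, user_ids: Iterable[Hashable]) -> None:
        """For team-wide events, e.g. the team admin's subscription expiring."""
        for user_id in user_ids:
            self.invalidate(user_id)
```

Every code path that changes team membership or subscription state has to call `invalidate` (or `invalidate_many` for all members of a team), which is exactly why this kind of cache drifts out of sync so easily.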

Here's what those scary CPU/Network charts look like now. Huge drop!
CPU: 800% to 130%
Private Network In: 52Mbps to 3Mbps

Of course this also had an impact on the database server. YEAH!

Best thing was a power user sending a DM that he felt the difference right away. He mentioned his page load dropped from ~30s to ~6s and the API got much faster as well.

It makes all the effort worth it!
