Yesterday I had to work on a backfill script to update 24 million payment records. Yes, you read that right, 24 million. The script ran for 3 hours and wasn't even halfway done.
We found several ways to improve its performance today.
Summary below 👇
#SQL #Database
The first thing was being able to measure performance, so we could tell whether we were actually improving it. So we benchmarked against a sample of 20k record updates instead of the full 24 million.
The original query took around 36 seconds on that sample. We ended up getting it down to 6 seconds, and I don't think we can improve it much more now.
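To make the measuring step concrete, here's a minimal sketch with invented table and column names (not the real script), assuming SQL Server:

```sql
-- Carve out a 20k sample once, then time each variant of the backfill
-- against it instead of the full 24M rows.
SET STATISTICS TIME ON;   -- report CPU/elapsed time per statement

SELECT TOP (20000) payment_id
INTO #sample_ids
FROM dbo.payments
ORDER BY payment_id;

-- Every variant of the backfill UPDATE gets restricted to the sample:
UPDATE p
SET p.status_code = 'BACKFILLED'   -- placeholder for the real change
FROM dbo.payments AS p
JOIN #sample_ids AS s ON s.payment_id = p.payment_id;

SET STATISTICS TIME OFF;
```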
1_ Removed an unnecessary join table. We were updating based on a list of description values joined in, instead of using the codes directly. Easier for humans to read, slower for the computer. We are at 32 seconds now.
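To make that concrete, here's a hypothetical before/after with invented names (not the real query), assuming SQL Server:

```sql
-- Before: join a lookup table just to filter on human-readable descriptions.
UPDATE p
SET p.status_code = 'BACKFILLED'
FROM dbo.payments AS p
JOIN dbo.payment_types AS t ON t.type_id = p.type_id
WHERE t.description IN ('Card refund', 'Bank transfer refund');

-- After step 1: the lookup join is gone; the matching codes sit in a
-- pre-filled temp table and are still checked in the WHERE clause.
UPDATE p
SET p.status_code = 'BACKFILLED'
FROM dbo.payments AS p
WHERE p.type_code IN (SELECT type_code FROM #codes);
```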
2_ After that removal, the filter still used values from a temp table pre-filled earlier, but through the WHERE clause. We moved it to a direct join on the code and it improved a little more: 30 seconds. Not even close to good enough just yet.
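Under the same invented names, step 2 roughly looks like this:

```sql
-- Step 2: filter through a direct join on the code column instead of an
-- IN (...) subquery in the WHERE clause.
UPDATE p
SET p.status_code = 'BACKFILLED'
FROM dbo.payments AS p
JOIN #codes AS c ON c.type_code = p.type_code;
```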
3_ Now we are using codes directly instead of descriptions. Then 💡: does this table have an index on the code field? No, it didn't. So we added one. Boom, dramatic improvement, down to 10 seconds!
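The index itself is a one-liner (again, invented names); the win comes from turning a full scan of the 24M-row table into an index seek on the code:

```sql
-- The code column had no index, so every run scanned the whole table.
CREATE NONCLUSTERED INDEX IX_payments_type_code
    ON dbo.payments (type_code);
```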
Still not good enough. One more step.
4_ Check for triggers on this table. Those slow things down a lot. There were 2 triggers. We can't disable them in prod for all the other consumers, so we used a hack with a temp table: the trigger skips its work when that object is locked, and the script is the one that locks it.
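The exact hack isn't shown here, but a common variant of this pattern (hypothetical sketch, invented names, assuming SQL Server) has the trigger bail out when it sees a session-scoped temp table that only the backfill script creates, so every other session still runs the trigger normally:

```sql
-- Inside the trigger: skip the expensive work for the backfill session only.
ALTER TRIGGER dbo.trg_payments_audit
ON dbo.payments
AFTER UPDATE
AS
BEGIN
    -- Local temp tables are visible to triggers fired from the session
    -- that created them, and invisible to every other session.
    IF OBJECT_ID('tempdb..#skip_payment_triggers') IS NOT NULL
        RETURN;

    -- ... normal trigger logic for all other consumers ...
END;
GO

-- In the backfill script, before running the UPDATEs:
CREATE TABLE #skip_payment_triggers (flag BIT);
```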
6 seconds now :)!
Of course, I cannot show the code itself. But I could write a short article going into more detail about the index and trigger stuff. Let me know if you would be interested in reading it :)!
I don't like DB stuff much, but this kind of thing feels great 🔥!
