1) Oh man, computer stuff is hard. A small #postgresql thread:

After spending weeks optimizing an ETL process to be 3x faster, an index-only scan got 3x slower compared to a replica that hadn't undergone the new ETL process. Main clue: (shared) buffer hits were up by 10x.
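
For context, buffer hits like these show up in EXPLAIN output; a check along these lines (table, column, and the WHERE range are placeholders) makes the 10x difference visible:

EXPLAIN (ANALYZE, BUFFERS)
SELECT my_index_col FROM my_table WHERE my_index_col BETWEEN 1 AND 1000;

Compare the "Buffers: shared hit=..." line under the Index Only Scan node on the fast vs. the slow replica.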
2) My initial suspicion was vacuum issues or index corruption, but no amount of `VACUUM FULL` or recreating the indexes helped. So I had to dig deeper.
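
(Those attempts looked roughly like this, with the placeholder names used later in the thread:

VACUUM FULL my_table;
REINDEX INDEX my_index;

Neither made a difference.)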

Enter dynamic tracing with perf: wiki.postgresql.org/wiki/Profiling…
3) After studying the PostgreSQL source for a bit, I decided that `ReadBufferExtended` would be a good function to trace in order to figure out the big increase in buffer hits/allocs.

github.com/postgres/postg…
4) So I ran `SELECT pg_backend_pid()` and instrumented backend 123 like this:

$ sudo perf probe -x '.../bin/postgres' 'ReadBufferExtended'
$ sudo perf record -R -g -p 123 -e 'probe_postgres:ReadBufferExtended'
<run query>
$ sudo perf report --stdio
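
(And to remove the probe again when done, perf probe's delete flag should cover it:

$ sudo perf probe -d 'probe_postgres:ReadBufferExtended'
)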
5) What this told me was that only 5% of my allocs came from `btgettuple` (reading btree tuples), and 95% came from `visibilitymap_get_status` (checking the visibility map).

How is this possible??
6) Turns out that the index-only scan node references a visibility map buffer (`ioss_VMBuffer`) as it loops over the index entries [1]. But if the next tuple is not on the same VM page, `vm_readbuf` gets called [2].

[1] github.com/postgres/postg…
[2] github.com/postgres/postg…
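
A quick way to eyeball this, again with placeholder names: walk the table in index order and look at the heap block number in each ctid. If the block numbers jump around, consecutive index entries keep landing on scattered heap pages (and, once the jumps are big enough, on different VM pages):

SELECT ctid, my_index_col FROM my_table ORDER BY my_index_col LIMIT 20;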
7) This is when it hit me. For some reason my index entries must be very poorly correlated to my heap/VM pages!

SELECT correlation FROM pg_stats WHERE tablename = 'my_table' AND attname = 'my_index_col';

The good replica had a correlation of 0.98 vs 0.22 for the slow replica (!)
8) How can we fix the correlation? CLUSTER to the rescue:

CLUSTER my_table USING my_index;
VACUUM ANALYZE my_table;

And voilà, correlation is close to 1 and my slow query is starting to fly again!
9) So yeah, computer stuff is hard.

It seems that the optimizations that went into making the ETL process faster have caused the poor correlation. So I might need to run `CLUSTER` after every ETL, but this will likely eat up most of the ETL perf gains 🤕
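
If it comes to that, the extra step at the end of the ETL would be something like this (placeholder names again, and assuming the connection string lives in $DATABASE_URL):

$ psql "$DATABASE_URL" -c 'CLUSTER my_table USING my_index;'
$ psql "$DATABASE_URL" -c 'VACUUM ANALYZE my_table;'

Keep in mind that CLUSTER takes an ACCESS EXCLUSIVE lock on the table while it rewrites it.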
10) Anyway, I don't blame PostgreSQL. Making database stuff run efficiently is an inherently hard and complex problem.

I just sometimes wish that shaving the yaks didn't make them horny and procreate like this ...
Can't wait for the day when Twitter offers an edit function so I can fix all my typos 🙈
I don't have a soundcloud, but for just 10 EUR for 2 years you can join me and become a member of PostgreSQL Europe.

postgresql.eu/about/membersh…

Not in the EU? Go here: postgresql.org/about/donate/