1) Oh man, computer stuff is hard. A small #postgresql thread:

After spending weeks optimizing an ETL process to be 3x faster, an index-only scan got 3x slower compared to a replica that hadn't undergone the new ETL process. Main clue: (shared) buffer hits were up by 10x.
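
For context, buffer hits like these show up in EXPLAIN output; a check along these lines (table, column, and the WHERE range are placeholders) makes the 10x difference visible:

EXPLAIN (ANALYZE, BUFFERS)
SELECT my_index_col FROM my_table WHERE my_index_col BETWEEN 1 AND 1000;

Compare the "Buffers: shared hit=..." line under the Index Only Scan node on the fast vs. the slow replica.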
2) My initial suspicion was vacuum issues or index corruption, but no amount of `VACUUM FULL` or recreating the indexes helped. So I had to dig deeper.
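
(Those attempts looked roughly like this, with the placeholder names used later in the thread:

VACUUM FULL my_table;
REINDEX INDEX my_index;

Neither made a difference.)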

Enter dynamic tracing with perf: wiki.postgresql.org/wiki/Profiling…
3) After studying the PostgreSQL source for a bit, I decided that `ReadBufferExtended` would be a good function to trace in order to figure out the big increase in buffer hits/allocs.

github.com/postgres/postg…
4) So I ran `SELECT pg_backend_pid()` and instrumented backend 123 like this:

$ sudo perf probe -x '.../bin/postgres' 'ReadBufferExtended'
$ sudo perf record -R -g -p 123 -e 'probe_postgres:ReadBufferExtended'
<run query>
$ sudo perf report --stdio
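
(And to remove the probe again when done, perf probe's delete flag should cover it:

$ sudo perf probe -d 'probe_postgres:ReadBufferExtended'
)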
5) What this told me was that only 5% of my allocs came from `btgettuple` (reading btree tuples), and 95% came from `visibilitymap_get_status` (checking the visibility map).

How is this possible??
6) Turns out that the index-only scan node references a visibility map buffer (`ioss_VMBuffer`) as it loops over the index entries [1]. But if the next tuple is not on the same VM page, `vm_readbuf` gets called [2].

[1] github.com/postgres/postg…
[2] github.com/postgres/postg…
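
A quick way to eyeball this, again with placeholder names: walk the table in index order and look at the heap block number in each ctid. If the block numbers jump around, consecutive index entries keep landing on scattered heap pages (and, once the jumps are big enough, on different VM pages):

SELECT ctid, my_index_col FROM my_table ORDER BY my_index_col LIMIT 20;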
7) This is when it hit me. For some reason my index entries must be very poorly correlated to my heap/VM pages!

SELECT correlation FROM pg_stats WHERE tablename = 'my_table' AND attname = 'my_index_col';

The good replica had a correlation of 0.98 vs 0.22 for the slow replica (!)
8) How can we fix the correlation? CLUSTER to the rescue:

CLUSTER my_table USING my_index;
VACUUM ANALYZE my_table;

And voilà, correlation is close to 1 and my slow query is starting to fly again!
9) So yeah, computer stuff is hard.

It seems that the optimizations that went into making the ETL process faster have caused the poor correlation. So I might need to run `CLUSTER` after every ETL, but this will likely eat up most of the ETL perf gains 🤕
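
If it comes to that, the extra step at the end of the ETL would be something like this (placeholder names again, and assuming the connection string lives in $DATABASE_URL):

$ psql "$DATABASE_URL" -c 'CLUSTER my_table USING my_index;'
$ psql "$DATABASE_URL" -c 'VACUUM ANALYZE my_table;'

Keep in mind that CLUSTER takes an ACCESS EXCLUSIVE lock on the table while it rewrites it.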
10) Anyway, I don't blame PostgreSQL. Making database stuff run efficiently is an inherently hard and complex problem.

I just sometimes wish that shaving the yaks didn't make them horny and procreate like this ...
Can't wait for the day when Twitter offers an edit function so I can fix all my typos 🙈
I don't have a soundcloud, but for just 10 EUR for 2 years you can join me and become a member of PostgreSQL Europe.

postgresql.eu/about/membersh…

Not in the EU? Go here: postgresql.org/about/donate/