After spending weeks optimizing an ETL process to be 3x faster, an index-only scan got 3x slower compared to a replica that hadn't undergone the new ETL process. Main clue: (shared) buffer hits were up by 10x.
Enter dynamic tracing with perf: wiki.postgresql.org/wiki/Profiling…
github.com/postgres/postg…
$ sudo perf probe -x '.../bin/postgres' 'ReadBufferExtended'
$ sudo perf record -R -g -p 123 -e 'probe_postgres:ReadBufferExtended'
<run query>
$ sudo perf report --stdio
How is this possible??
[1] github.com/postgres/postg…
[2] github.com/postgres/postg…
SELECT correlation FROM pg_stats WHERE tablename = 'my_table' AND attname = 'my_index_col'
The good replica had a correlation of 0.98 vs 0.22 for the slow replica (!)
CLUSTER my_table USING my_index;
VACUUM ANALYZE my_table;
And voila, correlation is close to 1 and my slow query is starting to fly again!
It seems that the optimizations that went into making the ETL process faster have caused the poor correlation. So I might need to run `CLUSTER` after every ETL, but this will likely eat up most of the ETL perf gains 🤕
I just sometimes wish that shaving the yaks didn't make them horny and procreate like this ...
postgresql.eu/about/membersh…
Not in the EU? Go here: postgresql.org/about/donate/