LIMIT considered harmful in #PostgreSQL, a brief thread. 1/n
LIMIT can cause very strange planning issues in PostgreSQL, and you shouldn't use it unless it's definitely required. 2/n
(Of course, LIMIT *is* sometimes required, so use it if you need to!) 3/n
For example, a seemingly benign query that runs great without LIMIT, like SELECT * FROM t WHERE a = 1 (a not being unique), can fall apart completely with LIMIT. 4/n
This is true even (in fact, especially) if the LIMIT is small and is close to the number of rows that would be returned without LIMIT. 5/n
The short form is that when you specify a LIMIT on a query like that, PostgreSQL does a calculation of the "density" of that value in the table. On (say) a 300,000 row table with 1,500 hits, it assumes that it needs to scan 200 records to get a match. 6/n
200 records isn't a lot! It's well within the threshold that PostgreSQL will likely walk an index (if possible) or do a sequential scan, if not. And if it's right, it'll be a good plan. 7/n
But it's often not right, either because of bad statistics, or the assumption about density is wrong. Even if the statistics are as good as can be gotten, sometimes PostgreSQL can't estimate very exactly (like, say, array fields). 8/n
So, suddenly, it's walking an entire index or doing a full sequential scan when another plan (bitmap index/heap scan, for example) would be much faster. 9/n
It's often better to just don't put LIMIT on and just pick what you need in the application, if the number of rows that could possibly come back is manageable. 10/n
Sadly, some ORMs always slap a LIMIT on queries becuase... um... they do. (Like #django and its LIMIT 21 thing). If you can turn that off, do that. 11/n
(And remember that without an ORDER BY clause, LIMIT will return a set of rows that is not random enough to be statistically useful, but not deterministic enough to be counted on.) 12/n
So, don't always put a LIMIT 1 on something thinking it will be faster! It might not be. And *do* use LIMIT if you need to. 13/end
• • •
Missing some Tweet in this thread? You can try to
force a refresh
Continuing our discussion of #PostgreSQL time types, what about the much-maligned TIMESTAMP type, without TIME ZONE? 1/n
TIMESTAMP and TIMESTAMP WITH TIME ZONE have exactly the same binary representation. TIMESTAMP WITH TIME ZONE means "this timestamp is known to be in UTC." 2/n
TIMESTAMP-no-TZ means "no idea what time zone this particular date and time are in, hope you do." 3/n
Let's talk a bit more about setting work_mem in #PostgreSQL. 1/n
The basic rule is "as high as required, and no higher." But how high is required? 2/n
A lot of tools and websites will give you formulae, usually based on some combination of system memory and number of connection. These aren't *wrong* exactly, but... 3/n