More PostgreSQL generic plan fun
On a recent project, I was checking whether data existed for networks in a large table. When running my program, I found that one of my queries was quick at first and then became slow. It turned out that Postgres had begun using a generic plan that performed much more poorly than the custom plan.
I was running queries like this:
SELECT EXISTS (
  SELECT
  FROM big_table
  WHERE ip_address <<= $1
)
I was executing them using pgx as my Postgres driver which automatically prepares and caches statements.
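In SQL terms, pgx's statement cache behaves roughly like preparing the query once and executing it repeatedly (the statement name here is made up for illustration):

-- Roughly equivalent to what pgx's automatic statement caching does.
-- The name check_network is illustrative, not something pgx uses.
PREPARE check_network (inet) AS
SELECT EXISTS (
  SELECT
  FROM big_table
  WHERE ip_address <<= $1
);

EXECUTE check_network('dead:beef::/62');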
The first few executions were quick, as were executions in psql.
At first I didn't understand what was going on, especially when I saw the same query execute quickly in psql. However, from past experience I realized it could be a generic plan issue. Examining the custom and generic plans shows a huge difference:
The custom plan:
Result  (cost=5.84..5.85 rows=1 width=1)
  InitPlan 1 (returns $0)
    ->  Index Only Scan using big_table__ip_address_time_partial on big_table  (cost=0.57..5.84 rows=1 width=0)
          Index Cond: ((ip_address >= 'dead:beef::/62'::inet) AND (ip_address <= 'dead:beef:ffff:ffff:ffff:ffff'::inet))
          Filter: ((ip_address)::inet <<= 'dead:beef::/62'::inet)
The generic plan:
Result  (cost=5.66..5.67 rows=1 width=1)
  InitPlan 1 (returns $0)
    ->  Index Only Scan using big_table__ip_address_time_partial on big_table  (cost=0.57..8762195.67 rows=1721443 width=0)
          Filter: ((ip_address)::inet <<= $1)
(I checked the plans using the same technique I describe in this post).
Postgres uses a heuristic: after five executions of a prepared statement, it switches to a generic plan if that plan's estimated cost looks no more expensive than the custom plans it has seen so far. This is described here.
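You can watch this switch happen in psql with a prepared statement (the name check_network is illustrative). The tell is the plan's filter condition: a custom plan shows the literal value, while the generic plan shows the parameter placeholder $1:

PREPARE check_network (inet) AS
SELECT EXISTS (
  SELECT FROM big_table WHERE ip_address <<= $1
);

-- The first five executions are planned with custom plans;
-- EXPLAIN shows the literal in the Index Cond / Filter:
EXPLAIN EXECUTE check_network('dead:beef::/62');

-- After five executions, Postgres may switch to the cached
-- generic plan; the Filter then shows "$1" instead of a literal:
EXPLAIN EXECUTE check_network('dead:beef::/62');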
I wasn't yet sure that Postgres was actually switching to the generic plan though. To check, I told Postgres to never use a generic plan:
SET plan_cache_mode = force_custom_plan
This made the query consistently execute quickly, which means the generic plan was the problem.
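The inverse setting is useful for inspection too: forcing the generic plan lets you examine it directly with EXPLAIN EXECUTE on any prepared statement (plan_cache_mode was added in PostgreSQL 12; the statement name below is illustrative):

-- Force the generic plan so EXPLAIN shows it directly:
SET plan_cache_mode = force_generic_plan;
EXPLAIN EXECUTE check_network('dead:beef::/62');

-- Restore the default cost-based choice:
SET plan_cache_mode = auto;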
One thing that was still confusing to me is why Postgres decided it was a good idea to use the generic plan when the cost is clearly far worse. I believe it is because the cost of the topmost node is cheaper in the generic plan, which is odd given it has a child node that is much more expensive. I had thought parent nodes include the cost of their child nodes, but clearly that is not the case here.
Update: I've realized I was confused in the above paragraph. The reason Postgres would use the generic plan is that it has a cheaper estimate (based on the topmost node). While the plan performs worse in practice, the estimate is what Postgres goes by.
Regarding the child node having a higher estimate than the parent: that is explained by this passage from the EXPLAIN docs:
Also, if you limit the number of rows to return with a LIMIT clause, the planner makes an appropriate interpolation between the endpoint costs to estimate which plan is really the cheapest.
The estimate of the Index Only Scan assumes we would fetch all of the rows, but we do not because of the EXISTS in the query.
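That interpolation can be checked against the numbers in the generic plan above. EXISTS effectively needs only one row, so a sketch of the parent's cost is the child's startup cost plus one row's share of the remaining cost (this is an approximation of the planner's arithmetic, not its exact code):

```python
# Numbers taken from the generic plan's Index Only Scan node.
startup_cost = 0.57
total_cost = 8762195.67
estimated_rows = 1721443
rows_needed = 1  # EXISTS stops after the first matching row

# Linear interpolation between startup and total cost:
interpolated = startup_cost + (total_cost - startup_cost) * rows_needed / estimated_rows
print(round(interpolated, 2))  # close to the 5.66 on the Result node
```

This lands almost exactly on the 5.66 startup cost shown on the generic plan's Result node, which is why the topmost node looks cheap despite the enormous child estimate.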