PostgreSQL types surprise
I recently encountered a query that performed much slower than I expected. The cause turned out to be that a parameter value was treated as a type matching an index in one spot but not matching a different index in another spot. This was not obvious at first so I thought it would be interesting to discuss it.
I expected the query to take under 1ms to execute, but in some cases it took over 600ms. When first looking at it, I thought that the cause was that the query was using a generic plan rather than a custom plan, and the planner was making a poor choice as a result. This is because the plan looked good when I ran the query normally but looked poor when I used a generic plan.
The query looked like this:
EXPLAIN WITH a AS ( SELECT MAX(total) AS total FROM table_a WHERE account_id = 123 AND sub_account_id = '456' AND event_date > '2023-10-10 20:00:36.364844+00'::timestamptz - '2 days' AND event_date < '2023-10-10 20:00:36.364844+00'::timestamptz ), b AS ( SELECT total FROM table_b WHERE account_id = 123 AND sub_account_id = '456' ) SELECT (SELECT total FROM a), (SELECT total FROM b)
And the plan:
Result (cost=7.06..7.07 rows=1 width=64) InitPlan 1 (returns $0) -> Aggregate (cost=3.59..3.60 rows=1 width=32) -> Index Scan using table_a_pkey on table_a (cost=0.56..3.59 rows=1 width=6) Index Cond: (((account_id)::integer = 123) AND (sub_account_id = '456'::text) AND (event_date > ('2023-10-10 20:00:36.364844+00'::timestamp with time zone - '2 days'::interval)) AND (event_date < '2023-10-10 20:00:36.364844+00'::timestamp with time zone)) InitPlan 2 (returns $1) -> Index Scan using table_b_pkey on table_b (cost=0.43..3.45 rows=1 width=5) Index Cond: (((account_id)::integer = 123) AND (sub_account_id = '456'::citext))
This plan looks good and fully uses indexes on both tables.
When I looked at the generic plan, the indexes weren't fully used (since
this is Postgres 15, I had to prepare the statement to see the generic plan
rather than use Postgres 16's
PREPARE stmt AS WITH a AS ( SELECT MAX(total) AS total FROM table_a WHERE account_id = $1 AND sub_account_id = $2 AND event_date > $3::timestamptz - '2 days' AND event_date < $3::timestamptz ), b AS ( SELECT total FROM table_b WHERE account_id = $1 AND sub_account_id = $2 ) SELECT (SELECT total FROM a), (SELECT total FROM b) SET plan_cache_mode = force_generic_plan; EXPLAIN execute stmt(123, '456', '2023-10-10 20:00:36.364844+00');
And the plan:
Result (cost=46807.63..46807.64 rows=1 width=64) InitPlan 1 (returns $0) -> Aggregate (cost=3.59..3.60 rows=1 width=32) -> Index Scan using table_a_pkey on table_a (cost=0.56..3.58 rows=1 width=6) Index Cond: (((account_id)::integer = $1) AND (sub_account_id = $2) AND (event_date > ($3 - '2 days'::interval)) AND (event_date < $3) AND (event_date = $3)) InitPlan 2 (returns $1) -> Bitmap Heap Scan on table_b (cost=1111.34..46804.02 rows=1 width=5) Recheck Cond: ((account_id)::integer = $1) Filter: ((sub_account_id)::text = $2) -> Bitmap Index Scan on table_b_pkey (cost=0.00..1111.34 rows=80121 width=0) Index Cond: ((account_id)::integer = $1)
This plan's estimate is much worse, and it only uses the
component of the multi-column
(account_id, sub_account_id) index on
table_b (but still fully uses the
event_date) index on
This was the plan that the query was using, and explains why the query was taking much longer in some cases: The index was not being fully used and so we had to filter many tuples from the heap.
Seeing this lead me to believe the generic plan was the problem. However, I realized that Postgres does not use the generic plan if the estimate is much worse than the custom plan (as described here), so this being the cause didn't make sense.
I realized that the types could be the issue. Sure enough, the column type
sub_account_id was different on the two tables (and as a result also
different in the two indexes). It was
table_b. Postgres inferred from context that
$2's type was
was fine for
table_a, but the second use of
table_b made this
choice a problem since the
(account_id, sub_account_id) index could not
be fully used due to
sub_account_id being a different type.
To fix this, we can change the second instance of
(Another solution would be to pass
sub_account_id as a separate
parameter, e.g. by adding a
$4 to the query. This would let Postgres pick
citext from context. I'm not sure if there's a reason to prefer one or
In hindsight the cause is obvious, but it took me a while to realize what was going on.
One thing I need to keep in mind when looking at
EXPLAIN output in the
future is that how you provide the parameter values can affect the plan, as
shown here. If I had looked at the parameterized version of the query at
first rather than inlining parameters in the query text and running it with
psql, it would have been clearer what was going on.
To do that, one way is to run the
EXPLAIN query in a similar way as the
application does. (You can still retrieve the
EXPLAIN output that way).
For example, in my case I could call the
EXPLAIN query via Go code. This
also would show any surprises stemming from how the Postgres driver
provides parameters, which can likely impact behaviour too.