PostgreSQL VACUUM taking an access exclusive lock
Recently at work we had an incident that turned out to be caused by a
VACUUM
command running against one of our databases. I thought this would
be an interesting subject for a post because I didn't know that VACUUM
s
could do this, and it is useful to think about how to deal with this
behaviour.
We regularly run VACUUM
s against our databases using
flexible-freeze. This calls
VACUUM FREEZE ANALYZE
against tables needing maintenance. During the
incident, this command took an ACCESS EXCLUSIVE
lock against a table we
use during API requests for approximately four minutes. This blocked many
requests and additionally caused us to exhaust available Postgres
connections because we had queries stuck for the duration.
The fact that a VACUUM
other than VACUUM FULL
could take an ACCESS
EXCLUSIVE
lock was a surprise to me. However, the Postgres
docs say this can
happen when the TRUNCATE
option is enabled (which it is by default). This
is so that the VACUUM
command can return disk space to the OS by
truncating the end of relations. This is done in the function
lazytruncateheap().
Needing to take a strong lock for this purpose makes sense, but issues can arise if a strong lock like this is held for longer than very short periods, which is what happened here.
A question is why VACUUM
would take such disruptive action and not abort
when queries were waiting on it. The problem mainly occurs when hot
standbys are involved. VACUUM
takes the lock on the primary database and
the lock is then replicated to standby servers as described in the
code.
The VACUUM
running on the primary is not aware of the queries on the
standbys waiting on the lock and continues happily along. This is a known
issue with VACUUM
and hot standbys, and lead to the addition of the above
TRUNCATE
option to disable the behaviour which was discussed in this
thread
resulting in this
commit.
Now that we know it can happen, we can figure out how to deal with it in the future.
There are a few ways we can handle it:
- We could tell
VACUUM
to not do this. This can be done by running it with theTRUNCATE
option disabled or by disablingvacuum_truncate
on a per table basis. However, this means we'd never return disk space to the OS, which could lead to different issues. Potentially disabling it could be combined with separateVACUUM
runs with truncating enabled that would be run with more care, such as with additional timeouts, or by periodically rewriting the table if needed. - We could design tables such that fewer rows need to be deleted. The
table in question is one we delete from regularly. To achieve this, one
option is to drop the table rather than delete from it, such as via
expiring old rows using partitioning. Not deleting as many rows means
there is a reduced likelihood
VACUUM
would need to take the action involving the strong lock, though the risk would still be present. - We could spread out
DELETE
s over time. The table in question had a large delete happen all at one time, leaving many dead rows. If theDELETE
s instead happened over a period of time, freeing disk space to the OS may not have been needed. - We could run the
VACUUM
s withstatement_timeout
set to a short duration. This means that we'd still allow the maintenance to take place, but we'd abort the queries if they take too long. The problem is that it is usually fine and desirable to letVACUUM
s run for some time. It is only an issue if they take a strong lock. It would be ideal if we could specify a timeout only in the case where a strong lock is involved, but that's not currently possible. As is, this is likely not a good option. - We could monitor for and abort any
VACUUM
queries that take a strong lock. This would letVACUUM
s continue to run over time but stop them "immediately" when they take a strong lock. The idea is that in general these operations are fine to run, and even if they take a strong lock, they need to complete their task holding the strong lock quickly to be acceptable. My thinking is to run something like this every few seconds. It remains to be seen whether this is actually a good idea though. My assumption is that theVACUUM
s would still be able to return disk space to the OS, even if it may take a few tries. If they are never able to do that, this is a hacky version of option one. - We could use a
statement_timeout
on all online queries running in our application. This way even if a table gets locked like this, the system won't effectively go offline. - Enabling the
old_snapshot_threshold
setting has the side effect of disabling this behaviour ("freed space at the end of a relation cannot be released to the operating system"). This is effectively the same as option one, though may be desirable for other reasons.
Option two, three, and six are good practices regardless, so these should be considered as part of the solution.
Initially while writing this post I believed option five was also necessary
and the way to go. The other options do not guarantee VACUUM
won't lock
tables, merely reduce the chance (two and three) or make the application
behave better if it happens (six). This means that there is still a risk
that a VACUUM
would lock tables and impact the application.
However, I now think option one or seven (disabling the truncation
behaviour) is a surer and less hacky approach. This does mean that tables
may grow in size, but that can be dealt with by rewriting the table if
necessary, such as with pg_repack.
What swayed me was the fact that the TRUNCATE
option exists in the first
place, as well as old_snapshot_threshold
disabling the behaviour anyway.
Note: Autovacuum can also truncate relations under the same circumstances.
Unfortunately this means autovacuum can be dangerous for hot standbys. At
the time of writing, the only way to stop autovacuum from doing this is by
changing vacuum_truncate
storage parameters on tables. I wrote an
email
to the pgsql-general mailing list asking about this.