PostgreSQL VACUUM taking an access exclusive lock
Recently at work we had an incident that turned out to be caused by a
VACUUM command running against one of our databases. I thought this would
be an interesting subject for a post because I didn't know that VACUUMs
could do this, and it is useful to think about how to deal with this
behaviour.
We regularly run VACUUMs against our databases using
flexible-freeze. This calls
VACUUM FREEZE ANALYZE against tables needing maintenance. During the
incident, this command took an ACCESS EXCLUSIVE lock against a table we
use during API requests for approximately four minutes. This blocked many
requests and additionally caused us to exhaust available Postgres
connections because we had queries stuck for the duration.
The fact that a VACUUM other than VACUUM FULL could take an ACCESS
EXCLUSIVE lock was a surprise to me. However, the Postgres
docs say this can
happen when the TRUNCATE option is enabled (which it is by default). This
is so that the VACUUM command can return disk space to the OS by
truncating the end of relations. This is done in the function
lazytruncateheap().
Needing to take a strong lock for this purpose makes sense, but issues can arise if a strong lock like this is held for longer than very short periods, which is what happened here.
A question is why VACUUM would take such disruptive action and not abort
when queries were waiting on it. The problem mainly occurs when hot
standbys are involved. VACUUM takes the lock on the primary database and
the lock is then replicated to standby servers as described in the
code.
The VACUUM running on the primary is not aware of the queries on the
standbys waiting on the lock and continues happily along. This is a known
issue with VACUUM and hot standbys, and lead to the addition of the above
TRUNCATE option to disable the behaviour which was discussed in this
thread
resulting in this
commit.
Now that we know it can happen, we can figure out how to deal with it in the future.
There are a few ways we can handle it:
- We could tell
VACUUMto not do this. This can be done by running it with theTRUNCATEoption disabled or by disablingvacuum_truncateon a per table basis. However, this means we'd never return disk space to the OS, which could lead to different issues. Potentially disabling it could be combined with separateVACUUMruns with truncating enabled that would be run with more care, such as with additional timeouts, or by periodically rewriting the table if needed. - We could design tables such that fewer rows need to be deleted. The
table in question is one we delete from regularly. To achieve this, one
option is to drop the table rather than delete from it, such as via
expiring old rows using partitioning. Not deleting as many rows means
there is a reduced likelihood
VACUUMwould need to take the action involving the strong lock, though the risk would still be present. - We could spread out
DELETEs over time. The table in question had a large delete happen all at one time, leaving many dead rows. If theDELETEs instead happened over a period of time, freeing disk space to the OS may not have been needed. - We could run the
VACUUMs withstatement_timeoutset to a short duration. This means that we'd still allow the maintenance to take place, but we'd abort the queries if they take too long. The problem is that it is usually fine and desirable to letVACUUMs run for some time. It is only an issue if they take a strong lock. It would be ideal if we could specify a timeout only in the case where a strong lock is involved, but that's not currently possible. As is, this is likely not a good option. - We could monitor for and abort any
VACUUMqueries that take a strong lock. This would letVACUUMs continue to run over time but stop them "immediately" when they take a strong lock. The idea is that in general these operations are fine to run, and even if they take a strong lock, they need to complete their task holding the strong lock quickly to be acceptable. My thinking is to run something like this every few seconds. It remains to be seen whether this is actually a good idea though. My assumption is that theVACUUMs would still be able to return disk space to the OS, even if it may take a few tries. If they are never able to do that, this is a hacky version of option one. - We could use a
statement_timeouton all online queries running in our application. This way even if a table gets locked like this, the system won't effectively go offline. - Enabling the
old_snapshot_thresholdsetting has the side effect of disabling this behaviour ("freed space at the end of a relation cannot be released to the operating system"). This is effectively the same as option one, though may be desirable for other reasons.
Option two, three, and six are good practices regardless, so these should be considered as part of the solution.
Initially while writing this post I believed option five was also necessary
and the way to go. The other options do not guarantee VACUUM won't lock
tables, merely reduce the chance (two and three) or make the application
behave better if it happens (six). This means that there is still a risk
that a VACUUM would lock tables and impact the application.
However, I now think option one or seven (disabling the truncation
behaviour) is a surer and less hacky approach. This does mean that tables
may grow in size, but that can be dealt with by rewriting the table if
necessary, such as with pg_repack.
What swayed me was the fact that the TRUNCATE option exists in the first
place, as well as old_snapshot_threshold disabling the behaviour anyway.
Note: Autovacuum can also truncate relations under the same circumstances.
Unfortunately this means autovacuum can be dangerous for hot standbys. At
the time of writing, the only way to stop autovacuum from doing this is by
changing vacuum_truncate storage parameters on tables. I wrote an
email
to the pgsql-general mailing list asking about this.