Removing PostgreSQL's old_snapshot_threshold
At work we have been using the
old_snapshot_threshold
setting in PostgreSQL. We discovered that it will be removed in PostgreSQL
17.
After looking into why, we decided we should stop using it immediately.
I'll talk about what it is and why we were using it, why we decided to
disable it now, and what we're doing instead.
What is it and why were we using it
The purpose of this setting is to prevent bloat. We enabled it after an incident where bloat affected database performance.
If there is a transaction holding the database horizon (by relying on an older snapshot), vacuums are not allowed to clean up old rows that the snapshot can see. This means if a transaction continues for a long time it can cause bloat and performance issues. (This depends on your workload).
Protecting against this is especially important if you have standbys with
hot_standby_feedback
enabled (as we do) as it means that long running transactions on standby
servers can cause bloat.
old_snapshot_threshold
caps how old a snapshot can be before rows can be
vacuumed. With it enabled, Postgres can vacuum rows older than the
setting's threshold even if there is a transaction running with a snapshot
saying they should stay around. Normally this is not allowed. This
effectively puts a cap on bloat.
A transaction running beyond the configured threshold will error if it tries to access vacuumed rows but otherwise can continue. This means a secondary benefit of the setting is enabling longer running transactions that would otherwise be unsafe.
We relied on this setting for protection against bloat, though we do have transactions benefitting from the ability to run for a longer time.
A side benefit of this setting is that it addresses an issue I previously
described. Enabling
it makes that problem go away because it disables VACUUM
's ability to
truncate empty pages (see the TRUNCATE
option to
VACUUM
). While we
had other measures in place to help with this, old_snapshot_threshold
makes the problem impossible.
Why we disabled it now
When we saw it was being removed, we looked into why. There is a thread on the Postgres mailing list discussing issues with it. This message summarizes many of them. Apparently it is dangerously broken and can cause invalid query results.
As a result, we decided we should stop using it. Note we never noticed any issues, but we likely would not have as queries could silently give corrupt results.
What we're doing instead
We still want protection from bloat. How can we achieve that?
In addition to old_snapshot_threshold
we have a cronjob that cancels
transactions that run beyond a certain time (generally 1 hour). We only run
this against primaries currently, so it would leave us vulnerable to
standbys holding the horizon due to hot_standby_feedback
. One option
would be to start running this against standbys.
A consideration is we have a small number of transactions on standbys that take longer than 1 hour. They are able to succeed because they do not need to access old rows. If we enforce a 1 hour limit on standbys these transactions will not be able to complete.
Despite that we decided a 1 hour threshold would be best for our workload. We'll be modifying the longer running transactions so that they can complete in a shorter time.
We're not going to use the cronjob for the limit though.
To enforce this limit we're going to use transaction_timeout
when it's
available (in Postgres 17). Until then we're setting statement_timeout
and idle_in_transaction_session_timeout
to 1 hour and 10 minutes
respectively. In theory a transaction could remain open longer than 1 hour
(for example if a transaction runs two statements that each take 50
minutes), so we will have monitoring for that. However it is not a pattern
we typically use, so we don't expect it to occur. transaction_timeout
will fully protect us. We won't need statement_timeout
then.
These protect us because they cap how long a snapshot can be around. We
lose the ability to run transactions longer than an hour but we think this
is tolerable. If we need longer transactions, then a standby without
hot_standby_feedback
is an option.
transaction_timeout
is actually more aggressive than required. At the
Read Committed isolation level it would be fine for virtual transactions to
run beyond 1 hour as long as each individual statement is less than 1 hour.
However it is usually the case that it is statement duration that is a
problem for us rather than transaction duration.
We haven't forgotten about the VACUUM
empty page truncation side benefit
either. To deal with that problem we've switched our VACUUM
s to disable
the TRUNCATE
option.
The end
It's unfortunate to see old_snapshot_threshold
go away as it seems like
it enables useful workloads, but it sounds like there's good reasons for
removing it. I think what we've come up with will enable our system to
continue operating well without it.