The One and the Many

Removing PostgreSQL's old_snapshot_threshold

At work we have been using the old_snapshot_threshold setting in PostgreSQL. We discovered that it will be removed in PostgreSQL 17. After looking into why, we decided we should stop using it immediately. I'll talk about what it is and why we were using it, why we decided to disable it now, and what we're doing instead.

What is it and why were we using it

The purpose of this setting is to prevent bloat. We enabled it after an incident where bloat affected database performance.

If there is a transaction holding the database horizon (by relying on an older snapshot), vacuums are not allowed to clean up old rows that the snapshot can see. This means if a transaction continues for a long time it can cause bloat and performance issues. (This depends on your workload).

Protecting against this is especially important if you have standbys with hot_standby_feedback enabled (as we do) as it means that long running transactions on standby servers can cause bloat.

old_snapshot_threshold caps how old a snapshot can be before rows can be vacuumed. With it enabled, Postgres can vacuum rows older than the setting's threshold even if there is a transaction running with a snapshot saying they should stay around. Normally this is not allowed. This effectively puts a cap on bloat.

A transaction running beyond the configured threshold will error if it tries to access vacuumed rows but otherwise can continue. This means a secondary benefit of the setting is enabling longer running transactions that would otherwise be unsafe.

We relied on this setting for protection against bloat, though we do have transactions benefitting from the ability to run for a longer time.

A side benefit of this setting is that it addresses an issue I previously described. Enabling it makes that problem go away because it disables VACUUM's ability to truncate empty pages (see the TRUNCATE option to VACUUM). While we had other measures in place to help with this, old_snapshot_threshold makes the problem impossible.

Why we disabled it now

When we saw it was being removed, we looked into why. There is a thread on the Postgres mailing list discussing issues with it. This message summarizes many of them. Apparently it is dangerously broken and can cause invalid query results.

As a result, we decided we should stop using it. Note we never noticed any issues, but we likely would not have as queries could silently give corrupt results.

What we're doing instead

We still want protection from bloat. How can we achieve that?

In addition to old_snapshot_threshold we have a cronjob that cancels transactions that run beyond a certain time (generally 1 hour). We only run this against primaries currently, so it would leave us vulnerable to standbys holding the horizon due to hot_standby_feedback. One option would be to start running this against standbys.

A consideration is we have a small number of transactions on standbys that take longer than 1 hour. They are able to succeed because they do not need to access old rows. If we enforce a 1 hour limit on standbys these transactions will not be able to complete.

Despite that we decided a 1 hour threshold would be best for our workload. We'll be modifying the longer running transactions so that they can complete in a shorter time.

We're not going to use the cronjob for the limit though.

To enforce this limit we're going to use transaction_timeout when it's available (in Postgres 17). Until then we're setting statement_timeout and idle_in_transaction_session_timeout to 1 hour and 10 minutes respectively. In theory a transaction could remain open longer than 1 hour (for example if a transaction runs two statements that each take 50 minutes), so we will have monitoring for that. However it is not a pattern we typically use, so we don't expect it to occur. transaction_timeout will fully protect us. We won't need statement_timeout then.

These protect us because they cap how long a snapshot can be around. We lose the ability to run transactions longer than an hour but we think this is tolerable. If we need longer transactions, then a standby without hot_standby_feedback is an option.

transaction_timeout is actually more aggressive than required. At the Read Committed isolation level it would be fine for virtual transactions to run beyond 1 hour as long as each individual statement is less than 1 hour. However it is usually the case that it is statement duration that is a problem for us rather than transaction duration.

We haven't forgotten about the VACUUM empty page truncation side benefit either. To deal with that problem we've switched our VACUUMs to disable the TRUNCATE option.

The end

It's unfortunate to see old_snapshot_threshold go away as it seems like it enables useful workloads, but it sounds like there's good reasons for removing it. I think what we've come up with will enable our system to continue operating well without it.

Comments