The One and the Many

Tuning autovacuum in PostgreSQL

At work, we've been looking at tuning PostgreSQL's autovacuum settings. In this post I'm going to talk about the settings we're planning on using and how we're going to apply them.

Resources

There are several good resources about vacuum and autovacuum. A few I learned from are:

Where we're starting

Perhaps surprisingly, our autovacuum settings today are the defaults. My understanding is that this is not recommended, as the defaults are too conservative for larger, busier servers.

Using the defaults can probably be explained by us having run Postgres for many years and not having updated our setup to take advantage of improvements to autovacuum in newer versions.

One improvement in particular is relevant to us: We have some insert-mostly tables which were not handled well in older versions of Postgres. This changed in Postgres 13, when autovacuum began triggering vacuums based on the number of inserted tuples in addition to the number of dead tuples.

There have also been changes to autovacuum's defaults that made it more aggressive:

Prior to these changes, these insert-mostly tables would not get vacuumed at all until a vacuum to prevent wraparound was triggered. That would find a lot of work to do and it would take a long time (due to the slow, conservative settings). This ended up blocking deployment of DDL changes (and potentially could have resulted in worse issues).

To work around the lack of insert-triggered vacuums, we began running flexible-freeze each night. flexible-freeze runs VACUUM (FREEZE) against the tables needing it most. This means that in general we haven't had to rely on autovacuum, as flexible-freeze kept things in check. This is likely the main reason we haven't needed to tune anything, although I suspect we've also benefitted from the other changes I mentioned.

We don't have any particular issues with our setup, but there are benefits to tuning autovacuum:

(I was initially thinking that we'd also get statistics updated more often, but I think that would mean tuning autovacuum's analyze settings).

The plan

Based on my reading, my thinking is to adjust autovacuum so that it triggers more often and works faster when it runs. Ideally it will trigger often but not have much to do each time.

The settings I'm thinking of are:
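In ALTER SYSTEM form, a sketch of what I have in mind looks roughly like this (the scale factor and cost limit values are the ones I discuss in this post; the insert scale factor and worker count are illustrative placeholders):

-- A sketch only: the insert scale factor and worker count values are illustrative placeholders.
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.01;         -- default 0.2
ALTER SYSTEM SET autovacuum_vacuum_insert_scale_factor = 0.01;  -- default 0.2
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 500;            -- default 200
ALTER SYSTEM SET autovacuum_max_workers = 5;                    -- default 3; requires a restart
SELECT pg_reload_conf();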

Reducing the scale factors means autovacuum triggers sooner and has fewer tuples to work on. Increasing the cost limit means it runs faster when triggered.

These are the system wide settings I'm thinking of using. I'm also planning on applying some settings on particular tables which I'll discuss below.

A digression: Autovacuum's speed

How do we calculate autovacuum's maximum speed? autovacuum has a cost budget of autovacuum_vacuum_cost_limit (default 200, new value 500) when it runs. After using up that budget it sleeps for autovacuum_vacuum_cost_delay (2ms). With these we can calculate its maximum bytes per second:

bytes per second = runs per second * bytes per run

where

runs per second = 1 second / autovacuum_vacuum_cost_delay
                = 1000 ms / 2 ms
                = 500

and

bytes per run = pages per run * page size
              = (autovacuum_vacuum_cost_limit / cost per page) * page size
              = (200 / cost per page) * 8192

The cost per page varies depending on what is done with a page:

We can calculate the maximum bytes per second for the different page operations. With the default autovacuum_vacuum_cost_limit = 200 we have:

Upping autovacuum_vacuum_cost_limit to 500 gives us:
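As a sketch, both sets of numbers can be computed directly, assuming the stock per-page vacuum costs on a recent Postgres (vacuum_cost_page_hit = 1, vacuum_cost_page_miss = 2, vacuum_cost_page_dirty = 20):

-- Maximum throughput per page-cost type, in MB/s, at cost limits 200 and 500.
-- Assumes the 2 ms cost delay, 8192-byte pages, and the stock per-page costs.
SELECT op,
       cost_per_page,
       cost_limit,
       round((1000.0 / 2) * (cost_limit / cost_per_page) * 8192 / (1024 * 1024))
         AS max_mb_per_second
FROM (VALUES ('page hit', 1), ('page miss', 2), ('page dirty', 20)) AS costs(op, cost_per_page),
     (VALUES (200), (500)) AS limits(cost_limit)
ORDER BY cost_limit, cost_per_page;

Under those assumptions, the default limit works out to roughly 780 MB/s for pages found in shared buffers down to about 39 MB/s for pages that get dirtied, and a limit of 500 gives 2.5 times those figures.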

A digression: Why autovacuum triggers

In order to understand how to tune autovacuum, we need to know the circumstances under which it triggers vacuums. There are three ways:

  1. The number of dead tuples in a table goes beyond a certain ratio (see autovacuum_vacuum_scale_factor above)
  2. The number of new tuples in a table goes beyond a certain ratio (see autovacuum_vacuum_insert_scale_factor above)
  3. The table has tuples that are old and need to be frozen to prevent wraparound

The first two are relatively straightforward to understand: Activity on a table leads to autovacuum having work to do on that table.
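For these first two triggers, the thresholds can be computed from the settings. As a sketch (using the server's current settings and ignoring any per-table overrides), something like this shows each table's dead tuple and insert triggers next to its current counts:

-- A sketch: each table's autovacuum trigger thresholds, computed from the
-- current settings, alongside the counts that are compared against them.
SELECT c.relname,
       s.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::bigint
         + current_setting('autovacuum_vacuum_scale_factor')::numeric * c.reltuples
         AS dead_tuple_trigger,
       s.n_ins_since_vacuum,
       current_setting('autovacuum_vacuum_insert_threshold')::bigint
         + current_setting('autovacuum_vacuum_insert_scale_factor')::numeric * c.reltuples
         AS insert_trigger
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid;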

The third is interesting as it does not depend on activity on the table. As well, this kind of autovacuum is more aggressive in that it won't stop if there is a query wanting a conflicting lock. It is considered more important to continue. We need to consider tuning for this case.

Per table settings

Depending on a table's workload, it can make sense to change settings on the table to better handle vacuums freezing tuples for wraparound protection.

Consider insert-only or insert-mostly tables:

  1. In many cases, there will be two passes of autovacuum for a tuple
    • After enough new tuples are inserted, autovacuum triggers. It will look at the new tuples and do things like update the visibility map. Later, autovacuum triggers again to freeze these tuples (though some may have been frozen by earlier, normal vacuums). This means processing these tuples twice. Typically for these kinds of workloads we don't need to do that since the tuples likely have not changed since the first vacuum. We could benefit from doing everything in one pass.
  2. Autovacuum for wraparound protection blocks DDL
    • The second autovacuum pass (the one for wraparound protection) won't cancel itself for conflicting queries, so it blocks them, which could happen at inopportune times (such as when you want to apply DDL during a deployment). This is relevant for all tables, but for these kinds of tables in particular it can be a problem: they may be large and most of their tuples may not yet be frozen, potentially leading to long vacuums of this type.

It would be nice if we could make it so that when autovacuum triggers for new tuples it freezes them as well. That way we wouldn't need two vacuum passes over the tuples, and there would be little or nothing left to do during wraparound protection vacuums.

Fortunately there are ways to do this. We can apply these settings to the tables via ALTER TABLE ... SET (...):
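As a sketch, assuming the relevant storage parameters are the per-table autovacuum freeze ages (the table name is a placeholder):

-- Assumption: these are the storage parameters in question; the table name is a placeholder.
ALTER TABLE my_insert_mostly_table SET (
  autovacuum_freeze_min_age = 0,
  autovacuum_freeze_table_age = 0
);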

With these, autovacuum will be able to freeze the tuples immediately when it vacuums them due to inserts. No second pass will be needed as the tuples are already frozen. (Running with vacuum_freeze_min_age = 0 and vacuum_freeze_table_age = 0 is what VACUUM (FREEZE) does.)

Setting these to 0 globally probably doesn't make sense: Most tables are probably not insert-only/-mostly. Freezing tuples immediately would be wasteful if they might get updated or deleted.

Applying these settings is key to us being able to stop using flexible-freeze as right now that is handling a lot of the freezing for us.

This is my thinking anyway; I am not yet at the stage of doing this. One open question is whether this would interact badly with already-executing queries. Would these tuples become visible to them, given we are marking them as visible to all transactions? I suspect Postgres protects us in some way, especially given VACUUM (FREEZE) does this, but I'm not sure how. It might be that locking protects us. If it is locking, then avoiding aggressive freezing entirely this way might not work, as I expect these vacuums would not always be able to freeze the tuples, leading to later aggressive vacuums regardless. Thanks to my colleague Marsel for pointing this part out!

Transitioning to the new settings

One thing that occurred to me while thinking about how to test and deploy these changes was what will happen at the moment we switch the settings. In particular, when we change the scale factors, autovacuum will immediately find that many tables need vacuuming all at once. This is because it was keeping tables vacuumed down to the old scale factor, but many could be well beyond the new scale factors (even with flexible-freeze, which in any case doesn't run against all of our databases).

This made me concerned that we would see high I/O for an extended period of time, potentially to an extent that services would be impacted.

As well, the autovacuum workers could be stuck processing large tables for a long time and be unable to process other tables that would otherwise be vacuumed without issue. We could end up with bloat. (Part of this concern comes from us not planning to immediately restart the servers to apply the new workers setting).

My thinking is to reduce the scale factor settings slowly over time and monitor. For example, instead of going from 0.2 to 0.01 in one shot, we will first go to 0.17, let autovacuum get tables stable at that level, and then reduce again, e.g. to 0.14. Eventually we will reach 0.01 (or see that autovacuum is working too often and adjust our plan).
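As a sketch, each step would be a small change followed by a reload (0.17 here is just an example intermediate value):

-- One step of the gradual rollout; 0.17 is an example intermediate value.
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.17;
ALTER SYSTEM SET autovacuum_vacuum_insert_scale_factor = 0.17;
SELECT pg_reload_conf();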

This approach will take longer to get the new settings fully enabled, but I think it will be a safer transition.

We also have to consider when to stop running flexible-freeze. I am thinking that we will continue running it for a period, likely weeks, until we see that autovacuum is freezing new tuples as expected in the large insert-mostly tables. At that point flexible-freeze should never need to do much work on those tables, which we'll be able to confirm by looking at its logs. Turning it off will mean autovacuum will likely need to periodically run vacuums to prevent wraparound on tables where we haven't changed the freeze settings, but that should be okay (especially for smaller tables), and is already happening in several of our databases.

Monitoring autovacuum

Part of making any change like this is knowing how to monitor it. We can look at the state of autovacuum per table by checking the values relevant to each way autovacuum triggers. Before making the change, I want to see how well autovacuum is already keeping up. I also want to see how the state changes as we adjust the settings.

Monitoring dead tuples

For vacuums triggered by dead tuples, we can look at how many tuples we'd need to work on at varying scale factors:

SELECT c.relname, n_dead_tup, reltuples
FROM pg_stat_all_tables s
JOIN pg_class c ON oid = relid
WHERE n_dead_tup > 50 + reltuples * 0.01;

(The 50 is autovacuum_vacuum_threshold).

I happen to have this data in Prometheus too. A query like this shows it there:

pg_stat_user_tables_n_dead_tup{instance_host="[...]"} and (pg_stat_user_tables_n_dead_tup > 50 + pg_class_reltuples{} * 0.01)

If we use the current scale factor, 0.2, we can see if autovacuum is currently keeping up, and by looking at how the number of tuples changes as we decrease it, we can see how much autovacuum will need to do when we change the value.
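As a sketch, the same comparison can be run at several candidate scale factors at once:

-- A sketch: how many tables (and how many dead tuples) would be over the
-- trigger threshold at the current and candidate scale factors.
SELECT sf AS scale_factor,
       count(*) AS tables_over_threshold,
       sum(s.n_dead_tup) AS dead_tuples_to_clean
FROM unnest(ARRAY[0.2, 0.1, 0.05, 0.01]) AS sfs(sf),
     pg_stat_all_tables s
     JOIN pg_class c ON c.oid = s.relid
WHERE s.n_dead_tup > 50 + c.reltuples * sf
GROUP BY sf
ORDER BY sf DESC;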

Monitoring inserted tuples

We can run similar queries for vacuums triggered by inserts:

SELECT c.relname, n_ins_since_vacuum, reltuples
FROM pg_stat_all_tables s
JOIN pg_class c ON oid = relid
WHERE n_ins_since_vacuum > 1000 + reltuples * 0.01;

(The 1000 is autovacuum_vacuum_insert_threshold).

And for Prometheus:

pg_stat_user_tables_n_ins_since_vacuum{instance_host="[...]"} and (pg_stat_user_tables_n_ins_since_vacuum > 1000 + pg_class_reltuples{} * 0.01)

Monitoring for transaction wraparound prevention

This one is trickier to monitor. We can look at how old a table's oldest unfrozen transaction ID is, which tells us how close it is to a vacuum for wraparound prevention. However, I haven't found a way to see how many tuples actually need freezing (other than, I expect, by inspecting individual pages).

This query will show that a table might soon need to be vacuumed for this reason, but it won't show how much freezing work needs to be done. Note that all of a table's tuples could already be frozen, in which case effectively no work will be done, but the table will still show up in this query.

SELECT c.relname, age(relfrozenxid), mxid_age(relminmxid)
FROM pg_stat_all_tables s
JOIN pg_class c ON oid = relid
ORDER BY 2 DESC, 3 DESC;
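A rough page-level approximation might be possible with the pg_visibility extension, which reports how many of a table's pages are already marked all-frozen in the visibility map (the table name is a placeholder):

-- Assumes the pg_visibility extension is available; the table name is a placeholder.
CREATE EXTENSION IF NOT EXISTS pg_visibility;

SELECT all_visible, all_frozen
FROM pg_visibility_map_summary('my_insert_mostly_table');

Comparing all_frozen against the table's relpages in pg_class gives an idea of how many pages a freezing vacuum would still have to touch.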

A digression: Vacuum taking an access exclusive lock

One potential gotcha related to tuning autovacuum to run more often is that vacuums (including those run by autovacuum) can take access exclusive locks, which can cause issues if you are running hot standbys. My current thinking is that this behaviour should be disabled when using hot standbys.
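If the lock in question is the brief access exclusive lock vacuum takes to truncate empty pages at the end of a table (my assumption about which behaviour is meant), it can be turned off per table:

-- Assumption: the problematic lock is the end-of-table truncation one.
-- The table name is a placeholder.
ALTER TABLE my_big_table SET (vacuum_truncate = false);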

The end

A further thing we could do is stress test autovacuum. How high could we set the cost limit before it begins to cause issues?

However, with the above plan, I feel good about making the change. Especially with our decision to reduce the scale factors slowly, I don't expect there to be issues.