Tuning autovacuum in PostgreSQL
At work, we've been looking at tuning PostgreSQL's autovacuum settings. In this post I'm going to talk about the settings we're planning on using and how we're going to apply them.
Resources
There are several good resources about vacuum and autovacuum. A few I learned from are:
- https://www.postgresql.org/docs/current/routine-vacuuming.html (amazing docs!)
- https://postgres.fm/episodes/vacuum (where I learned it was a good idea to tune it in the first place)
- https://www.cybertec-postgresql.com/en/tuning-autovacuum-postgresql/ (great guide)
- https://web.archive.org/web/20210417030906/https://www.2ndquadrant.com/en/blog/autovacuum-tuning-basics/ (where many of the settings I want to use come from)
- https://www.enterprisedb.com/postgres-tutorials/introduction-postgresql-performance-tuning-and-optimization
Where we're starting
Perhaps surprisingly, our autovacuum settings today are the defaults. My understanding is that this is not recommended as the defaults are conservative for larger, busier servers.
We've probably stuck with the defaults because we've been running Postgres for many years and haven't updated our setup to take advantage of improvements to autovacuum in newer versions.
One improvement in particular is relevant to us: We have some insert-mostly tables which were not handled well in older versions of Postgres. This changed in Postgres 13, when autovacuum began triggering vacuums based on the number of inserts in addition to the number of dead tuples.
There have also been changes to autovacuum's defaults that made it more aggressive:
- autovacuum_vacuum_cost_delay decreased from 20ms to 2ms in Postgres 12. This made autovacuum work more frequently, increasing its speed.
- vacuum_cost_page_miss decreased from 10 to 2 in Postgres 13. This allowed autovacuum to do more work each time it runs.
Prior to these changes, these insert-mostly tables would not get vacuumed at all until a vacuum to prevent wraparound was triggered. That would find a lot of work to do and it would take a long time (due to the slow, conservative settings). This ended up blocking deployment of DDL changes (and potentially could have resulted in worse issues).
To work around the lack of vacuums from inserts, we began running flexible-freeze each night. flexible-freeze runs VACUUM (FREEZE) against the tables needing it most.
This means that in general we haven't needed autovacuum, as flexible-freeze kept things in check. This is likely the main reason we haven't needed to tune things, although I suspect we've also benefitted from the other changes I mentioned.
We don't have any particular issues with our setup, but there are benefits to tuning autovacuum:
- If we vacuum more often, dead tuples can be cleaned up more often, potentially avoiding bloat. Tables can be kept in a better state rather than needing a heavy cleanup during the night.
- In order for Index-Only scans to be possible, a table's visibility map must be kept up to date, which vacuum does, so vacuuming more often enables more Index-Only scans.
- We can stop running flexible-freeze. In the past, we've had issues with it (even an outage). Though we have no issues with it today, it's always possible something will crop up. It's a moving part we shouldn't need.
(I was initially thinking that we'd also get statistics updated more often, but I think that would mean tuning autovacuum's analyze settings).
The plan
Based on my reading, my plan is to adjust autovacuum so that it triggers more often and works faster when it runs. Ideally it will trigger often but not have much to do.
The settings I'm thinking of are:
autovacuum_vacuum_scale_factor = 0.01 (default 0.2)
- This means that autovacuum triggers when 1% of a table's rows are dead as opposed to 20%. The exact formula is from the Postgres docs: trigger when the number of dead tuples is larger than autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * pg_class.reltuples.
autovacuum_vacuum_insert_scale_factor = 0.01 (default 0.2)
- Similar to the above except it says to trigger when the number of new tuples since the last vacuum exceeds 1% of the table's row count.
autovacuum_vacuum_cost_limit = 500 (default 200)
- This controls how much work autovacuum does before pausing. Increasing this to 500 effectively means increasing autovacuum's speed by 2.5x.
autovacuum_max_workers = 30% of CPU core count (default 3)
- This controls how many separate workers can run at once. All workers share the cost limit setting. The reason to increase this is to allow more tables to be processed in parallel.
Reducing the scale factors means autovacuum triggers sooner and has fewer tuples to work on. Increasing the cost limit means it runs faster when triggered.
These are the system wide settings I'm thinking of using. I'm also planning on applying some settings on particular tables which I'll discuss below.
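To make that concrete, here's a minimal sketch of what the system-wide changes could look like in postgresql.conf, assuming (purely for illustration) a server with around 16 cores:

# postgresql.conf sketch; the worker count of 5 assumes ~16 cores, which is an assumption for illustration
autovacuum_vacuum_scale_factor = 0.01
autovacuum_vacuum_insert_scale_factor = 0.01
autovacuum_vacuum_cost_limit = 500
autovacuum_max_workers = 5  # takes effect only after a restart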
A digression: Autovacuum's speed
How do we calculate autovacuum's maximum speed? autovacuum has a cost
budget of autovacuum_vacuum_cost_limit
(default 200
, new value 500
)
when it runs. After using up that budget it sleeps for
autovacuum_vacuum_cost_delay
(2ms
). With these we can calculate its
maximum bytes per second:
bytes per second = runs per second * bytes per run
where
runs per second = 1 second / autovacuum_vacuum_cost_delay = 1000 ms / 2 ms = 500
and
bytes per run = pages per run * page size = (autovacuum_vacuum_cost_limit / cost per page) * page size = (200 / cost per page) * 8192
The cost per page varies depending on what is done with a page:
- vacuum_cost_page_hit = 1 (read, cached)
- vacuum_cost_page_miss = 2 (read, uncached)
- vacuum_cost_page_dirty = 20 (write)
We can calculate the maximum bytes per second for the different page
operations. With the default autovacuum_vacuum_cost_limit = 200
we have:
- Cached reads: 781 MiB/s (from 500 * (200 / 1) * 8192 / 1024 / 1024)
- Uncached reads: 391 MiB/s (from 500 * (200 / 2) * 8192 / 1024 / 1024)
- Writes: 39 MiB/s (from 500 * (200 / 20) * 8192 / 1024 / 1024)
Upping autovacuum_vacuum_cost_limit
to 500
gives us:
- Cached reads: 1953 MiB/s
- Uncached reads: 977 MiB/s
- Writes: 98 MiB/s
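If you want to double-check these numbers, the arithmetic can be reproduced with a throwaway query (just a sketch of the formula above; the cost values are the page costs listed earlier):

SELECT cost_per_page,
       round((1000.0 / 2) * (200 / cost_per_page) * 8192 / 1024 / 1024) AS mib_per_s_at_200,
       round((1000.0 / 2) * (500 / cost_per_page) * 8192 / 1024 / 1024) AS mib_per_s_at_500
FROM (VALUES (1.0), (2.0), (20.0)) AS costs(cost_per_page);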
A digression: Why autovacuum triggers
In order to understand how to tune autovacuum, we need to know the circumstances under which it triggers vacuums. There are three ways:
- The number of dead tuples in a table goes beyond a certain ratio (see autovacuum_vacuum_scale_factor above)
- The number of new tuples in a table goes beyond a certain ratio (see autovacuum_vacuum_insert_scale_factor above)
- The table has tuples that are old and need to be frozen to prevent wraparound
The first two are relatively straightforward to understand: Activity on a table leads to autovacuum having work to do on that table.
The third is interesting as it does not depend on activity on the table. As well, this kind of autovacuum is more aggressive in that it won't stop if there is a query wanting a conflicting lock; it is considered more important to continue. We need to consider tuning for this case.
Per-table settings
Depending on a table's workload, it can make sense to change settings on the table to better handle vacuums freezing tuples for wraparound protection.
Consider insert-only or insert-mostly tables:
- In many cases, there will be two passes of autovacuum for a tuple: After enough new tuples are inserted, autovacuum triggers. It will look at the new tuples and do things like update the visibility map. Later, autovacuum triggers again to freeze these tuples (though some may have been frozen by earlier, normal vacuums). This means processing these tuples twice. Typically for these kinds of workloads we don't need to do that since the tuples likely have not changed since the first vacuum. We could benefit from doing everything in one pass.
- Autovacuum for wraparound protection blocks DDL: The second autovacuum pass will block conflicting queries, which could happen at inopportune times (such as when you want to apply DDL during a deployment). This is relevant for all tables, but for these kinds of tables in particular it can be a problem as they both may be large and most of their tuples might not be frozen, potentially leading to longer vacuums of this type.
It would be nice if we could make it so that when autovacuum triggers for new tuples it would freeze them as well. That would make it such that we don't need two vacuum passes on the tuples. It would also avoid needing to do anything during wraparound protection vacuums.
Fortunately there are ways to do this. We can apply these settings to the tables via ALTER TABLE ... SET (...):
autovacuum_freeze_min_age = 0
autovacuum_freeze_table_age = 0
autovacuum_multixact_freeze_min_age = 0
autovacuum_multixact_freeze_table_age = 0
With these, autovacuum will be able to freeze the tuples immediately when it vacuums them due to inserts. No second pass will happen as the tuples are already frozen. (Running with vacuum_freeze_min_age = 0 and vacuum_freeze_table_age = 0 is what VACUUM (FREEZE) does.)
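As a sketch, for a hypothetical insert-mostly table called events (the name is only for illustration), applying these settings would look something like:

ALTER TABLE events SET (
    autovacuum_freeze_min_age = 0,
    autovacuum_freeze_table_age = 0,
    autovacuum_multixact_freeze_min_age = 0,
    autovacuum_multixact_freeze_table_age = 0
);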
Setting these to 0
globally probably doesn't make sense: Most tables are
probably not insert-only/-mostly. Freezing tuples immediately would be
wasteful if they might get updated or deleted.
Applying these settings is key to us being able to stop using
flexible-freeze
as right now that is handling a lot of the freezing for
us.
This is my thinking anyway; I am not yet at the stage of doing this. One open question is whether this would interact badly with already-executing queries: Would these tuples become visible to them, given we are saying they are visible to all transactions? I suspect Postgres protects us in some way, especially given VACUUM (FREEZE) does this, but I'm not sure how. It might be that locking protects us. If it's locking, then avoiding aggressive freezing entirely this way might not work (as I expect these vacuums would not always be able to freeze the tuples, leading to later aggressive vacuums regardless). Thanks to my colleague Marsel for pointing this part out!
Transitioning to the new settings
One thing that occurred to me while thinking about how to test and deploy
these changes was what will happen when we switch the settings. In
particular, when we change the scale factors, autovacuum will immediately
find many tables need vacuuming all at once. This is because it was keeping
tables vacuumed down to the old scale factor, but many could be well beyond
the new scale factors (even with flexible-freeze
, plus that doesn't run
against all of our databases).
This made me concerned that we would see high I/O for an extended period of time, potentially to an extent that services would be impacted.
As well, the autovacuum workers could be stuck processing large tables for a long time and be unable to process other tables that would otherwise be vacuumed without issue. We could end up with bloat. (Part of this concern comes from us not planning to immediately restart the servers to apply the new workers setting).
My thinking is to reduce the scale factor settings slowly over time and
monitor. For example, instead of going from 0.2
to 0.01
in one shot, we
will first go to 0.17
, let autovacuum get tables stable at that level,
and then reduce again, e.g. to 0.14
. Eventually we will reach 0.01
(or
see that autovacuum is working too often and adjust our plan).
This will make it take longer to get the new settings fully in place, but I think it will be a safer transition.
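As a sketch of one step in that rollout (the exact values and the use of ALTER SYSTEM are assumptions about how we'd apply it), each reduction only needs a configuration reload, not a restart:

ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.17;
ALTER SYSTEM SET autovacuum_vacuum_insert_scale_factor = 0.17;
SELECT pg_reload_conf();
-- monitor, then repeat with 0.14, and so on, down to 0.01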
We also have to consider when to stop running flexible-freeze
. I am
thinking that we will continue running it for a period, likely weeks, until
we see that autovacuum is freezing new tuples as expected in the large
insert-mostly tables. It should never need to do much work on those tables,
which we'll be able to confirm by looking at its logs. Turning it off will
mean autovacuum will likely need to periodically run vacuums to prevent
wraparound on tables where we haven't changed the freeze settings, but that
should be okay (especially for smaller tables), and is already happening in
several of our databases.
Monitoring autovacuum
Part of making any change like this is knowing how to monitor it. We can check the state of autovacuum per table by looking at the values behind each way autovacuum triggers. Before making the change, I wanted to see how well autovacuum is already keeping up. I also want to see how the state changes as we change the settings.
Monitoring dead tuples
For vacuums triggered by dead tuples, we can look at how many tuples we'd need to work on at varying scale factors:
SELECT c.relname, n_dead_tup, reltuples FROM pg_stat_all_tables s JOIN pg_class c ON oid = relid WHERE n_dead_tup > 50 + reltuples * 0.01;
(The 50 is autovacuum_vacuum_threshold.)
I happen to have this data in Prometheus too. A query like this shows it there:
pg_stat_user_tables_n_dead_tup{instance_host="[...]"}
and (pg_stat_user_tables_n_dead_tup > 50 + pg_class_reltuples{} * 0.01)
If we use the current scale factor, 0.2, we can see if autovacuum is currently keeping up, and by looking at how the number of tuples changes as we decrease it, we can see how much autovacuum will need to do when we change the value.
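One way to preview that (a rough sketch; the intermediate factors are arbitrary choices) is to count how many tables would be over the threshold at a few candidate scale factors:

SELECT count(*) FILTER (WHERE n_dead_tup > 50 + reltuples * 0.2)  AS over_at_0_20,
       count(*) FILTER (WHERE n_dead_tup > 50 + reltuples * 0.1)  AS over_at_0_10,
       count(*) FILTER (WHERE n_dead_tup > 50 + reltuples * 0.01) AS over_at_0_01
FROM pg_stat_all_tables s JOIN pg_class c ON c.oid = s.relid;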
Monitoring inserted tuples
We can run similar queries for vacuums triggered by inserts:
SELECT c.relname, n_ins_since_vacuum, reltuples FROM pg_stat_all_tables s JOIN pg_class c ON oid = relid WHERE n_ins_since_vacuum > 1000 + reltuples * 0.01;
(The 1000 is autovacuum_vacuum_insert_threshold.)
And for Prometheus:
pg_stat_user_tables_n_ins_since_vacuum{instance_host="[...]"}
and (pg_stat_user_tables_n_ins_since_vacuum > 1000 + pg_class_reltuples{} * 0.01)
Monitoring for transaction wraparound prevention
This one is trickier to monitor. We can look at how long ago the table was last vacuumed to prevent this. However I haven't found a way to look at how many tuples actually need freezing (other than I expect I could do so by inspecting individual pages).
This query will show that a table might soon need to be vacuumed for this reason, but it won't show how much freezing work needs to be done. Note that all of a table's tuples could already be frozen, in which case effectively no work will be done, but the table will still show up in this query.
SELECT c.relname, age(relfrozenxid), mxid_age(relminmxid) FROM pg_stat_all_tables s JOIN pg_class c ON oid = relid ORDER BY 2 DESC, 3 DESC;
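To relate those ages to when an anti-wraparound vacuum will actually kick in, we can compare against autovacuum_freeze_max_age (a sketch; it uses the global setting and ignores any per-table overrides):

SELECT c.relname,
       age(c.relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::bigint AS freeze_max_age,
       age(c.relfrozenxid) > current_setting('autovacuum_freeze_max_age')::bigint AS wraparound_vacuum_due
FROM pg_class c
WHERE c.relkind IN ('r', 'm', 't')
ORDER BY xid_age DESC;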
A digression: Vacuum taking an access exclusive lock
One potential gotcha related to tuning autovacuum to run more often is that vacuums (including those run by autovacuum) can take access exclusive locks that can cause issues if you are running hot standbys. My current thinking is that this behaviour should be disabled when using hot standbys.
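Assuming the lock in question is the ACCESS EXCLUSIVE lock vacuum takes when truncating empty pages at the end of a table (my reading of this gotcha), it can be turned off per table or per run; the table name is again hypothetical:

ALTER TABLE events SET (vacuum_truncate = false);
-- or for a single run:
VACUUM (TRUNCATE false) events;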
The end
A further thing we could do is stress test autovacuum. How high could we set the cost limit before it begins to cause issues?
However, with the above plan, I feel good about making the change. Especially with our decision to reduce the scale factors slowly, I don't expect there to be issues.