Changing owners in PostgreSQL
I recently needed to mass change ownership in a Postgres database. There were two users, $old1 and $old2, who owned objects in the database. I needed to make everything in the database owned by a single user. This sounds like it should be simple, and in theory it is. However there were complications.
The approach I settled on was changing ownership of everything to the user $new. This had the additional benefit that I could pick a more suitable name as the old usernames were not ideal.
I needed to run code that did something like this:
- For every object owned by $old1 or $old2
- Make $new the owner
Postgres has a command to do exactly that: REASSIGN
In theory, I could simply say
REASSIGN OWNED BY $user TO $new for both
$old1 and $old2 and I'd be done.
Unfortunately it was not as easy as that.
Problem: "ERROR: tuple concurrently updated"
The first problem was the project's tests. As part of its setup phase, a
test creates a clean database. This involves running each database
migration. The tests run concurrently, so they ran
Postgres normally acquires appropriate locks for each statement. However, this is not currently the case for system catalogs. As there was no lock, tests raced to change the same system catalog rows, and Postgres complained. Note that while tests create their own databases, they share the same database cluster. System catalogs are cluster wide rather than per database.
Since the problem was that we were not locking, I tried taking a lock
manually at the start of my migration:
LOCK pg_class. This resolved the
errors, but caused the time to run the tests to blow up.
Next I wondered if I could catch and ignore these errors. In theory the race did not matter as it was a race to set identical values. However, this was not feasible because the errors could occur when running any command that touches these system catalog tables and not just inside my migration. For example, creating a database or a table could encounter it. Regardless, it may not be safe to assume ignoring it would leave us in an expected state.
I resolved this problem by updating the base schema used by the tests. This meant the tests would never run the migration. They would start out with ownership in the correct state.
Once I had the migration working in the tests, I tested it with other actions occurring in the system at the same time. This revealed another problem:
process 9157 detected deadlock while waiting for AccessExclusiveLock on relation 1242620572 of database 16401 after 1000.108 ms.
REASSIGN OWNED acquires
ACCESS EXCLUSIVE locks on all objects it needs
to change ownership on. Since the old users owned a large number of
objects, this meant
REASSIGN OWNED had to acquire locks on many objects
at the same time.
In theory I could have the command retry until it succeeded. However, Postgres's deadlock resolution kills one of the queries causing the lock. I didn't want to risk killing important queries.
I wondered whether I could synchronize access. The query that interfered with the migration was a PL/pgSQL function, so I thought I could update its caller to take an advisory lock before running the function. However, I realized many transactions in the system could trigger this situation. There was no way it would be feasible to add enough advisory locks.
REASSIGN OWNED was unworkable for my purposes. I needed to do
what it did, but take as few locks as possible at any one time.
ALTER ... OWNER TO ...
Each type of object (with some exceptions) has a command that you can run
to change its ownership. For tables, it's
ALTER TABLE $table OWNER TO
I could do something like this:
- For every type T in the database
- Get a list of every object of type T in the database
- For each of these objects O
- Run the appropriate
ALTER $T $O OWNER TO $newcommand
- Run the appropriate
This would be many more statements than
REASSIGN OWNED, but it seemed
In order to write this, I needed to figure out:
- What types do I need to change?
- How do I list objects of each type?
- How do I change ownership of objects of each type?
1. What types do I need to change?
DROP ROLE $role causes Postgres to raise an error telling you
what objects, including their type, the role owns. For example:
mydb=# DROP ROLE myrole; ERROR: role "myrole" cannot be dropped because some objects depend on it DETAIL: owner of sequence mytable_id_seq owner of table mytable owner of sequence mytable_id_seq
I repeatedly ran this command for each of the old users until I had accounted for every type in my database. This meant I only worried about types in use in my database rather than trying to handle every possible type.
2. How do I list objects of each type?
The system catalogs are the source of truth. The information schema is a set of views into the system catalogs and is often easier to use. However, the information schema does not provide a view for every object.
I first tried to query for what I needed using the information schema. If I could not get the information that way, I went to the system catalogs and got it there. Eventually I found how to query for each type of object.
3. How do I change ownership of objects of each type?
In general you can look up the
ALTER command for a particular type and it
has a corresponding
ALTER ... OWNER TO ... form. Unfortunately, this is
not the case for every type of object. For example, there is
ALTER EXTENSION ... OWNER TO ... form.
For cases without an appropriate
ALTER command, I had to
system catalogs directly. This is a little hair raising, but the system
catalog documentation is quite good.
I ended up with a complicated migration, but it did the job. Core to the migration was avoiding using a transaction if it was not required. By avoiding transactions, each statement takes the minimum number of locks. The only explicit transactions I used were for objects that required touching the system catalogs as I found I needed to touch at least two catalog tables for each of those.
Problem: The lock queue
Unfortunately, testing revealed there was still a problem with locks.
Remember this migration had to run on an active system. The system
routinely issues queries such as
SELECT queries can
run along side each other as they take
ACCESS SHARE locks which do not
conflict. However, my migration generally had to take an
lock on each object, and this lock conflicts with every other lock. Once
such a lock enters the queue for an object, other new queries must wait.
Consider this situation:
SELECTis running against a table and it's taking a while. Normally other
SELECTqueries could come and go.
- My migration issues the
ALTER TABLEcommand for the same table. It can't proceed because of the long running
SELECTquery. It sits waiting to acquire an
ACCESS EXCLUSIVElock on the table.
- The system issues a new
SELECTquery against the same table. It can't proceed. Instead, it queues up behind the
This could easily block crucial time sensitive actions in the system.
Limiting how long we wait on a lock and retrying
I wanted to block other queries as little as possible. It didn't matter if my migration took a little longer to run.
Instead of waiting to acquire a lock for an unbounded amount of time, I wanted to time out. You can do this for every query coming from your session using a setting:
SET lock_timeout TO 50
This meant my migration's commands would block other commands for at most
50 milliseconds. If a command waits 50 milliseconds, it aborts with an
ERROR: canceling statement due to lock timeout.
I updated my migration to retry any command that saw this error. It retried until it succeeded.
In the end the migration ran smoothly. It did have to retry several times after hitting the lock timeout, but that was fine.
This turned into more of a project than I expected when I started. I think it shows a lot of important aspects about using PostgreSQL though. It also shows that things that look simple are often not.
Tags: PostgreSQL, SQL, programming