Over time, a Python project accumulates many migrations using alembic
. This can slow down tests (you need to run all migrations every time you create a test DB) and it’s very unlikely you will ever rollback past a certain point in time.
There’s no straightforward way of squashing migrations into one that I could find in alembic
. I found that I couldn’t use the trick of running alembic revision --autogenerate
because we don’t use ORM models to represent the entire schema. Instead, the combination of pg_dump
and alembic stamp
gives a similar result that allows us to squash everything into one migration.
Get the schema of DB as SQL
First we need to get the schema as a set of SQL statements using pg_dump
. This needs to be done on the database that matches the actual schema in production (or maybe run it on production). Unless you are meticulous about using DB migrations, there might be some drift between the schema in your local database and what’s actually in production.
pg_dump -s --user {your app user} {name of db} > squash.sql
Verify and clean up
For my purposes, I removed the auto-generated SET
statements, removed the auto-generated ownership statements (otherwise test DBs run using a different user would fail), and tried loading it in a clean DB to see if the tables match.
Create a migration and wrap the generated SQL statements
Generate a new migration that runs the generated pg_dump
SQL.
The migration should look something like this:
# revision identifiers, used by Alembic.
revision = "123456789"
down_revision = None
branch_labels = None
depends_on = None
def upgrade():
with open(f"{PROJECT_ROOT_DIR}/alembic/versions/squash.sql", "r") as f:
for stmt in f.read().split(";\n"):
if not stmt:
continue
op.execute(stmt)
Delete all previous migrations
You can now delete all of the old migrations and try out the new squashed migration to see that it takes a clean DB and produces the expected schema using alembic upgrade head
.
Note: check very closely that the squashed migration actually gets your DB into the shape you want. For me, I needed to add some extra calls to pre-populate some data that were snuck in from previous migrations using handwritten INSERT
statements.
Skip the migration in production
[WIP I haven’t verified this yet!]
You don’t want to run the squashed migration in production. To skip it, run alembic stamp head
which will mark the latest migration as the lastest version. New migrations can be applied on top safely using the usual alembic revision
commands.