Agile Release Pattern: Database migrations
As I release more frequently, I start to focus on automating the actual process of deploying a release. One of the most powerful steps in automating deployment is to upgrade the database schema automatically.
This technique first saw mainstream use with the Ruby on Rails framework. Today, there are several mature tools that will help you organize and execute database changes (Scala Migrations, Ruby on Rails Migrations, dbdeploy, Liquibase, dbmaintain). And if none of them fits you perfectly, it’s easy to create your own.
In my current project, we have rolled our own solution for this (sketched in code after the list):
- All changes to the database are stored as SQL-files that are packaged into the deployment unit (in our case, a WAR-file). These files will usually contain statements like “ALTER TABLE ADD COLUMN” and “CREATE TABLE”. To get the files executed in the right order, we name the files with an increasing sequence number, like 012-add_payment_type_to_customer.sql.
- Whenever the application is started, it looks for a table named “MIGRATIONS” in the database and creates it if it doesn’t exist.
- At startup, the application looks through the list of migration files it has been packaged with and sees which file names don’t have an entry in the MIGRATIONS table.
- The application executes all the scripts that haven’t been executed already. If any script fails, it makes a note of the error in the MIGRATIONS table and refuses to start the application.
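To make the mechanics concrete, here is a minimal sketch of such a runner in Java. This is not our actual code: the class name, the migrations/ classpath folder and the exact columns of the MIGRATIONS table are illustrative, and “CREATE TABLE IF NOT EXISTS” assumes a database dialect that supports it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public class MigrationRunner {

    private final Connection connection;

    public MigrationRunner(Connection connection) {
        this.connection = connection;
    }

    /** Runs every packaged migration file that has no entry in MIGRATIONS yet. */
    public void migrate(List<String> migrationFiles) throws SQLException, IOException {
        createMigrationsTableIfMissing();
        for (String fileName : migrationFiles) { // assumed sorted by sequence number
            if (!alreadyRecorded(fileName)) {
                runMigration(fileName);
            }
        }
    }

    private void createMigrationsTableIfMissing() throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS MIGRATIONS "
                    + "(FILENAME VARCHAR(255) PRIMARY KEY, ERROR VARCHAR(4000))");
        }
    }

    private boolean alreadyRecorded(String fileName) throws SQLException {
        try (PreparedStatement stmt = connection.prepareStatement(
                "SELECT FILENAME FROM MIGRATIONS WHERE FILENAME = ?")) {
            stmt.setString(1, fileName);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next();
            }
        }
    }

    // Assumes one SQL statement per file; files with several statements
    // would have to be split before execution.
    private void runMigration(String fileName) throws SQLException, IOException {
        String sql = readClasspathResource("/migrations/" + fileName);
        try (Statement stmt = connection.createStatement()) {
            stmt.execute(sql);
            record(fileName, null);
        } catch (SQLException e) {
            record(fileName, e.getMessage());
            // Propagating the failure is what makes the application refuse to start
            throw new IllegalStateException("Migration failed: " + fileName, e);
        }
    }

    private void record(String fileName, String error) throws SQLException {
        try (PreparedStatement stmt = connection.prepareStatement(
                "INSERT INTO MIGRATIONS (FILENAME, ERROR) VALUES (?, ?)")) {
            stmt.setString(1, fileName);
            stmt.setString(2, error);
            stmt.execute();
        }
    }

    private String readClasspathResource(String path) throws IOException {
        try (InputStream in = getClass().getResourceAsStream(path)) {
            if (in == null) throw new IOException("Missing resource: " + path);
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

One policy question the sketch leaves open: a failed migration stays recorded with its error, so the MIGRATIONS row has to be cleaned up by hand before the script can be retried.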
We run the migration procedure every time we start the application, whether in test or production. Even the JUnit tests that access the database will run any pending migrations before starting. The result is that any database change we intend to roll out into production will at the very least be executed once on each developer’s private copy of the database, as well as once on the continuous integration server. By the time the scripts are executed in a controlled testing environment, we’re pretty confident that they work as intended.
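In a JUnit test, this can be as simple as running the migrator in a @BeforeClass method. A sketch, assuming the MigrationRunner above and a made-up TestDatabase helper that knows how to reach the developer’s private database:

```java
import java.sql.Connection;
import org.junit.BeforeClass;
import org.junit.Test;

public class CustomerRepositoryTest {

    @BeforeClass
    public static void runPendingMigrations() throws Exception {
        // TestDatabase is a hypothetical helper; any test DataSource would do
        try (Connection connection = TestDatabase.openConnection()) {
            new MigrationRunner(connection)
                    .migrate(TestDatabase.packagedMigrationFiles());
        }
    }

    @Test
    public void findsCustomerByName() {
        // Every test in this class now runs against a fully migrated schema
    }
}
```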
Some migration tools use a more friendly (and portable) syntax than SQL DDL statements. Many allow for rollback of migrations. Most don’t automatically execute the pending migrations on application start, but require a separate command to execute them.
Your first step towards automating database migrations is to make sure that every change to the database is represented by some sort of script, and that all these scripts are versioned with the rest of your code. From there, you can improve the process whenever you notice a step that involves too much work or risk.
Automating the deployment process will reduce the need for documentation and the opportunity for errors during one of the most critical times in the project. It is especially important to reduce the possibility of miscommunication and mistyping if the people responsible for deployment are in a separate organizational unit, which often seems to be the case. Make their job as easy as possible!
Comments:
[Anonymous] - Sep 2, 2010
First of all: I really appreciate these little tidbits of wisdom, keep up the good work!
The use of sequence numbers will help you run the migrations in the correct order. However, two developers could create migrations with the same sequence number. Rails moved to datetime strings instead of plain sequence numbers to prevent this. The migrations table keeps track of the datetimes instead of sequence numbers, and sorting the migration files works just as well with dates as with sequence numbers.
So, instead of having
- 021-add_last_name_to_customers
- 021-add_shipped_at_to_orders

you’d have
- 201009021200-add_last_name_to_customers
- 201009020930-add_shipped_at_to_orders
[Trond Arve] - Sep 3, 2010
How do you update large volumes of production data?
[Filip Neven] - Sep 2, 2010
You should also mention dbmaintain, which is a more powerful version of dbdeploy, but simpler to use than Liquibase.
Geir Hedemark - Sep 3, 2010
Good post!
One thing: Check with your hosting provider. Your application may not run with enough privileges to do this. You want to handle the red tape up front, not later.
Johannes Brodwall - Sep 5, 2010
Thanks, Filip. I’ve added dbmaintain to the list of migration tools.
Johannes Brodwall - Sep 5, 2010
Hi, Trond Arve
Your options with large (huge) volumes of production data are more limited. The short answer is that when the volumes are sufficiently large, you have no choice but to split the update into several steps. Until all the steps have been executed, the database will be in a halfway migrated state.
A very simple example can illustrate this. Let’s say you want to convert a numeric field to a string field:
1. Add the string column.
2. Execute a long-running batch job to update all rows.
3. When all the data is converted, upgrade all applications to read the string field.
4. Drop the numeric column. (For some migrations, this step may also be time consuming, but that’s okay at this stage.)
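As an illustration of step 2, here is a sketch of a batch job that converts the column in small chunks, so each transaction stays short and the table is never locked for long. The table and column names are made up, and the row-limiting syntax is dialect specific (ROWNUM here is Oracle style):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class AmountColumnBackfill {

    /** Copies the numeric AMOUNT column into the new AMOUNT_TEXT column, chunk by chunk. */
    public static void backfill(Connection connection, int chunkSize) throws SQLException {
        connection.setAutoCommit(false);
        int updated;
        do {
            try (PreparedStatement stmt = connection.prepareStatement(
                    "UPDATE ORDERS SET AMOUNT_TEXT = CAST(AMOUNT AS VARCHAR(20)) "
                    + "WHERE AMOUNT_TEXT IS NULL AND ROWNUM <= ?")) {
                stmt.setInt(1, chunkSize);
                updated = stmt.executeUpdate();
            }
            connection.commit(); // committing per chunk keeps each transaction short
        } while (updated > 0);
    }
}
```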
An alternative to executing a long-running batch job is to “migrate on read”: the application makes the necessary update to the data as it touches it. This strategy will potentially leave the database in a halfway state for much longer.
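A sketch of what migrate-on-read could look like in an entity class, with made-up field names: the accessor converts the legacy numeric value the first time it is read, and the converted value is persisted the next time the entity is saved.

```java
public class Customer {

    private Long legacyNumericPhone; // old column, cleared once migrated
    private String phone;            // new column

    public String getPhone() {
        if (phone == null && legacyNumericPhone != null) {
            phone = String.valueOf(legacyNumericPhone); // convert lazily on first read
            legacyNumericPhone = null; // saving the entity now persists the new form
        }
        return phone;
    }
}
```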
Leif John Korshavn - Sep 5, 2010
Fascinating approach!
This is - as you state yourself - a critical time in the project. Have you maybe also put in an automated backup and a fail-and-alarm step, like 000-make-backup.sql … 999-check-consistency-after-migration.sql?
Johannes Brodwall - Sep 5, 2010
Backup and consistency checks may be a good idea, especially for migrations that change the contents of the database a lot. I’ve never heard of it being done, though.