Setting up Sequelize Migrations Mid-Project (for Node.js / Express.js)

24 June 2014

Recently, my friends and I were hacking on a small project using Express.js, a node.js web application framework. For our database, we wanted something relational that was easy to set up on production, so we went with Sequelize to interact with our Heroku PostgreSQL database.

Sequelize offers a lot of cool advantages. It is easy to set up, models are relatively simple to declare, and the documentation isn't bad. One of the most interesting advantages, at least early on, is that Sequelize will sync your database's tables for you as you change your models. It's a great tool to get up and running really quickly.

We chugged along for a little bit, and had some data accruing in our production database. There came a time, however, when we realized we had a problem:

Sequelize's sync erases existing data from tables it affects. The sync essentially erases any affected tables and creates them from scratch. (This happens if you pass in force: true, but if you don't force it, it won't always sync changes for you.)

This was a problem for us because we didn't want to lose data every time we wanted to change a model. So, we had to figure out a way to implement traditional migrations in Sequelize. We also didn't want to lose any of the data we already had (although it wasn't much). That meant we had to figure out how to switch Sequelize over to migrations and trick it into thinking that certain migrations had already happened. And on top of all that, we needed to create the "initial" migration that would create the database we all had at that point in time - this would be necessary in case any new person wanted to contribute to our project.

The Sequelize documentation covers migrations here, but it doesn't provide enough specifics for this particular situation. I volunteered to tackle the problem, and here's what I did.

What is a migration?

A migration is a way to change the structure of a database, which can involve changing existing tables, removing existing tables, or adding new ones. By documenting exactly how a database changes from state 1 to state 2, a migration also allows different members on the same team to arrive at the same database structure, even if people are working in parallel on different features.

Migrations involve two things: 1. Specific migration files containing SQL commands (or code that can lead to SQL commands) dictating how the database is changing (and the commands to reverse that change, so that things can be rolled back if need be). 2. Some way of keeping track of which migrations have been applied to a database already. For example, there can be a specific table in the database where each row is a migration file that was already applied.

Mimicking the current state

My process was inspired by Bulkan Evcimen, who blogged his solution here. There were a couple of modifications for our project.

First, I wanted to create an "initial" migration file that would bring somebody from a newly-created, empty database to the current state of our database. Luckily, this is not too hard with a database dump. The dump command generates all the SQL needed to create an exact copy of the current database. For our PostgreSQL database, I did this:

pg_dump -O -s -x boxplot | egrep -v "(^SET|^/\*\!)" > mypostgres.sql

So I'm calling pgdump on my local boxplot database. -O removes any commands that sets ownership of the tables to users (not needed), -s means that the dump will only have the commands to create the schema of the table, not the data, and -x removes access privilege-related commands. The documentation of `pgdump` is found here.

The dump code from this is sent into egrep, which removes lines starting with SET, which are more commands you don't need here. This final output is saved to the mypostgres.sql file.

The only bit of processing I had to do after that was to remove the SQL comments (which will interfere with our next step).

This file serves as the basis for our first real migration in the project. Sequelize comes with some commands to handle parts of the migration. sequelize --init creates the migration folder and the config.json file, which contains credential and identification information for your database(s). mypostgres.sql went into the newly created migrations folder.

Then, I ran sequelize -c initial to create a migration file named "initial". First thing to notice is that the file is named with a timestamp before the name given. With this base migration file, which has no commands in either the 'up' or 'down' migration directions, I tried out the migration command sequelize -m. This detected the initial migration file and ran it without a problem. (Whew!)

Next step was to create a migration file corresponding to my database dump. I created a new migration file with sequelize -c [name], and then shamelessly used Bulkan's code and just debugged any errors. I had to add pg = require('pg'); at the top, and of course change the name of the file referenced in the code to be the name of my dump file. Basically, his code asynchronously reads in the dump file, splits it on the semicolons (which marks the end of a SQL command), and runs each of those commands.

At this point, we have two migration files. Any new coder on our project could've run sequelize -m after installing everything, and the result would've been a database that looked like ours.

Tricking Sequelize to ignore the first two migrations

But, having these two migration files posed a problem for those of us who already had a database and for our production database. We couldn't run those two migrations because the schema already existed! So I had to figure out a way to trick Sequelize into skipping these two migrations.

This ended up being fairly simple. Sequelize stores the history of migrations in a table named SequelizeMeta. It's a fairly simple table that just stores a "from" timestamp and a "to" timestamp, with entries corresponding to migrations that have been run already.

All I had to do was to create this table and dump in the correct data to make Sequelize think that it had already run our two files. Here are the SQL commands for that:

CREATE TABLE "SequelizeMeta" (
    id integer NOT NULL,
    "from" character varying(255),
    "to" character varying(255)
);

CREATE SEQUENCE "SequelizeMeta_id_seq"
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

ALTER SEQUENCE "SequelizeMeta_id_seq" OWNED BY "SequelizeMeta".id;

ALTER TABLE ONLY "SequelizeMeta" ALTER COLUMN id SET DEFAULT nextval('"SequelizeMeta_id_seq"'::regclass);

INSERT INTO "SequelizeMeta" (id, "from", "to") VALUES
(1, 20140526155106, 20140526155106),
(2, 20140526155106, 20140526155137);

SELECT setval('"SequelizeMeta_id_seq"', 2, true);

ALTER TABLE ONLY "SequelizeMeta"
    ADD CONSTRAINT "SequelizeMeta_pkey" PRIMARY KEY (id);

Note that in the INSERT INTO statement are the entries corresponding to my migration files. My first, initial migration file had the timestamp 20140526155106 in the name, and my second (the one that did all the heavy lifting) was 20140526155137.

At this point, I could confirm that Sequelize was tricked by trying to run sequelize -m. This command tried looking for migrations it hadn't done yet, but didn't find any, and thus left my previously-correct database alone.


Overall this was a fun little exercise that helped me solidify my understanding of migrations. Coming from Python and Django, Express and Node were completely different paradigms for me to wrap my head around, but I was happy I could leverage my existing knowledge of SQL databases.


comments powered by Disqus