A DIFFerent, Better Approach to Database Migrations in PHP

I've been giving a lot of thought lately to how I don't particularly like how most PHP frameworks/tools/ORMs handle migrations and how I think we, as a community, can do better.

This article assumes we're all on the same page as to why we should be using migrations. This article is to dissect the how.

The Current State

I tried to research as many different migration tools as I could before writing this article. I even pulled down code for a few of them to get a more hands-on feel for some of them.

Full Disclosure: I haven't used very many of these in a real-world application, and some I haven't used at all, just perused the docs. Please let me know if I got something wrong.

Four of the six migration tools on the first page of a Google Search for php database migrations all have similar approaches (using Phinx as an example, as it's mentioned four times on the first page and it's framework/ORM agnostic):

<?php

use Phinx\Migration\AbstractMigration;

class CreateUserLoginsTable extends AbstractMigration
{
    /**
     * Migrate Up.
     */
    public function up()
    {
        // create the table
        $table = $this->table('user_logins');
        $table->addColumn('user_id', 'integer')
              ->addColumn('created', 'datetime')
              ->create();
    }

    /**
     * Migrate Down.
     */
    public function down()
    {
		$this->dropTable('user_logins');
    }
}

This is a incremental snapshot of what you want to have done in the database. If at a later time, you need to add an updated field to the database, you must create a new migration:

<?php

use Phinx\Migration\AbstractMigration;

class AddUpdatedColumnToUserLoginsTable extends AbstractMigration
{
    /**
     * Migrate Up.
     */
    public function up()
    {
        $table = $this->table('user_logins');
        $table->addColumn('updated', 'datetime')
              ->save();
    }

    /**
     * Migrate Down.
     */
    public function down()
    {
        $table = $this->table('user_logins');
		$table->removeColumn('updated')
              ->save();
    }
}

The syntax is different, but for the most part, the approach is the same.

From here, you just run some sort of command to run the pending up commands, and if there is an issue, you can rollback which runs the down commands. The idea here is that you commit these migrations with your code, and if so if you pull down any particular commit of the code, you can either move forward or back, going through each migration in order, to arrive at the state the database was in at that particular commit.

What's Wrong with this Approach?

First of all, It's so very tedious to make minor changes to your database. If you're still in development and haven't pushed your code, you can just rollback, make a change to the migration, and then re-run the migration. This is a bad idea to do if you've already pushed the migration, especially if you've already pushed it to production. Changing migrations after the fact can (and often do) cause systems that have already run those migrations to not pick up those changes and thus the database is no longer in sync.

For this second reason, I feel that it's so obvious that this is an issue, it's easily overlooked—including me—until the glass was shattered by looking at how other languages do it: no where in the code is there an absolute list of the fields on the models/fields in the database.

Okay, okay, I guess that point really does depend on the code you're using, i.e. the framework may or may not have a way to support this. In my research, I found Laravel to be the most popular PHP framework this year. Laravel does not use property names on the model, and thus you can't really go to the model to see what fields belong there. On the other hand, Phalcon, another popular PHP framework, does. This is less of an issue in frameworks Phalcon, but Phalcon doesn't support the "easier way" of doing migrations that I will discuss later.

Other Contenders

Doctrine and Propel are both under the Symfony umbrella, and that may explain why they have some similar functionality. They both allow for a configuration file definition of models (Doctrine actually has a bunch of different ways you can define the models).

I find this approach to be a lot better, but the way these ORMs do it still have their issues. Let's first explain how they approach it:

As an example (taken from the Propel docs), you would create an XML file:

<database name="bookstore" defaultIdMethod="native">
  <table name="book" description="Book Table">
    <column name="id" type="integer" primaryKey="true" autoIncrement="true" />
    <column name="title" type="varchar" required="true" primaryString="true" />
    <column name="isbn" required="true" type="varchar" size="24" phpName="ISBN" />
    <column name="author_id" type="integer" />
    <foreign-key foreignTable="author" onDelete="setnull" onUpdate="cascade">
      <reference local="author_id" foreign="id" />
    </foreign-key>
  </table>
  <table name="author">
    <column name="id" type="integer" primaryKey="true" autoIncrement="true" />
    <column name="first_name" type="varchar" />
    <column name="last_name" type="varchar" />
  </table>
</database>

You would then execute a diff command, that will automatically create a migration for you (both Doctrine and Propel create code that executes raw-SQL migrations on the high level, i.e. DB::exec('CREATE table ...');).

It looks at the current schema in the database and creates a diff based on the current state of the XML config (you would then run that migration to actually change the database). The first time you run that diff command, it would create the table and all the columns. Then, if you want to add another column after the fact, you edit the schema XML, adding the necessarily XML, i.e. in the author table node in the example, you could add

<column name="middle_name" type="varchar" />

Then you run another diff command, and it would create and up and down migration that would, respectively, add and remove the middle_name column to the authors table.

The Comparison

I find this 2nd approach to be vastly more simple. You only have to change the schema in one place, and the tedious part is done for your automagically. You have an absolute definition of the schema (which maps to the models) for quick reference.

The only downside I can think of: you could only have one pending migration at a time. You'd have to apply the last migration before you could do another diff, otherwise the diff wouldn't work, because it'd be based off an old version of the schema. You could get around it with comparing the schema that would be if you ran the first migration before creating a second.

I don't find this necessary. If you've pushed, you should have executing that migration anyway. If you haven't pushed, you can just undo that migration and create a new one. i.e. instead of having 2 migrations: A -> B -> C, you just have one: A -> C.

Now, while I think the approach is superior, the only two stable, in-common-use migration tools I could find were wrapped behind huge ORMs. I've played with Doctrine migrations a lot, but a ton of the configuration defines behavior of the model that has nothing to do with the migrations, but with how Doctrine behaves. Basically: it's tied too tightly to the Doctrine ORM and isn't appropriate to use with other ORMs/frameworks.

The Solution: A Proposal

The solution is to start with an existing tool that can handle all the abstraction away to make migrations easier to handle. That is, I think Phinx is the perfect place to start. Rather than having to re-invent the wheel and figure out how to write raw SQL to support different database vendors, we take what we know does work as a starting point.

Phinx is also ORM/framework agnostic, and building a tool to extend from Phinx would wisely also be ORM/framework agnostic, and should be able to replace another ORM/framework migration approach.

From there, we support schema definitions like Propel. But we don't only offer XML. We offer basically any transactional data format that can be mapped to a PHP array that would look something like this:

[
	'table' => 'users',
    'columns' => [
    	'name' => [
        	'type' => 'string',
            'length' => 100,
        ],
        'email' => [
        	'type' => 'string',
        ]
        ...
    ]
]

It'd be easy to support XML, YAML, JSON, obviously a PHP array.

If you were to run a diff on this schema for the first time, you'd end up with this migration in Phinx, automatically generated:

<?php

use Phinx\Migration\AbstractMigration;

class AutoCreateUserTable extends AbstractMigration
{
    /**
     * Migrate Up.
     */
    public function up()
    {
    	// IDs are automatically added in Phinx
        // (IDs can be changed/overridden)
        $table = $this->table('users');
        $table->addColumn('name', 'string', ['length' => 100])
              ->addColumn('email', 'string')
              ->create();
    }

    /**
     * Migrate Down.
     */
    public function down()
    {
		$this->dropTable('users');
    }
}

Bam. Best of both worlds. We still have incremental migrations, but it's done automatically. Before committing this migration, we could make tweaks (just as long as the final schema is the same). For example, if the auto-migration generated dropped and added a column when what you wanted was to rename the column, you could tweak the migration to rename the column instead. Just as long as the final state of the database is same as the current schema.

Or, better yet, the accompanying command-line tool would detect those types of discrepancies, and ask you if you want to drop/add or rename. (This may be a v2 type of thing.)

As a last thought, there's something else I want to add... I know this will be controversial, I strongly believe it should support PHP annotations. Which don't technically exist, I know (I think they should. Maybe a blog post for another day). While I think PHPdoc annotations aren't great... I think it's worth the "not so greatness."

The rationale is that if you have a framework that has the field names in the models like Phalcon, you wouldn't want to have to update your schema in two places (the schema definition and the model definition). For example:

<?php namespace App;

use \Phalcon\Mvc\Model;

/**
 * @Table('Users')
 */
class Users extends Model
{
	/**
	 * @Type('integer')
     * @Signed(false)
     * @PrimaryKey
	 */
    public $id;

	/**
	 * @Type('string')
     * @Length(100)
	 */
    public $name;

	/**
	 * @Type('string')
	 */
    public $email;
}

This way, you still have your schema and your model definition and no redundant code.

Conclusion

I hope you see the potential as I do. I'm okay if you think this idea is stupid. You don't have to use it. I obviously think you should. I think it greatly simpliefies the process and lowers the barrier of entry. As far as I can find, this would be the first and only not-tied-to-an-ORM/framework implementation in PHP out there.

I hope to have a working proof-of-concept prototype up by the end of the year with at least MySQL support for basic stuff.

Please, if you have any questions, comments, suggestions, corrections, personal crises, or you just wanna call me out on something, please feel free to leave a comment, contact me, or reach out to me on Twitter.


Appendix A: Because Node is often used with noSQL databases, it is more common to have a schema defintion approach, and a few ORMs I've used in Node (Waterline, Sequelize) have support for SQL database vendors and thus provide automatic database syncing based on schema.

Appendix B: .NET supports a couple different ways to do automatic migrations. It's pretty powerful and in the .NET world, it's hard to say something like this is "only for small, simple projects."