Introduction to Entity Framework 4.3 Migrations - Part I

The more I work with Entity Framework (EF), the more I love it. Each new release brings so many amazing new features that it's hard to keep up with them all. The team behind Entity Framework is working hard to provide a great tool. EF truly is the bleeding-edge Object-Relational Mapper (ORM), and if you're not using it, you need to catch up!

Today I want to talk about EF 4.3, which introduced code-based migrations. Code-based migrations are easily the coolest thing to happen to Entity Framework since code-first. Until 4.3 I had a bad attitude about EF code-first, because if you went with a code-first solution you had to figure out on your own how to migrate your database schema forward while preserving the data stored in it.


With the slightly older database-first approach, you would build your database and then map it to an EF designer file (EDMX). This designer file contains the XML mappings behind a full visual designer in which you can see your model and its relationships. Any time you needed to update your model, you would start at the database level. Once the change was made in the database, you would update the designer file through the "Update Model from Database" wizard.

This made it simple to migrate your database schema forward, because you preserved your data by making the schema changes to the database yourself. Database-first was the approach offered by the very first version of Entity Framework, and while it was already amazing, some people wanted more.


Many people, myself included, were not huge fans of the EDMX file and its mountains of XML mapping data. One major reason is that it is really hard to maintain in a team environment. If two people make changes in the Entity Framework designer and check those changes in to source control, you end up with a conflict, and one unlucky person has to navigate that nest of XML in a merge tool. Even though my team is used to doing that now, it is still extremely unpleasant when it happens. Our policy has been to check changes in immediately after modifying the designer so that everyone's copy stays in sync as much as possible.

The solution to this beastly designer file is an approach introduced in Entity Framework 4.1 called "code-first". This approach keeps your model defined entirely in code. You simply write a set of plain C# classes (aka POCOs, "plain old CLR objects") and one special class derived from DbContext, which contains properties that represent collections of your model objects. For a more in-depth explanation of code-first I highly recommend reading this tutorial.
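To make the code-first idea concrete, here is a minimal sketch of such a model. It assumes the EntityFramework NuGet package (4.1 or later) is referenced; the Blog, Post and BlogContext names are hypothetical examples, not taken from any particular project.

```csharp
// Assumes the EntityFramework NuGet package (4.1+). All type names are hypothetical.
using System.Collections.Generic;
using System.Data.Entity;

namespace BlogSample
{
    // Plain C# classes (POCOs): no base class, no attributes, no EDMX mapping.
    public class Blog
    {
        public int BlogId { get; set; }   // "<ClassName>Id" becomes the primary key by convention
        public string Name { get; set; }
        public virtual ICollection<Post> Posts { get; set; }   // one-to-many relationship
    }

    public class Post
    {
        public int PostId { get; set; }
        public string Title { get; set; }
        public int BlogId { get; set; }   // picked up as the foreign key by convention
        public virtual Blog Blog { get; set; }
    }

    // The "special object": one DbSet<T> property per collection of model objects.
    public class BlogContext : DbContext
    {
        public DbSet<Blog> Blogs { get; set; }
        public DbSet<Post> Posts { get; set; }
    }
}
```

With these classes in place, creating a `BlogContext`, calling `Blogs.Add(...)` and then `SaveChanges()` is enough for code-first to create the database on first use, working out tables and keys from the class and property names.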

Code-first would do almost everything for you, with as little or as much configuration or customization as you want. The only issue was that whenever your model changed, it would drop and re-create your database. Almost every article I read glossed over this as a minor detail, but for me this was HUGE. Dropping and re-creating the database means losing all the data already in it. So whenever a change to my model required a schema change in the database, I had to think really hard about how to apply that change to a live database already filled with data. I'm no DBA, but sometimes I felt like the only real choice was to manually compare the new schema with the old one and then manually apply the changes to the database. That was about as fun as jamming your fingers in a car door; in fact, sometimes I'd have preferred the car door, because at least that would have been over quickly.
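For reference, that drop-and-recreate behavior was usually switched on with one of EF's built-in database initializers. The sketch below assumes the EntityFramework NuGet package and uses hypothetical Blog/BlogContext types. DropCreateDatabaseIfModelChanges wipes the database whenever the model changes, and DropCreateDatabaseAlways does so on every run.

```csharp
// Assumes the EntityFramework NuGet package (4.1+). Blog/BlogContext are hypothetical.
using System.Data.Entity;

namespace InitializerSample
{
    public class Blog
    {
        public int BlogId { get; set; }
        public string Name { get; set; }
    }

    public class BlogContext : DbContext
    {
        public DbSet<Blog> Blogs { get; set; }
    }

    public static class DatabaseSetup
    {
        // Call once at application startup, before the context is first used.
        public static void Configure()
        {
            // When the model no longer matches the database, EF drops the whole
            // database (every row included) and re-creates it from the model.
            Database.SetInitializer(new DropCreateDatabaseIfModelChanges<BlogContext>());
        }
    }
}
```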

In the end I decided that code-first was not production-ready. Maintaining the designer file was still easier.


Well, now EF 4.3 is out, and I have been jumping for joy. It's the first release of code-first migrations, and while it's not as intuitive as I would have hoped, I am confident the team will continue to refine it. When you make a change to your model, you tell the migrations utility (the Add-Migration command in Visual Studio's Package Manager Console) to make its best guess at how your model changed, and it generates a migration code file for you. Usually the guess is accurate, but occasionally I have to refine the changes it generated. At first this seemed so odd to me, but once I realized the implications of having these migration code files, I couldn't be happier. Each time you migrate your schema forward, a new file is generated. Once you have your migration file, you run the migration (Update-Database) to update the database.

The cool thing about this is that these code files represent a timeline of your database schema, and they can be checked into source control. So, forgetting for a moment about the real problem migrations solve (preserving data), being able to revert your database schema to any point in that timeline via code in source control is extremely useful. Database versioning has always been a struggle for me.
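Here is a sketch of what one of these migration files looks like, assuming a hypothetical Blogs table gaining a hypothetical Rating column. It roughly follows the shape of what `Add-Migration AddBlogRating` scaffolds in EF 4.3; the companion .Designer.cs file that Add-Migration also generates (which carries the migration's metadata) is omitted here.

```csharp
// Assumes the EntityFramework NuGet package (4.3+). The "Blogs" table, the "Rating"
// column and the AddBlogRating name are hypothetical.
using System.Data.Entity.Migrations;

namespace MigrationSample.Migrations
{
    public partial class AddBlogRating : DbMigration
    {
        // Moves the schema forward; runs when Update-Database applies this migration.
        public override void Up()
        {
            // defaultValue fills the new NOT NULL column for rows that already
            // exist, so the data already in the table is preserved.
            AddColumn("Blogs", "Rating", c => c.Int(nullable: false, defaultValue: 0));
        }

        // Moves the schema back; runs when targeting a migration older than this one.
        public override void Down()
        {
            DropColumn("Blogs", "Rating");
        }
    }
}
```

Each migration pairs an Up() with a Down(), which is what makes the timeline reversible: `Update-Database -TargetMigration:"SomeEarlierMigration"` (a placeholder name) walks the schema back by running the Down() methods of the migrations applied after it. Note that this particular Down() drops the Rating column along with its data.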

In addition, migrations let you preserve your data by customizing exactly how your schema changes. Obviously, if you're dropping a column you'll lose that column's data, but you are in total control of how the migration occurs. You can even write code in the migrations configuration file that seeds your database with test data, so you don't have to populate the database with fake data by hand every time. You can also use the migration files to generate SQL scripts that you can then run against your live database. My source code is all in Git, so I keep a branch called "Live" where I store all the sensitive data, such as production server connection strings, passwords, etc. All I do is flip to my private "Live" branch and run the migration. Since the code in that branch points to my live server, it upgrades the live schema. I just have to be careful to leave the seed data code commented out in the "Live" branch, because it does not belong in the production database.
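Both the seeding and the scripting described above hang off the migrations configuration class that `Enable-Migrations` adds to the project. The sketch below assumes the EntityFramework 4.3 package; the Blog/BlogContext types and the sample blog names are hypothetical test data.

```csharp
// Assumes the EntityFramework NuGet package (4.3+). Blog, BlogContext and the
// seed values are hypothetical.
using System.Data.Entity;
using System.Data.Entity.Migrations; // DbMigrationsConfiguration and AddOrUpdate

namespace SeedSample
{
    public class Blog
    {
        public int BlogId { get; set; }
        public string Name { get; set; }
    }

    public class BlogContext : DbContext
    {
        public DbSet<Blog> Blogs { get; set; }
    }

    // Shape of the Migrations\Configuration.cs file that Enable-Migrations generates.
    internal sealed class Configuration : DbMigrationsConfiguration<BlogContext>
    {
        public Configuration()
        {
            // Only explicit, code-based migration files may change the schema.
            AutomaticMigrationsEnabled = false;
        }

        // Runs at the end of Update-Database. Keep it commented out (or empty)
        // in any branch that points at the production database.
        protected override void Seed(BlogContext context)
        {
            // Rows are matched on Name: missing rows are inserted and
            // existing rows are overwritten with these values.
            context.Blogs.AddOrUpdate(
                b => b.Name,
                new Blog { Name = "Sample Blog" },
                new Blog { Name = "Another Sample Blog" });
        }
    }
}
```

For the live database, `Update-Database -Script` writes the pending changes out as a SQL script instead of applying them, and the `-SourceMigration` and `-TargetMigration` parameters narrow the script to a range of migrations.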

The only thing I'm still somewhat confused about is why the Seed method runs after every single migration. You'd think it would only run when the database is first created. I'll need to play around with this feature more, but at first glance it seems really awkward to call AddOrUpdate() methods, which either add the data or completely overwrite it (the update) so that the seed data is always the same. If you change a row of seed data in the database and then run your migrations again, you'll see that your changes were overwritten by the seed data. I'm sure there is a good reason why they do this, but I've yet to figure it out.
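If the goal is seed data that is only inserted once, one possible workaround (an assumption on my part, not a pattern prescribed by the migrations tooling) is to skip AddOrUpdate and guard plain inserts with a check for existing rows. A minimal sketch, again assuming the EntityFramework 4.3 package and hypothetical Blog/BlogContext types:

```csharp
// Assumes the EntityFramework NuGet package (4.3+). Blog/BlogContext are hypothetical.
using System.Data.Entity;
using System.Data.Entity.Migrations;
using System.Linq;

namespace InsertOnlySeedSample
{
    public class Blog
    {
        public int BlogId { get; set; }
        public string Name { get; set; }
    }

    public class BlogContext : DbContext
    {
        public DbSet<Blog> Blogs { get; set; }
    }

    internal sealed class Configuration : DbMigrationsConfiguration<BlogContext>
    {
        public Configuration()
        {
            AutomaticMigrationsEnabled = false;
        }

        protected override void Seed(BlogContext context)
        {
            // Seed still runs on every Update-Database, but it only inserts sample
            // rows into an empty table, so later edits to those rows survive.
            if (!context.Blogs.Any())
            {
                context.Blogs.Add(new Blog { Name = "Sample Blog" });
                context.SaveChanges();
            }
        }
    }
}
```

The trade-off is that the seed data is no longer re-applied: once the table has any rows, changes to the seed code won't reach the database.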

Part II