Continuous database integration

Continuous database integration

by Peter Hancock 8. September 2007 10:00

Introduction

The following article came about after reading reading Evolutionary Database Design by Martin Fowler and Pramod Sadalage.  The concepts were great, but I couldn't find anything on the web that really showed how to go about implementing it.  So I decided to have a go at it myself from the ground up.  Following is how I've managed to do it for the current project I'm working on.

The Environment

Developers box

  • SQL Server Instances
    • DEV
    • DEPLOY
  • osql.exe (or isql.exe)
  • Nant 0.85
  • Some form of source control (I'm using Perforce)

Build server

  • SQL Server Instances
    • DEV
    • DEPLOY (I can't afford a separate server for staging at the moment)
  • osql.exe (or isql.exe)
  • Nant 0.85
  • CruiseControl.NET
  • Some form of source control (I'm using Perforce)

Production server (ProdServer)

  • SQL Server Instances
    • LIVE

Folder/Project structure

  • CreateScripts
    • Databases
    • Tables
    • Views
    • Constraints
    • Triggers
    • StoredProcedures
    • Functions
  • MigrationScripts

Requirements

Our current project requires regular releases, and these often involve database changes.  We are using Perforce as our source control system.  CruiseControl.NET runs on our build server and is configured to fire of a NAnt build script on every code checkin.  The build requirements at checkin are...

  1. Create an empty database for unit testing.  Run all unit tests on it.
  2. Test the deployment of a new production or upgrade of an existing production database.

Implementation

The setup

As indicated in the folders section of the environment, it's important that the folders for the project are set up in the above way.  There are rules however that need to be enforced within this structure.

The CreateScripts\Databases and CreateScripts\Tables folders should be read only for developers.  This is because creating a database, or creating a table requires that these database objects don't exist in the first place.  If we have an existing database, and we CREATE DATABASE on it, or an existing table and we CREATE TABLE on it, the script will fail (or worse if we include IF EXISTS... DROP TABLE).  So instead, these scripts should be located in the migration scripts.  Any changes to existing tables (ALTER TABLE) or database (ALTER DATABASE) also belong in the MigrationScripts folders.

The remaining folders can be used quite happily to store the objects.  Destroying the objects and recreating them will not affect the underlying data.

One last area to note is that the CreateScripts\Constraints folder should contain only referential integrity constraints.  These constraints should NOT be included in the CREATE TABLE script that is placed in the MigrationScripts folder. 

The driver script

We start off first by building the overall application.build script use by CruiseControl.NET to kick off the build. 

<?xml version="1.0" encoding="UTF-8" ?><project name="Application" basedir="." default="checkinBuild">
    <!-- Target runs on every checkin -->
    <target name="checkinBuild">
        <!-- build the unit test database first -->
        <property name="database.server" value="(local)"/>
        <property name="database.instance" value="DEV"/>
        <nant buildfile="Database.build" inheritall="true" target="build" failonerror="true"/>
        <!-- And now, run the deployment -->
        <property name="database.server" value="(local)"/>
        <property name="database.instance" value="DEPLOY"/>
        <nant buildfile="Database.build" inheritall="true" target="restoreDatabase"/>
        <nant buildfile="Database.build" inheritall="true" target="deploy" failonerror="true"/>
    </target>
</project>

The default target is checkinBuild, indicating this NAnt target will be run on every code checkin.  First, we call the database.build file build target.  The build target is determined by setting the database.server and database.instance properties prior to calling the target with inheritall=true.  The second target - deploy - works the same way, except it executes either a new build, or upgrades an existing build.  Note also that we restore the database first.  This is because we don't want to run the scripts on the current database as it stands (it may have already had migration scripts already run).  Instead, we want to restore the latest available production database backup.  This way, we are testing on a real system.

The new database script

The new database script is created in the database.build NAnt script.  The sole purpose of this target is to completely destroy the existing database, and overwrite it with a brand new one, created from scratch.  The result meets requirement 1.  A blank database ready for unit testing.

<target name="build" description="Create a brand new database">
    <call target="createDatabase"/>
    <call target="createTables"/>
    <call target="createViews"/>
    <call target="migrate"/>
    <call target="createConstraints"/>
    <call target="createProcedures"/>
    <call target="createFunctions"/>
</target>

Key points to note about the above script are that we call the migrate target (described later) after the createDatabase and createTables targets.  The reason we do this, is because after creating the database and tables, the migration scripts may include alterations to tables, new tables, or removal of colums from existing tables.  These all need to be included to complete the database structure.

Delving deeper into each of the createXxxx tasks, they look like this.

<target name="createTables">
    <foreach item="File" in "CreateScripts\Tables" property="script" failonerror="false">
        <call target="osql" failonerror="true"/>
    </foreach>
</target>
<target name="createProcedures">
    <foreach item="File" in "CreateScripts\Procedures" property="script" failonerror="false">
        <call target="osql" failonerror="true"/>
    </foreach>
</target>

Each of these loops through all the scripts in the appropriate folders, and runs them using osql on the database.  (There is an osql task explained at the bottom for this)

* Astute readers may have noted that we are running scripts on the Databases and Tables folder, yet I mentioned that these should be read only for developers.  In short, these folders will be automatically populated at the end of an iteration.  I will explain further below.

The result, after running the build target, and calling all the create scripts is that we have a blank newly created database, from scripts entered by developers.

The deploy database script

The second requirement was that we be able to test the deployment of a database to an existing server.  There are two requirements here.  First, if the database doesn't already exist, then we need to create a brand new one.  If it DOES exist however, we just need to upgrade.  We use osql to connect to the database by the database.name property.  We assume that if osql fails, the database does not exist, therefore we need to create it from fresh.  If it succeeds however, we can safely assume there is already a database by that name there, and we need to upgrade it.  The selection script looks like this...

<target name="deploy">    <!-- Attempt to connect to the named database.  If the database exists, osql -b will return 0 -->    
<exec program="osql" commandline="-b -E -s ${database.connection} -d ${database.name} -Q "GO"  resultproperty="osql.result" failonerror="false"/>
    <if test="${osql.result == '0'}">
        <call target="upgrade"/>
    </if>
    <if test="${osql.result == '1'}">
        <call target="build"/>
    </if>
</target>

The upgrade script, is similar to the new script, but excludes creating the database and creating the table scripts.  (Otherwise we may potentially drop and recreate them, losing data)

<target name="upgrade" description="Upgrades an existing eventlive database">
    <call target="backupDatabase" failonerror="true"/>
       <property name="nant.onfailure" value="restoreDatabase"/>
       <call target="migrate" failonerror="true"/>
    <call target="createConstraints" failonerror="true"/>
    <call target="createProcedures" failonerror="true"/>
    <call target="createFunctions" failonerror="true"/>
</target>

The curly bits

This is where it gets a bit curly.  When migrating data or upgrading database as a series of tasks, the order that the tasks are executed is vitally important.  For example, the first task may add - say - a category table and a column to the Suppliers table.  It also adds the referential integrity constraint between suppliers and categories.  Order is important here.  First - we can't create the constraint until both table and column are added.  This is why the migration is run before the constraints.  If a later data migration task then wishes to add new category to the table, we need to ensure that the create table script has been run first.  In this case, the order within the MigrationScripts folder is important.

There are a number of ways to do this.

  1. Manually
  2. DateCreated
  3. DataModified

I use date created by performing a dir /b /od /tc *.sql > order.txt, and then iterating through the resulting text file and running the scripts in that order.  Alternatively, you could remove the dir that I execute and generate order.txt anyway you like.  The migrate target, whilst similar to the createXxx targets, doesn't iterate the directory though - rather, the order.txt file.

<target name="migrate" description="Run the migration scripts on the database">
    <exec program="${solution.dir}\tools\listFiles.bat" commandline="*.sql order.txt" workingdir="Migration Scripts" failonerror="false"/>
            <foreach item="Line" in="Migration Scripts\order.txt" property="script">
        <property name="script" value="Migration Scripts\${script}"/>
        <call target="osql" failonerror="true"/>
    </foreach>
</target>

Summary so far...

The above scripts are called for every checkin within the iteration.  Every code change that affects the data model, every data migration, data insertion or cleanse always gets put into the MigrationScripts directory.  Each time it is checked in, the NAnt build rebuilds the database from scratch.  It also takes the latest backup of the production database, and restores it to the DEPLOY instance - before finally running the deploy target on this database to test the deployment.

End of Iteration

So far, everything has been going along nicely, checkins have been rebuilding that database, and testing the migration scripts executing on the staging database.  Everything works as it should, and everybody is happy.  All we need to do now is finish off the iteration.  First, we deploy to the production server.  We know this works because we tested it by restoring from production and test running the scripts on the last build script.  Deployment is simple...

Nant.exe -buildfile:ApplicationDir\Database.build deploy -D:database.server=ProdServer -D:database.instance=LIVE

The last thing we need to do is cleanup after the iteration.  This involves archiving the migration scripts - we don't want to be rerunning them again.  The problem is though - if we archive the migration scripts and they contain create table scripts, we need to be able to run them again to create the new database for unit testing in the next iteration.  So what we do instead, is use upgraded database (on DEPLOY) to automatically generate the create database and create table scripts for us.  These scripts are then placed in the CreateScripts\Databases and CreateScripts\Tables folders - by the build server not by a developer.  Once these scripts are generated, we can move the MigrationScript items into an archive folder.  The nant targets look like this.

<!-- Script database takes an existing created database and scripts the tables and database    -->
<!-- creation routine for later checking back in to source control          -->
<!-- Note that this should ONLY be called when delivering on an iteration -->
<target name="scriptDatabase">    <exec program="cscript.exe" verbose="true">
       <arg line="${solution.dir}/Tools/scriptDatabase.wsf //Nologo //B"/>
       <arg value="//job:database"/>
       <arg value="${database.connection}"/>
       <arg value="${database.name}"/>
       <arg value="Create Scripts/Databases"/>
   </exec>
   <exec program="cscript.exe" verbose="true">
       <arg line="${solution.dir}/Tools/scriptDatabase.wsf //Nologo //B"/>
       <arg value="//job:tables"/>
       <arg value="${database.connection}"/>
       <arg value="${database.name}"/>
       <arg value="Create Scripts/Tables"/>
   </exec>
 </target>
 <target name="archiveScripts" description="Archive the database scripts" depends="scriptDatabase">
    <!-- Now we need to archive the migration scripts -->
    <tstamp property="archive.dir" pattern="yyyyMMdd"/>
    <move todir="Archive\${archive.dir}">
        <fileset basedir="Migration Scripts">
            <include name="*"/>
        </fileset>
    </move>
</target> 

Note - we are calling out to a script host program that generates the table and database scripts via SQLDMO - briefly...

   var objArgs, serverName, databaseName
   objArgs = WScript.Arguments
   if(objArgs.Length != 3)
   {
	WScript.Arguments.ShowUsage();
	WScript.Quit(1);
   }
   serverName = objArgs(0);
   databaseName = objArgs(1);
   outputPath = objArgs(2);
   var fso = new ActiveXObject("Scripting.FileSystemObject");
   if(!fso.FolderExists(outputPath))
   {
	WScript.Echo("Folder doesn't exist");
	WScript.Quit(1);
   }      
   var sqlServer, sqlDB, tableCollection, table, e;     
   server = new ActiveXObject("SQLDMO.SQLServer");
   server.LoginSecure = true;
   server.Connect(serverName, null, null);
   database = new ActiveXObject("SQLDMO.Database");
   database = server.Databases(databaseName);
      /* DMO Constants */
      var SQLDMOScript_Drops = 1;
      var SQLDMOScript_IncludeIfNotExists = 4096;
      var SQLDMOScript_IncludeHeaders = 131072;
      var SQLDMOScript_Default = 4;
      var SQLDMOScript2_NoCollation = 8388608;
      var SQLDMOScript_DRI_PrimaryKey = 268435456;
      var SQLDMOScript_DRI_Checks = 16777216;
      var SQLDMOScript_DRI_Defaults = 33554432;
      var SQLDMOScript_DRI_UniqueKeys = 67108864;
      var SQLDMOScript_ToFileOnly = 64; 
      tableCollection = new Enumerator(database.Tables);
      for( ; !tableCollection.atEnd() ; tableCollection.moveNext())
     {
	table = tableCollection.item();    
	if(!table.SystemObject)    
	{     
		var params = 	SQLDMOScript_Drops | SQLDMOScript_IncludeIfNotExists |  
                                                    	SQLDMOScript_IncludeHeaders |  QLDMOScript_Default |  SQLDMOScript_DRI_PrimaryKey |  
				SQLDMOScript_DRI_Checks | SQLDMOScript_DRI_Defaults | 
				SQLDMOScript_DRI_UniqueKeys | SQLDMOScript_ToFileOnly;
		table.Script(params, fso.BuildPath(outputPath, table.Name + ".sql"), null, SQLDMOScript2_NoCollation);          
		WScript.Echo("Processing " + table.Name);
	}
   }
   server.DisConnect();

We have a complete set of CreateScripts in their respective folders, the archive target is called which creates a new directory with the date set.  The MigrationScript files are then moved to this directory.  (And later checked in to source control)  The iteration is closed.

Conclusion

It's not that difficult to get it running.  The advantages of spending the effort are tremendous though.  In previous companies I've worked at, the running of scripts has been done by the DBA, and based on a weeks work getting everyone together to rollback failed ones, update the ones that depended on the failed ones, and generally just make sure it works.  It's prone to transcription errors, permission errors, script errors and inconsistencies.  By testing the deployment each time, as well as unit testing, we can reduce this.  One last thing worthy of note though is that just because it's easier to change, doesn't mean that developers can hack tables.  Thought still needs to be put into the design of the database.

Notes:

  • I've ignored the backupDatabase and restoreDatabase targets - these are left as the dreaded exercise for the reader.
  • I've used .wsf files for scripting the tables and database creation because it's fast enough and it works for me.  If speed is an issue, vb, C#, C++, java or whatever you want can be used instead.
  • I'm assuming that the production database backups are able to be transferred to the build server.  In some environments, production data is sensitive or bound by privacy laws.  In these cases, a test database could be used instead.  It's important though that it be of the same structure as the end of the previous iteration.
  • Any database refactoring scripts or data cleanse / migration scripts that are stored in the MigrationScripts folder should have SQL verifying the values at the start and finish of the script to ensure that it migrated correctly.  Failures should throw errors forcing a build failure.  RAISEERROR could be appropriate.

Currently rated 5.0 by 3 people

  • Currently 5/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Software development

Related posts

Comments

September 8. 2007 18:36

MattTheFatt

Excellent work Peter! I found I hand to read this carefully several times before I grokked all the nuances.. Thanks a ton!

MattTheFatt us

September 8. 2007 21:25

MattTheFatt

FYI.. the example of the CreateXXX Tasks has the "Create Procedures" task calling the scripts in the tables folder.. "CreateScripts\Tables"

MattTheFatt us

September 8. 2007 21:50

Peter Hancock

Thanks Matt - usual sort of cut paste error. I've made the edit.

Cheers

Peter Hancock au

September 9. 2007 20:27

MattTheFatt

Couple of other bits and pieces, Peter, as I work through this ;-)
* I didn't see your "osql" target, so I created one:

&lttarget name="osql">
&nbsp &lt!-- Try and execute the script -->
&nbsp &ltexec program="sqlcmd" commandline='-b -E -S ${database.connection} -d ${database.name} -i "${script}"'
&nbsp resultproperty="sqlcmd.result" failonerror="false"/>
&lt/target>

* The "-s" in the osql exec should probably be capitolized "-S"
* I think the "database.server" and "database.instance" in the app.build script are meant to match up to the "database.connection" and "database.name" in the db.build?
* In the "migrate" task you reference ${solution.dir} which isn't defined.. I used the ccnet workingdirectory value.. ie. ${CCNetWorkingDirectory} instead
* "listFiles.bat" just thought I'd put the actual line of code (don't forget the quotes if there are spaces in your path":
dir /b /od /tc "%1" > "%2"
* If anyone has any spaces in their file paths, then they're going to have a hard time debugging the NAnt tasks.. so I updated the "exec" stmts to use single quotes for the commandLine, and then use the double quotes for the parms '.... "$script"' instead of "... $script" i.e.
exec program="sqlcmd" commandline='-b -e -S ${database.connection} -U user -P password -i "${script}"

MattTheFatt us

September 9. 2007 22:21

MattTheFatt

ok.. everything is working, but I'm a little confused now. I was going under the impression that the entire directory structure is under source control, but that doesn't make sense. We only want to run the migration scripts once (or other scripts), and if it was under source control it would get run every time. example:
1) migration script: createNewAddressTable.sql
If I check that in, CC will check it out into the migrationscript folder and eventually run it. ok. Then at the end of the iteration the script is removed (?) and the table change is absorbed into the general 'createTables' script.
2) Next iteration, I add another table: createNewUserTable.sql once it's checked in, CC updates the migrationScript folder, so now it will hold this new script along with the old createNewAddressTable.sql even though the table has been incorporated into the general createTables script.

What am I missing? Do developers only submit migrationscripts? do these then get checked in at all or do we push the results into the general scripts?

hm.. bummer.. I thought I had this figured.

MattTheFatt us

September 9. 2007 23:06

Peter Hancock

database.server in our case is the windows server name - say BOTTLEIT_DATA

database.instance is the instance name - say - MyCompanyDB. Useful for the case where you keep separate SQL Server instances, one for each customer... eg.
BOTTLEIT_DATA\CompanyDB
BOTTLEIT_DATA\Customer1_DB
BOTTLEIT_DATA\Customer2_DB
It's important we keep instances separate as if one goes down, we don't want to impact all customers.

database.name is the actual name of the database, ie - Sales or Payroll. Most customers have only one database, but some have two... (ie - one has an OLAP database)

database.connection is just a shortcut property to concatenate the database.server and database.instance.
Essentially it's nant build -D:database.server=(local) -D:database.instance="DEV" -D:database.name=MyDB as opposed to nant build -D:database.connection=(local)\DEV -D:database.name=MyDB

It was done this way because instance is populated with TEST on our test box, and it was just cleaner and more readable to have separate properties for us.

I didn't include a few targets - the dreaded "exercise for the reader". Your osql looks exactly the same. Nice on changing the " to ' too. Spaces in pathnames would be an issue.

Regarding your second comment...
You've hit the nail on the end. After the initial creation, developers only submit Migration scripts. So your CREATE TABLE statements go into the Migration Scripts folder. These get executed, and at the end of the iteration, the CREATE TABLE gets extracted into the appropriate folder. That re-extraction is to allow a clean build with empty data from scratch. (Note - I still haven't figured out a nice way to update meta data!)

Following that, all of your migration scripts get filed away in an archive folder so at the start of the next iteration your source control contains...
1. List of Creation scripts (CREATETABLE, CREATEVIEW, CREATE PROCEDURE folders) which allow you to recreate the database from scratch
2. Empty migration scripts folder.

Then, you submit all table creation/alteration/deletion statements to the migrate scripts folder. Stored Procs, functions, views etc. are submitted to the respective CREATE folders as these don't change the structure of the data and can be recreated at any time.

For what it's worth - once we've completed iteration 1, we completely remove write permissions to CREATE DATABASE and CREATE TABLE folders in source control for our developers. The build machine maintains write permissions as it's the build machine that executes the EndIteration script for us. This ensures that developers have to submit changes through the migration scripts. We could I guess make it easier to understand by just saying that from that point on, everything goes through migration folder, but we found it easier just to update the procedures, functions and views directly in the create folders.

You're pretty close though. The end iteration part is the hardest to grok. From our point of view, this whole process came during a period of transition. We had a number of very short term developers come and go and we needed them to be able to get up and running as rapidly as possible. The goal was to be able to hit source control, and run nant build and get an empty database immediately.

Hope this helps.

Peter Hancock au

January 9. 2008 01:29

pingback

Pingback from rabdullin.com

Common development principles for xLim at Rinat Abdullin

rabdullin.com

Add comment


(Will show your Gravatar icon)  

  Country flag





Live preview

March 12. 2010 07:16

Recent posts

Recent comments