There is no clean (database) development without Version Control

I was surprised when I did a bit of research for this current blog article and learned that SCCS, the first system for Version Control, was invented in 1972. That was 45 years ago. I wasn’t even born.

But even today, after such a long time, there are projects – especially database projects – which are not under version control. I myself remember several projects I worked on (or even was responsible for, shame on me) which were not under Version Control at all or at least the database part wasn’t. So why is this and what can we do against it?

Why Version Control (at all)?

A Version Control System (VCS, also known as SCS (Source Control System) or SCM (Source Control Management)) is a tool to track changes to textfiles and store a history of the changes. It allows to check out any previous version of the controlled files. Modern VCS also provide automatic solutions to merge multiple changes from different sources into the same file. Today, the most accepted and used VCS are git, mercurial and SVN.

Git-Tower, a learning platform for git, lists the following reasons why to use Version Control (follow the link to read the complete reasoning):

Collaboration
Understanding What Happened
Backup
Storing Versions (Properly)
Restoring Previous Versions

In my opinion, Version Control gives a developer especially one thing: safety. If your code is version controlled, you don’t risk to lose things you did in the past, you don’t need to fear breaking something terribly (just go back), you don’t have to be afraid to forget why a change was made (you can always compare with how it was before) and you can learn from bugs and errors because you can find out where and when exactly the wrong code went into the project. You are able to code in a much more comfortable zone than without.

Of course Version Control provides much more than that (think about automated documentation and release notes for example), but I think this is one of the strongest arguments to use Version Control – always and everywhere. You want to make your coding zone as safe and comfortable as possible, because uncertainty and fear are among the strongest killers of focus and creativity (which in my opinion are the most important attributes for a productive software developer).

Databases are different animals

So why are still many database-related projects not or only partially under Version Control (and yes, this is a very subjective observation)? Why are especially database people hestitant to use VCS in their daily work?

It’s because databases – compared to applications development-wise – are different animals. Let me list the most impactful differences:

Applications	Databases
Every build creates a compeletely new, deployable application.	Normally, databases are not re-created from scratch during a build. Changes are applied to an existing database.
Rollbacks are very easy by simply checking out a previous version and building it.	Rollbacks in databases are complex, costly and time-consuming. Sometimes they are nearly impossible without data loss.
Filenames (classes etc.) stay the same and don’t change often.	Update-files have different names. Every update can have its own set of scripts which are only valid for this specific update.
Usually, the initial installation is exactly the same process as installing an update.	The initial installation is completely different from installing an update. Sometimes you are not even able or allowed to influence the installation of the database. The deployment process differs significantly.
You normally work on text-files from the first moment and throughout the whole development process.	Things like procedures, packages, views are not stored as text-files by default. They exist inside the database. If you don’t do additional work, you might not encounter text-files at all.

Facing this additional challenges might be one reason for database developers to stay away from introducing Version Control into their database projects. We even have to completely remove two of the main reasons listed earlier:

Storing Versions (Properly)
Restoring Previous Versions

Those tasks are very easy with a version controlled application, but they might not even be possible or at least require additional effort with database projects.

Now is this a reason to avoid Version Control in database projects? Is this a reason to avoid agile methods in general because of the additional challenges in database development?

I don’t think so. I’m absolutely convinced that – even though it might be more difficult – having a safe development zone and including agile methods like refactoring and self-testing into daily work will be as beneficial for database development as for application development. Version Control is a precondition to achieve this.

How to start?

One of the biggest up-front challenges to start version controlling your database sources is the question how to get them into text files. We database developers are not that comfortable with text files holding sources – we have our database, our query tool and our favourite IDE.

And yes, there are some solutions available which enable you to version control your sources without changing the way you work (looking at some of the redgate, apex or toad products). The downside is that these tools are pretty expensive. They are great, no question, but convincing your managers to buy some four-digit amount software to start doing what application developers do for free since decades might be a challenge in itself. Especially if it’s software with a per-year license, where you tie yourself and your development to a specific vendor.

Another thing is that those tools won’t do everything for you. In my experience, working agile and clean is first of all a change in mindset and behaviour (I’m pretty sure I will write more about that in future blog posts).

So what can we do instead? You are a developer! Dont’ forget that. You got the power to make machines do what you want, so use that power to automate the process of getting the sources stored in your database and write them to a file structure which you can version control. In most databases it’s pretty easy to get the sources, some links here to get you started (you can also easily see that other people already had the same problem – and solved it):

StackOverflow
Oracle-ddl2svn
Visual Studio SQL Server Data Tools
Most commercial database IDEs have a function to (at least initially) write all your ddl to a file structure

Database to file or file to database?

After having a tool to get the sources from database to file structure, it might be tempting to keep working directly on the database, maybe with some nice IDE and drag and drop and all that fancy stuff.

I don’t recommend this approach. Instead I encourage you to start working on the files and run them to bring the changes into the database. Have your file with – for example – an Oracle Stored Procedure Body in your favourite text-editor (best-case with intellisense), edit it there and then run it against your development database (this might be a bit more tricky for SQL Server procedures because you don’t have “CREATE OR REPLACE” there, but SQL Server 2016 got “CREATE OR ALTER” and for older versions there are also workarounds).

It will be much easier to automate migrations and once you dive deeper into agile database development you will more and more get in touch with files. Getting into a close relation with your favourite text-editor will become as beneficial as ironing your sql skills.

What to version control?

Besides the obvious version control abilities, you can also use VCS as backup and central storage. Keep all the things you need closely together.

DDL

First of all I talk about Schema/DDL when talking about version controlling your sources. That includes the SQL for Packages, Procedures, Functions, Triggers, Objects but also Tables, Views, Materialized Views, Sequences etc.

This is the most important part of your sources and also the part which is easiest to version control. Get your DDL-statements to files and start working from there.

Data

Version controlling data is a topic on its own, but to keep things short: there might be some data which is required for your project to work. Data like configuration settings, basic definitions and other general data.

You should script this data and version control it. If you have customer-related data in your project, you should consider version controlling that, too, maybe in different customer-related repositories.

Install-files

It is absolutely great if you are able to easily install your database project on a local machine. If you have the opportunity to script your installation steps, do so and put the script-files under version control.

Snippets

Do you know Snippets? Most IDEs come with a possibility to store templates of code and insert them via shortcut. Most of them also allow to store those snippets as files. A great possibility to start growing your personal library of productivity-increasing templates and snippets – and version control them.

If you have Snippets for multiple projects, it might also be a good idea to store them in a separate repository to have them independent from your specific project.

Preferences

It might be useful to have your IDE’s or text-editor’s preferences also version controlled. This is valid for both, project-specific preferences and general preferences – though the general ones should probably be stored in a separate repository.

There might be one exception for project-specific preferences, though: If you are working on a project with a bigger team where different IDEs and tools are used (for example open source projects), project-specific preferences should not pollute the repository.

For smaller teams and projects where you agreed upon one development environment and toolset, storing preferences closely together with your source files is a great help.

What NOT to version control?

In my opinion, this is a very situational question. I will try to give some thoughts on that, but you should never create a dogma about it. Keep in mind: Everything which helps making your project easier to maintain, easier to understand and easier to evolve is beneficial – no matter what some guides say about specific topics.

Binary files

In general I would say that all sort of binary files (think of database-dumps, zip-archives, images, executables etc.) are nothing you want to have polluting your repository – VCS are not able to track changes to binary files and will always store a completely new copy of the file, which can lead to wasted resources.

But there might be exceptions in some situations. Think about a project which needs a specific version of a compiled tool for automation at some point – it might be useful to have this tool included in your VCS so it will definitely be available when you need to work on the project.

Or you got a legacy (database) project where you are not able to create a new instance of your project via SQL install files but you have a dump as starting point of your implementation. In that case it can be totally valid to store this dump with the rest of your install-files.

As a rule of thumb I’d say that it’s okay to have binary files in your repository if they are a necessary part of your project and don’t change often.

Auto-Generated code

There might be some parts of your code which are being generated automatically from other sources and definition files.

In general, you shouldn’t version control that auto-generated code but you should of course version control the code which auto-generates it and the source definitions – together with an automated possibility (for example python script) to generate it.

This is valid for auto-generated code which is not adjusted after generation manually. If you change your previously auto-generated code it’s not auto-generated anymore: You should definitely have it under version control then – or think about how you can improve your code generator to achieve a situation where the result does not have to be changed.

In any case: if you use code-generators, make sure it is clearly understandable and easy to setup a development environment and to get the things and code you need to make your project work. The best code-generators are useless if you someone with less understanding of your project can’t use it (and yes, this might not be valid for one-person-projects, but how much do we development professionals really want to work in one-person-projects? You should work hard on keeping the walls to join your project small).

When to start?

That’s easy: right now.

Honestly, I don’t think there is any reason not to get started immediately with putting your database project under Version Control.

The effort to do so is fairly low – I can’t imagine a situation where getting started will cost your more than one or maximum two working days (if you are not familiar with VCS yet).

Don’t get me wrong: to find the best way to organize your files, working in a way which empowers Version Control and getting as much as possible version controlled is a long and challenging process which will probably never end, but to get started with your DDL or at least the functional parts of your DDL (Packages, Procedures, Functions, Triggers and Views) is a pretty simple task.

You might have to become familiar with the concepts of Version Control, you might have to open your mind and you might have to question the way you did things before, but remember that you’re a developer and to continuously improve your knowledge and skills should be as much part of your mindset as being curious and excited to try out new ways.

This is especially true if your goal is to improve not only the featureset of your project but also the quality of your code and the reliability, maintainability and evolvability of your project. Because there is no clean database development without Version Control. What are you waiting for?

2 Comments

Strongholds of confidence: self-testing your database | Clean Database Development · October 26, 2017 at 9:59 pm

[…] instead-of triggers are lost and you have to re-create them (hopefully you got your database version controlled, so you are able to easily solve such an issue if it happens). In fact this happened to me at least […]

Strongholds of Confidence: Self-Testing Your Database – Cloud Data Architect · April 5, 2018 at 11:59 am

[…] instead-of triggers are lost and you have to recreate them (hopefully you got your database version controlled, so you are able to easily solve this issue if it happens). This happened to me at least once in a […]