One of the strongest recommendations for clean and pragmatic development is the DRY-principle:
Don’t Repeat Yourself!
This principle is not only part of every guide about professional development but also the reason why we even have things like subroutines, config files and – hello, database professional – data normalization.
Although normalization should be a well-known practice for database professionals, I caught myself writing the same SQL statements again and again, copy-pasting parts as sub-selects, use it in procedures and functions and ending up at having the same query logic in many different places.
We database people are very strict about redundant data in our database design – why aren’t we as strict about the code we write?
Start being as strict about redundant code as you are about redundant data
Duplicate code is a source of pure evil in software development. It highly increases the effort to change logic and leads to strange behaviour. And queries are code.
I know that in most situations developers nowadays don’t have to deal with SQL that much – there are pretty good ORM and other database abstraction frameworks for object-oriented or functional programming languages, but there are still the situations and projects where significant parts of the logic of a system depend on the queries or even database programming constructs like triggers and procedures.
There is no reason why developers dealing with that kind of challenges shouldn’t follow the same principles of clean and simple code as their colleagues in application development.
So, how can we achieve DRY in database development?
Besides the obvious places in procedures and functions where you can easily follow the DRY principle (okay, not as easy as in object-oriented languages) it is also pretty simple to bring it to SQL queries: Views.
Every professional RDBMS has the concept of Views, storing a piece of SQL to reference it in other queries. That’s exactly what we need: storing a piece of code which can be referenced by other code instead of duplicating the logic, creating a portal to the evil domains of bugginess and update horror.
But THE PERFORMANCE!
Database people are often nearly hysteric about performance. And yes, that’s okay, performance in databases is important. But is it worth to gain the last bit of 0,01% performance increase by sacrificing readability, maintainability and simplicity of the statements (code) we write?
This is a pretty old argument in development community but in my opinion several years (or even decades now?) of experience in agile development concepts have profen that evolvability and flexibility of code is – in most situations – way more important than the latest bit of performance optimization.
And yes, when talking about views we are talking about the latest bit of performance.
At least for oracle databases, the performance argument is not even existant:
For oracle, a view is only a stored piece of SQL code. When selecting a view, the optimizer will load the SQL code and optimize the whole query in exactly the same way as if that query would’ve been provided without view reference.
Im am not 100% sure about the other major RDBMS, but I am absolutely convinced that the little bit of overhead a view might cause will almost never outweight the benefits of clean, simple, maintainable code!
The perfect DRY-View
So what could the perfect view to simplify our query and remove duplicate code look like?
I came to the following suggestions for such a DRY-View:
Encapsulate one single part of logic
Well, that’s not exactly DRY, but follows other principles of clean code. Although I think it is very beneficial to keep the DRY-Views very small and simple and better introduce several DRY-Views referencing each other to build up more complex logic.
Give the view a good name
The name of the view should tell the developer what it’s going to provide. According to some questionnaires this task will be pretty challenging!
Give the view a DRY-suffix
Not only will this tell the developer what the view is meant to be (a reusable piece of query logic) but also it will remind you about the DRY principle every time you read it. This basic but important principle will more and more become part of your mindset.
Comment your DRY-View
Tell others what this specific view is meant to do. You can do so with code comments (like — or /* */). I myself started to comment all my (DRY-)views in Oracle with the built-in
COMMENT ON TABLE DRY_MY_VIEW IS 'Comment';
In SQL-Server you could do the same with extended properties, namespace ‘MS_Description’.
The reason behind this is also the DRY-principle: By storing the description of my view in a way which can be read from an external tool I can generate my documentation (or parts of my documentation) directly from it – without doing it twice. Don’t repeat yourself! Not even when documenting!
Example of selecting the comments of a view or table (Oracle):
SELECT uo.OBJECT_NAME, c.COMMENTS FROM USER_OBJECTS uo LEFT OUTER JOIN USER_TAB_COMMENTS c ON uo.OBJECT_NAME = c.TABLE_NAME WHERE uo.OBJECT_TYPE IN ('TABLE', 'VIEW')
Write Unit-Tests for your views
Yes, I know unit-testing is a topic not widely known or used in database development.
Nontheless it’s so important and it will give you as developer so much freedom and confidence that it’s absolutely worth taking a look at self-testing solutions for your preferred RDBMS.
I will most likely write one (or many) blog posts about this topic in future, but in the meanwhile check out those links:
- Oracle: utPLSQL v3
- SQL-Server: tSQLt
Similar queries to different sources (tables)
While writing about all that DRY stuff, there is one thing which cannot be adressed with views: similar queries to different tables.
While we have all kinds of tools in object-oriented development languages to deal with such situations, we don’t have them in SQL.
I didn’t get a very good idea on this topic yet – for Oracle it could include dynamic SQL, for SQL-Server it could include temporary tables.
If you have ideas how to get more DRY into these cases, contact me – I will be glad to share thoughs and opinions!
1 Comment
Ben Einhorn · May 20, 2018 at 3:06 pm
Views are not evil. But they may not behave exactly as one might think they do. Back in the day, predicate pushing was pretty badly optimized (in some RDBMS), so you actually could end up with a view that was way slower than a select, just because predicates could not be pushed into the view. You still can, though I would argue that those cases are rather rare.
But you totally can really mess around with it in MSSQL Server 2017 even and have really “interesting” results. This does not mean that views should not be used, but that people still need to check the results of their statements – which should be a given anyway.
On the topic of similar queries to different tables, I’d advocate that if the query is rather complex and includes logic – so you want to reuse it, views may be used analogous to interfaces in OOP to normalize data before using the query, however a CTE might just do the job. That should not eat up too much performance in most instances, I’d assume.