When it comes to testing databases, the probably most important thing is to check for the validity of the data. Often, data is our most precious asset and also the longest living one.
In this example, we have the problem of several competing Darth Vader users, but gladly we can use SQL and the built-in features of relational databases to identify, correct, and prevent this kind of data problem.
Step 1: Identify
First of all, we need to identify the rows that are problematic, and we can use SQL to do that:
select *
from deathstar_users
where lower(name) in (
select lower(name)
from deathstar_users
group by lower(name)
having count(*) > 1
);
By using the lower
function Oracle provides (and there are equivalents in all the other well-known relational databases), we can even catch different cases of dArTh VAder.
At this point, we could also add a couple more functions like trim
to ignore all possibly added whitespaces is front of after the name.
This query will return the following result:
1 | Darth Vader | „dark“ | 10‘000.00 | Sith Lord |
2 | Darth Vader | 10000.00 | Boss | |
3 | darth VAder | „Dark“ | 15000.00 | Sith Lord |
Step 2: Correct
The problem we are usually facing now is to decide which of the competing entries are correct.
We really need the help of people who know the business context here, who can tell whether Darth Vader earns 10 or 15 thousand credits and whether his role is Boss or Sith Lord.
Check out Example 3.1 to learn how to easily get this information into Excel and get the corrected results back into the database – no matter which database you use.
We will end up with a couple of updates and deletes that clean up our problematic data.
Step 3: Prevent
To actively prevent our database to contain competing entries again, we can use a unique constraint or unique index. Both will throw an error when anyone tries to add a row with a value for a column protected by a unique constraint that already exists.
There are again some nuances depending on the database you are using. In Oracle, we can add unique constraints only on pure columns. Unique indexes, however, can be added to columns in combination with functions like lower
or trim
(this is called a function-based index).
Since we want to protect the name column in a case-insensitive way, we choose the unique index with lower
create unique index deathstar_users_uq_name
on deathstar_users (lower(name));
or even with lower
and trim
create unique index deathstar_users_uq_name
on deathstar_users(lower(trim(name)));
Now, we cannot add any combination of Darth Vader to the data table. All the following insert statements will fail:
insert into deathstar_users ( id, name )
values (100, 'Darth Vader');
insert into deathstar_users ( id, name )
values (101, 'darTh vader');
insert into deathstar_users ( id, name )
values (102, 'Darth Vader ');
You can find the whole example as SQL-script here.
Previous: Example 2: Automated tests of database functionality with pure PL/SQL
Next: Example 3.1: Getting information from database to Excel and back to the database