When it comes to testing databases, the probably most important thing is to check for the validity of the data. Often, data is our most precious asset and also the longest living one.

In this example, we have the problem of several competing Darth Vader users, but gladly we can use SQL and the built-in features of relational databases to identify, correct, and prevent this kind of data problem.

Step 1: Identify

First of all, we need to identify the rows that are problematic, and we can use SQL to do that:

select * 
from deathstar_users
where lower(name) in (
  select lower(name)
  from deathstar_users
  group by lower(name)
  having count(*) > 1
);

By using the lower function Oracle provides (and there are equivalents in all the other well-known relational databases), we can even catch different cases of dArTh VAder.

At this point, we could also add a couple more functions like trim to ignore all possibly added whitespaces is front of after the name.

This query will return the following result:

1Darth Vader„dark“10‘000.00Sith Lord
2Darth Vader10000.00Boss
3darth VAder„Dark“15000.00Sith Lord

Step 2: Correct

The problem we are usually facing now is to decide which of the competing entries are correct.

We really need the help of people who know the business context here, who can tell whether Darth Vader earns 10 or 15 thousand credits and whether his role is Boss or Sith Lord.

Check out Example 3.1 to learn how to easily get this information into Excel and get the corrected results back into the database – no matter which database you use.

We will end up with a couple of updates and deletes that clean up our problematic data.

Step 3: Prevent

To actively prevent our database to contain competing entries again, we can use a unique constraint or unique index. Both will throw an error when anyone tries to add a row with a value for a column protected by a unique constraint that already exists.

There are again some nuances depending on the database you are using. In Oracle, we can add unique constraints only on pure columns. Unique indexes, however, can be added to columns in combination with functions like lower or trim (this is called a function-based index).

Since we want to protect the name column in a case-insensitive way, we choose the unique index with lower

create unique index deathstar_users_uq_name 
  on deathstar_users (lower(name));

or even with lower and trim

create unique index deathstar_users_uq_name 
  on deathstar_users(lower(trim(name)));

Now, we cannot add any combination of Darth Vader to the data table. All the following insert statements will fail:

insert into deathstar_users ( id, name ) 
  values (100, 'Darth Vader');
insert into deathstar_users ( id, name ) 
  values (101, 'darTh vader');
insert into deathstar_users ( id, name ) 
  values (102, 'Darth Vader   ');

You can find the whole example as SQL-script here.

Previous: Example 2: Automated tests of database functionality with pure PL/SQL

Next: Example 3.1: Getting information from database to Excel and back to the database