We are now facing a pretty common situation for data that isn’t normalized:

We have very different values for ALIGNMENT and can even see a typo in only one of the role-entries (“Pirsoners” instead of “Prisoners”)

The normal forms, which are at the heart of the relational model, were created to prevent exactly that kind of problem, and we’ll see in this example how to use them.

Step 1: Identify

What we want to identify is what different entries for Alignments and Roles we actually have in the database:

select distinct alignment from deathstar_users;
ALIGNMENT
„dark“
„light“
„bright“
„dark „
select distinct role from deathstar_users;
ROLE
Sith
Sith Lord
Prisoner
Pirsoner

In the case of alignments, we can even add some SQL magic, removing a lot of noise. Read the following from the most inner bracket, first:

select distinct
  '„' || -- 4th we add the quotation marks back again
  lower( -- 3rd we make everything lowercase
    trim( -- 2nd we trim to remove all unnecessary whitespaces
      -- 1st we remove those strange quotation marks
      regexp_replace(alignment, '[„“]', '') 
    )
  )
  || '“' -- 4th adding the closing quotation mark
    as alignment
from deathstar_users;
ALIGNMENT
„dark“
„light“
„bright“

Step 2 for Alignments: Correct

First, we need to define a new table and insert all the valid entries. In many cases, inserting of the valid data is best done by hand. In cases where we have huge amounts of (lookup-)data, we might want to specify very clear rules and do it via insert into ... select – but be careful with this approach.

create table alignments (
  id integer generated always as identity primary key,
  name varchar2(256) not null unique 
     -- Let's also make sure alignment-name is unique
);

insert into alignments ( name ) values ('„dark“');
insert into alignments ( name ) values ('„light“');
insert into alignments ( name ) values ('„neutral“');
commit;

Now we need a new column in the deathstar_users table

alter table deathstar_users
  add alignment_id integer;

If the existing values are so over the place as with our example, we need to do some manual work again. We can, however, use SQL and correct values by keywords:

update deathstar_users set
  alignment_id = (select id from alignments where name = '„dark“')
where lower(alignment) like '%dark%';

update deathstar_users set
  alignment_id = (select id from alignments where name = '„light“')
where lower(alignment) like '%light%'
  or  lower(alignment) like '%bright%';

commit;

Let’s check what we have now:

select duser.name     as username
      ,alignment      as original_alignment
      ,align.name     as new_alignment
from deathstar_users duser
  left outer join alignments align
    on duser.alignment_id = align.id
;
USERNAMEORIGINAL_ALIGNMENTNEW_ALIGNMENT
Darth Vader„dark“„dark“
Mace Windu„light“„light“
Luke Skywalker„bright“„light“
Darth Maul„dark „„dark“
Obi-Wan Kenobi„light“„light“

That looks pretty good, so we can remove the old column.

alter table deathstar_users
  drop column alignment;

Bonus Step: Not breaking the application

In order to not break any existing applications (we changed the column name), we can use a view to mimic the old behaviour and rename the underlying table:

alter table deathstar_users rename to t_deathstar_users;

create or replace view deathstar_users as
  select duser.id
        ,duser.name
        ,align.name as alignment
        ,duser.salary
        ,duser.role
    from t_deathstar_users duser
      left outer join alignments align
        on duser.alignment_id = align.id
;

select * from deathstar_users;
1Darth Vader„dark“15‘000.00Sith Lord
4Mace Windu„light“Prisoner
5Luke Skywalker„light“-300.42Pirsoner
6Darth Maul„dark“5000Sith
7Obi-Wan Kenobi„light“0.00Prisoner

Now we have the exact same API as before – but underneath we made sure alignment is normalized into its own table.

If we want to also allow insert/update/delete into the view, we will have to add an instead-of trigger. That’s totally possible to do, but out of scope for this post. If you want to know more, just hit me up via e-Mail or Twitter.

Step 3 for Alignment: Prevent

We’ve already done most of the prevent-part by normalizing the alignment data into its own table, but we can even go further and make sure that only IDs that actually exist in ALIGNMENTS can be entered into T_DEATHSTAR_USERS.ALIGNMENT_ID:

alter table t_deathstar_users
  add foreign key ( alignment_id ) references alignments( id );

-- We cannot add an entry with a non-existing Alignment-ID anymore
insert into t_deathstar_users( id, name, alignment_id )
  values ( 300, 'Jar Jar Binks', 10);

Step 2 and 3 for Roles

Now that we know how to approach these kind of improvements, we can do the same that we did for Alignments to Roles now.

Caution: When changing things, it’s a good idea to work in steps as small as possible. Fight the urge to change and improve several things at once, you will probably end up being a lot slower and a lot more stressed!

create table roles (
  id integer generated always as identity primary key,
  name varchar2(256) not null unique
);

insert into roles ( name )
  select distinct role
  from deathstar_users
  where role != 'Pirsoner';

select * from roles;

alter table t_deathstar_users
  add role_id integer references roles( id ) 
  -- we can specify the foreign key right when adding the column
;

update t_deathstar_users set
  role_id = (select id from roles where name = 'Sith')
where role = 'Sith';
update t_deathstar_users set
  role_id = (select id from roles where name = 'Sith Lord')
where role = 'Sith Lord';
update t_deathstar_users set
  role_id = (select id from roles where name = 'Prisoner')
where role in ('Prisoner', 'Pirsoner');

select duser.name     as username
      ,role           as original_role
      ,t_role.name    as new_role
from t_deathstar_users duser
  left outer join roles t_role
    on duser.role_id = t_role.id;

create or replace view deathstar_users as
  select duser.id
        ,duser.name
        ,align.name as alignment
        ,duser.salary
        ,t_role.name as role
    from t_deathstar_users duser
      left outer join alignments align
        on duser.alignment_id = align.id
      left outer join roles t_role
        on duser.role_id = t_role.id
;

alter table t_deathstar_users
  drop column role;

select * from deathstar_users;
1Darth Vader„dark“15‘000.00Sith Lord
4Mace Windu„light“Prisoner
5Luke Skywalker„light“-300.42Prisoner
6Darth Maul„dark“5000Sith
7Obi-Wan Kenobi„light“0.00Prisoner

Now we have consistent data of Alignments and Roles and made sure it stays that way.

You can find the whole example as a single script here.

Previous: Example 3.1: Getting information from database to Excel and back to the database

Next: Example 5: Identify, correct, and prevent salary issues