We are now facing a pretty common situation for data that isn’t normalized:
We have very different values for ALIGNMENT and can even see a typo in only one of the role-entries (“Pirsoners” instead of “Prisoners”)
The normal forms, which are at the heart of the relational model, were created to prevent exactly that kind of problem, and we’ll see in this example how to use them.
Step 1: Identify
What we want to identify is what different entries for Alignments and Roles we actually have in the database:
select distinct alignment from deathstar_users;
ALIGNMENT |
---|
„dark“ |
„light“ |
„bright“ |
„dark „ |
select distinct role from deathstar_users;
ROLE |
---|
Sith |
Sith Lord |
Prisoner |
Pirsoner |
In the case of alignments, we can even add some SQL magic, removing a lot of noise. Read the following from the most inner bracket, first:
select distinct
'„' || -- 4th we add the quotation marks back again
lower( -- 3rd we make everything lowercase
trim( -- 2nd we trim to remove all unnecessary whitespaces
-- 1st we remove those strange quotation marks
regexp_replace(alignment, '[„“]', '')
)
)
|| '“' -- 4th adding the closing quotation mark
as alignment
from deathstar_users;
ALIGNMENT |
---|
„dark“ |
„light“ |
„bright“ |
Step 2 for Alignments: Correct
First, we need to define a new table and insert all the valid entries. In many cases, inserting of the valid data is best done by hand. In cases where we have huge amounts of (lookup-)data, we might want to specify very clear rules and do it via insert into ... select
– but be careful with this approach.
create table alignments (
id integer generated always as identity primary key,
name varchar2(256) not null unique
-- Let's also make sure alignment-name is unique
);
insert into alignments ( name ) values ('„dark“');
insert into alignments ( name ) values ('„light“');
insert into alignments ( name ) values ('„neutral“');
commit;
Now we need a new column in the deathstar_users table
alter table deathstar_users
add alignment_id integer;
If the existing values are so over the place as with our example, we need to do some manual work again. We can, however, use SQL and correct values by keywords:
update deathstar_users set
alignment_id = (select id from alignments where name = '„dark“')
where lower(alignment) like '%dark%';
update deathstar_users set
alignment_id = (select id from alignments where name = '„light“')
where lower(alignment) like '%light%'
or lower(alignment) like '%bright%';
commit;
Let’s check what we have now:
select duser.name as username
,alignment as original_alignment
,align.name as new_alignment
from deathstar_users duser
left outer join alignments align
on duser.alignment_id = align.id
;
USERNAME | ORIGINAL_ALIGNMENT | NEW_ALIGNMENT |
---|---|---|
Darth Vader | „dark“ | „dark“ |
Mace Windu | „light“ | „light“ |
Luke Skywalker | „bright“ | „light“ |
Darth Maul | „dark „ | „dark“ |
Obi-Wan Kenobi | „light“ | „light“ |
That looks pretty good, so we can remove the old column.
alter table deathstar_users
drop column alignment;
Bonus Step: Not breaking the application
In order to not break any existing applications (we changed the column name), we can use a view to mimic the old behaviour and rename the underlying table:
alter table deathstar_users rename to t_deathstar_users;
create or replace view deathstar_users as
select duser.id
,duser.name
,align.name as alignment
,duser.salary
,duser.role
from t_deathstar_users duser
left outer join alignments align
on duser.alignment_id = align.id
;
select * from deathstar_users;
1 | Darth Vader | „dark“ | 15‘000.00 | Sith Lord |
4 | Mace Windu | „light“ | – | Prisoner |
5 | Luke Skywalker | „light“ | -300.42 | Pirsoner |
6 | Darth Maul | „dark“ | 5000 | Sith |
7 | Obi-Wan Kenobi | „light“ | 0.00 | Prisoner |
Now we have the exact same API as before – but underneath we made sure alignment is normalized into its own table.
If we want to also allow insert/update/delete into the view, we will have to add an instead-of trigger. That’s totally possible to do, but out of scope for this post. If you want to know more, just hit me up via e-Mail or Twitter.
Step 3 for Alignment: Prevent
We’ve already done most of the prevent-part by normalizing the alignment data into its own table, but we can even go further and make sure that only IDs that actually exist in ALIGNMENTS
can be entered into T_DEATHSTAR_USERS.ALIGNMENT_ID
:
alter table t_deathstar_users
add foreign key ( alignment_id ) references alignments( id );
-- We cannot add an entry with a non-existing Alignment-ID anymore
insert into t_deathstar_users( id, name, alignment_id )
values ( 300, 'Jar Jar Binks', 10);
Step 2 and 3 for Roles
Now that we know how to approach these kind of improvements, we can do the same that we did for Alignments to Roles now.
Caution: When changing things, it’s a good idea to work in steps as small as possible. Fight the urge to change and improve several things at once, you will probably end up being a lot slower and a lot more stressed!
create table roles (
id integer generated always as identity primary key,
name varchar2(256) not null unique
);
insert into roles ( name )
select distinct role
from deathstar_users
where role != 'Pirsoner';
select * from roles;
alter table t_deathstar_users
add role_id integer references roles( id )
-- we can specify the foreign key right when adding the column
;
update t_deathstar_users set
role_id = (select id from roles where name = 'Sith')
where role = 'Sith';
update t_deathstar_users set
role_id = (select id from roles where name = 'Sith Lord')
where role = 'Sith Lord';
update t_deathstar_users set
role_id = (select id from roles where name = 'Prisoner')
where role in ('Prisoner', 'Pirsoner');
select duser.name as username
,role as original_role
,t_role.name as new_role
from t_deathstar_users duser
left outer join roles t_role
on duser.role_id = t_role.id;
create or replace view deathstar_users as
select duser.id
,duser.name
,align.name as alignment
,duser.salary
,t_role.name as role
from t_deathstar_users duser
left outer join alignments align
on duser.alignment_id = align.id
left outer join roles t_role
on duser.role_id = t_role.id
;
alter table t_deathstar_users
drop column role;
select * from deathstar_users;
1 | Darth Vader | „dark“ | 15‘000.00 | Sith Lord |
4 | Mace Windu | „light“ | – | Prisoner |
5 | Luke Skywalker | „light“ | -300.42 | Prisoner |
6 | Darth Maul | „dark“ | 5000 | Sith |
7 | Obi-Wan Kenobi | „light“ | 0.00 | Prisoner |
Now we have consistent data of Alignments and Roles and made sure it stays that way.
You can find the whole example as a single script here.
Previous: Example 3.1: Getting information from database to Excel and back to the database
Next: Example 5: Identify, correct, and prevent salary issues