And then, there was Day 11. We are supposed to look for a stabilized layout of seat occupation when all of the people follow certain rules, depending on how many seats around them are already occupied.
This is Conway’s Game of Life with a little addition: Tiles where no seat exists.
Since I wanted to try out the Oracle MODEL clause for a while, I thought it couldn’t hurt to do this challenge with the special toolkit it provides. And it even worked pretty well, using the small input of the example.
Another benefit was that I could use Kim Berg Hansens awesome book “Practical Oracle SQL” which covers exactly that Game of Life example with the MODEL clause.
In hope to improve the performance overhead, I started with normalizing the data to get a vertical set of row_id, col_id data:
create table aoc_day11_normalized as
with
base_data as (
select
rownum row_id,
column_value line
from table (
aoc_file_loader.file_as_stringlist(
aoc_file_loader.local_url('day_11_input.txt')
)
)
),
cols as (
select
level col_id
from dual
connect by level <= (select length(line) from base_data where row_id = 1)
),
column_data as (
select
base_data.row_id,
cols.col_id,
substr(line, cols.col_id, 1) seat
from base_data
cross join cols
)
select * from column_data;
With that, I tried the approach with MODEL, first even without caring for change detection (we want to stop iterating once no changes occur):
with
column_data as (
select
row_id,
col_id,
seat
from aoc_day11_normalized
),
modeled_data as (
select
*
from column_data
model
dimension by (
0 generation,
row_id,
col_id
)
measures (
seat,
0 as sum_occupied,
0 as nb_occupied
)
ignore nav
rules upsert all iterate(1) (
sum_occupied[iteration_number, any, any] =
sum(
case seat
when 'L' then 0
when '#' then 1
else null
end
)[
generation = iteration_number,
row_id between cv()-1 and cv()+1,
col_id between cv()-1 and cv()+1
],
nb_occupied[iteration_number, any, any] =
sum_occupied[iteration_number, cv(), cv()]
- case seat[iteration_number, cv(), cv()] when '#' then 1 else 0 end
,
seat[iteration_number+1, any, any] =
case
when seat[iteration_number, cv(), cv()] = '.' then
'.'
when seat[iteration_number, cv(), cv()] = 'L'
and nb_occupied[iteration_number, cv(), cv()] = 0 then
'#'
when seat[iteration_number, cv(), cv()] = '#'
and nb_occupied[iteration_number, cv(), cv()] >= 4 then
'L'
else seat[iteration_number, cv(), cv()]
end
)
),
display as (
select
generation,
row_id,
listagg(seat) within group(order by col_id) seats,
sum(case seat when '#' then 1 else 0 end) occupied_seats
from modeled_data
group by generation, rollup(row_id)
)
select
generation, row_id, col_id, seat
from modeled_data
That worked pretty well with the tiny input-data of the example – but one iteration alone on my real input (92×92 cells) took half a minute – and it got worse the more iterations I added.
After putting some serious but fruitless effort into that solution, I did what a knowledge worker does if they hit a problem they cannot solve: I asked for help and contacted Kim as an expert who even wrote about that MODEL clause.
As he is an incredibly nice guy, he answered immediately and even came up with a blog post, pointing out that aggregate functions seem to significantly hurt the performance.
So my final solution to Part 1 runs now in about 2 minutes, needing 96 iterations:
with
column_data as (
select
row_id,
col_id,
seat,
case seat
when '#' then 1
else 0
end occupied
from aoc_day11_normalized
--where row_id <= 10
),
modeled_data as (
select
*
from column_data
model
dimension by (
0 generation,
row_id,
col_id
)
measures (
seat,
occupied,
0 as sum_occupied,
0 as nb_occupied,
0 as changes,
1 as sum_changes
)
ignore nav
rules upsert all iterate(200) until (iteration_number > 1 and sum_changes[iteration_number, 1, 1] <= 0) (
occupied[iteration_number, any, any] =
case seat[iteration_number, cv(), cv()]
when '#' then 1
else 0
end ,
nb_occupied[iteration_number, any, any] =
occupied[iteration_number, cv()-1, cv()-1]
+ occupied[iteration_number, cv()-1, cv()]
+ occupied[iteration_number, cv()-1, cv()+1]
+ occupied[iteration_number, cv(), cv()-1]
+ occupied[iteration_number, cv(), cv()+1]
+ occupied[iteration_number, cv()+1, cv()-1]
+ occupied[iteration_number, cv()+1, cv()]
+ occupied[iteration_number, cv()+1, cv()+1]
,
seat[iteration_number+1, any, any] =
case
when seat[iteration_number, cv(), cv()] = '.' then
'.'
when seat[iteration_number, cv(), cv()] = 'L'
and nb_occupied[iteration_number, cv(), cv()] = 0 then
'#'
when seat[iteration_number, cv(), cv()] = '#'
and nb_occupied[iteration_number, cv(), cv()] >= 4 then
'L'
else seat[iteration_number, cv(), cv()]
end,
changes[iteration_number+1, any, any] =
case when seat[iteration_number, cv(), cv()] != seat[iteration_number+1, cv(), cv()] then 1
else 0 end,
sum_changes[iteration_number+1,1,1] = sum(changes)[iteration_number+1, any, any]
)
),
display as (
select
generation,
row_id,
sum(changes) changes,
sum(case seat when '#' then 1 else 0 end) occupied_seats
from modeled_data
group by generation, rollup(row_id)
)
select
*
from display
where row_id is null
order by generation desc
fetch first row only;
Let’s be honest – 2 minutes are still not great. But it’s bearable. And it’s completely done in SQL.
For part 2, however, I decided to use PL/SQL. Algorithms like these is where SQL hits its limit – it’s just not the right tool to solve it, given that a solution in a procedural language is not that difficult.
First, I re-coded part 1, which was straight forward. To hold the data I used a varray(100) of varrays(100) of integer
, basically a 2-dimensional array.
The main logic is in the functions iterate_cell_part1
and especially is_occupied
:
function is_occupied( y integer, x integer, i_array in t_rows ) return integer as
begin
if y <= 0 or y > i_array.count or x <= 0 or x > i_array(y).count then
return 0;
end if;
return case i_array(y)(x) when '#' then 1 else 0 end;
end;
function iterate_cell_part1( y integer, x integer, i_array in t_rows ) return char as
l_neighbours integer;
begin
if i_array(y)(x) = '.' then
return '.';
end if;
l_neighbours :=
is_occupied(y-1, x-1, i_array)
+ is_occupied(y-1, x , i_array)
+ is_occupied(y-1, x+1, i_array)
+ is_occupied(y , x-1, i_array)
+ is_occupied(y , x+1, i_array)
+ is_occupied(y+1, x-1, i_array)
+ is_occupied(y+1, x , i_array)
+ is_occupied(y+1, x+1, i_array);
if i_array(y)(x) = 'L' and l_neighbours = 0 then
return '#';
elsif i_array(y)(x) = '#' and l_neighbours >= 4 then
return 'L';
else
return i_array(y)(x);
end if;
end;
To solve part 2, all I had to do was to change is_occupied to sees_occupied – the rest is pretty much the same:
function sees_occupied( y integer, x integer, ystep integer, xstep integer, i_array in t_rows )
return integer as
l_nextY integer := y + ystep;
l_nextX integer := x + xstep;
begin
if l_nextY <= 0 or l_nextY > i_array.count or l_nextX <= 0 or l_nextX > i_array(y).count then
return 0;
end if;
return case i_array(l_nextY)(l_nextX)
when '.' then sees_occupied(l_nextY, l_nextX, ystep, xstep, i_array)
when '#' then 1
else 0 end;
end;
The nice thing: I got my result in 7 seconds for part 1 and 13 seconds for part 2.
You can see the whole solution my github-repository.
0 Comments