The puzzle of Day 6 requests us to group answers to a customs declaration form and analyse them.
Challenge one is the format we need to normalize into something we can work with in SQL. Each answer is a line, but several answers are grouped together and the groups are separated by an empty line.
So as a first step, we introduce a new column that indicates whether the row is a break or not and then use window functions to summarize all breaks from beginning to the current position, which will be our group-ID
base_data as (
select
rownum answer_id,
column_value answers,
case when column_value is null then 1 else 0 end is_break
from table(
aoc_file_loader.file_as_stringlist(
aoc_file_loader.local_url('day_6_input.txt')
)
)
),
grouped_data as (
select
answer_id,
group_id,
answers
from (
select
answer_id,
answers,
is_break,
sum(is_break) over (
order by answer_id
range between unbounded preceding
and current row)
group_id
from base_data
)
where is_break = 0
)
select * from grouped_data
It’s important that we filter away the is_break
after we applied the window function!
The next step took me a while – my first attempt was to split the answer-rows with connect by level
according to the length of the string, but this solution was terribly slow and took several minutes for only ~20 lines of input-data.
So my second approach was to provide a baseline of all characters a-z as result set we could cross-join against – and this worked pretty well:
with
answer_pos as (
select column_value c
from table(sys.odcivarchar2list(
'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'
)
)
),
base_data as (
select
rownum answer_id,
column_value answers,
case when column_value is null then 1 else 0 end is_break
from table(
aoc_file_loader.file_as_stringlist(
aoc_file_loader.local_url('day_6_input.txt')
)
)
),
grouped_data as (
select
answer_id,
group_id,
answers
from (
select
answer_id,
answers,
is_break,
sum(is_break) over (
order by answer_id
range between unbounded preceding
and current row)
group_id
from base_data
)
where is_break = 0
),
distinct_one_answer_per_row as (
select distinct
group_id,
answer_pos.c answer
from grouped_data
cross join answer_pos
where instr(answers, answer_pos.c) > 0
),
distinct_answers as (
select
group_id,
listagg(answer, '') within group (order by answer) distinct_answers
from distinct_one_answer_per_row
group by group_id
),
answer_length as (
select
group_id,
length(distinct_answers) answer_length
from distinct_answers
)
select
sum(answer_length)
from answer_length
Part two changes the rules a bit in a way that we should only count the answers that appear on all answered forms per group.
To get to this point we need to compare the number of total answers per group (no matter the answer) and the number of answers per group to a specific answer-position (character). If both match, this specific answer-position appears in all the answers per group.
with
answer_pos as (
select column_value c
from table(sys.odcivarchar2list(
'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'
)
)
),
base_data as (
select
rownum answer_id,
column_value answers,
case when column_value is null then 1 else 0 end is_break
from table(
aoc_file_loader.file_as_stringlist(
aoc_file_loader.local_url('day_6_input.txt')
)
)
),
grouped_data as (
select
answer_id,
group_id,
answers
from (
select
answer_id,
answers,
is_break,
sum(is_break) over (
order by answer_id
range between unbounded preceding
and current row)
group_id
from base_data
)
where is_break = 0
),
distinct_one_answer_per_row as (
select
group_id,
answer_pos.c answer,
count(distinct answer_id)
over (partition by group_id) group_answers,
count(*)
over (partition by group_id, answer_pos.c) this_pos_answered
from grouped_data
cross join answer_pos
where instr(answers, answer_pos.c) > 0
),
all_agreed_upon_answers as (
select distinct
group_id,
answer
from distinct_one_answer_per_row
where group_answers = this_pos_answered
),
distinct_answers as (
select
group_id,
listagg(answer, '') within group (order by answer) distinct_answers
from all_agreed_upon_answers
group by group_id
),
answer_length as (
select
group_id,
length(distinct_answers) answer_length
from distinct_answers
)
select
sum(answer_length)
from answer_length
A bit tricky, but fun and rewarding.
You can find the complete code in the github-repository.
0 Comments