I ran into a requirement today where I had to combine a number of strings (varchars) into a delimited list. Some strings would be NULL and should be ignored.

I played around and found a bunch of different approaches which I’d like to share.

Let’s assume we have a package info_util with the function person which will combine a number of information about a person to a delimited list (to simplify the example all information is passed by parameter):

dbms_output.put_line(info_util.person(
  i_name => 'Luke Skywalker',
  i_alignment => 'light',
  i_comment => 'Most powerful jedi of all times'));
-- Output: Luke Skywalker, light, Most powerful jedi of all times

dbms_output.put_line(info_util.person(
  i_name => 'Vader',
  i_title => 'Darth',
  i_alignment => 'dark',
  i_comment => 'Pretty evil'));
-- Output: Vader, Darth, dark, Pretty evil

Basic String concatenation

The simplest solution is just to go over everything with some if-logic and concatenating the strings

function person_concatenate(
  i_name varchar2,
  i_title varchar2,
  i_alignment varchar2,
  i_comment varchar2
) return varchar2 deterministic
as
  l_result varchar2(32000);
  begin
    if i_name is not null then
      l_result := l_result || i_name;
    end if;

    if i_title is not null then
      if l_result is not null then
        l_result := l_result || ', ';
      end if;
      l_result := l_result || i_title;
    end if;

    if i_alignment is not null then
      if l_result is not null then
        l_result := l_result || ', ';
      end if;
      l_result := l_result || i_alignment;
    end if;

    if i_comment is not null then
      if l_result is not null then
        l_result := l_result || ', ';
      end if;
      l_result := l_result || i_comment;
    end if;

    return substr(l_result, 1, 4000);
  end;

But this is very hard to read, understand and maintain. If you don’t believe me, how long do you need to make sure there’s no difference between handling of i_alignment and i_comment?
Its also tedious and boring.

Using LISTAGG

The LISTAGG functionality does more or less exactly what we want here, so why don’t we use it?

function person_listagg(
  i_name varchar2,
  i_title varchar2,
  i_alignment varchar2,
  i_comment varchar2
) return varchar2 deterministic
as
  /* We have to collect all parts
     in a nested table collection first
   */
  l_parts sys.odcivarchar2list := new sys.odcivarchar2list();
  l_result varchar2(4000);

  /* This function makes our code more
     readable: just extend the collection and add an item
   */
  procedure add_part( i_string varchar2 )
  as
    begin
      l_parts.extend;
      l_parts(l_parts.last) := i_string;
    end;
  begin
    /* Now we can simple add the parts */
    add_part(i_name);
    add_part(i_title);
    add_part(i_alignment);
    add_part(i_comment);

    /* And can use LISTAGG to aggregate */
    select listagg(column_value, ', ') within group (order by rownum)
      into l_result
      from table(l_parts);

    return l_result;
  end;

Just remember: whenever you are switching between SQL and PL/SQL, you will need to factor in the cost of a context switch.
However, unless the function is called a lot or you require really fast response times, it may not be a real issue.

Concatenate in a PL/SQL Loop

So maybe we can handle the concatenation in a PL/SQL Loop instead of using LISTAGG.

function person_plsql_comb(
  i_name varchar2,
  i_title varchar2,
  i_alignment varchar2,
  i_comment varchar2
) return varchar2 deterministic
as
  /* So lets take the thing that improves readability */
  l_parts sys.odcivarchar2list := new sys.odcivarchar2list();
  l_result varchar2(4000);
  procedure add_part( i_string varchar2 )
  as
    begin
      l_parts.extend;
      l_parts(l_parts.last) := i_string;
    end;
  begin
    add_part(i_name);
    add_part(i_title);
    add_part(i_alignment);
    add_part(i_comment);

    /* But handle the concatenation in PL/SQL */
    for i in l_parts.first..l_parts.last loop
      if l_parts(i) is not null then
        if l_result is not null then
          l_result := l_result || ', ';
        end if;
        l_result := l_result || l_parts(i);
      end if;
    end loop;

    return l_result;
  end;

That looks kinda nice to me. But what about the performance?

The following code runs the function under test 20’000 times (10’000 for each use-case). To eliminate any advantage of the DETERMINISTIC keyword, I added the run to each name, so no caching involved.

declare
  l_ts timestamp := current_timestamp;
  l_info varchar2(4000);
begin
	for i in 1..10000 loop
		l_info := info_util.person_concatenate('Luke Skywalker'||to_char(i), null, 'light', 'Most powerful jedi of all times');
		l_info := info_util.person_concatenate('Vader'||to_char(i), 'Darth', 'dark', 'Pretty evil');
	end loop;
	dbms_output.put_line('Basic concatenation: ' || to_char(current_timestamp-l_ts));
end;
/
Basic concatenation0.02 seconds
LISTAGG0.8 seconds
PL/SQL Loop0.1 seconds

So we see that the context-switches of LISTAGG indeed slow down our functionality. The looping and collection initialization stuff also comes at a performance cost, but it’s not that bad.

What if we use the DETERMINISTIC advantage and basically just call 2 different statements over and over again?

Basic concatenation0.0008 seconds
LISTAGG0.0009 seconds
PL/SQL Loop0.0008 seconds

There is hardly any difference, which means that if your data is homogenous, the additional performance cost of the Collection-with-Loop approach won’t be noticable (Performance always depends on your data).

It also shows that in many situations, concerns about performance-loss due to context-switches between SQL and PL/SQL won’t have the huge negative impact we might expect.

PL/SQL Object

What we could do now is to extract the whole aggregation part into a PL/SQL object (and yes, of course we can improve the readability and still not use objects, but they are very handy, especially in these cases):

create or replace type t_string_aggregator force is object
(
  /* The nested table is now part of our object */
  c_parts sys.odcivarchar2list,

  constructor function t_string_aggregator return self as result,
  member procedure add_string( i_string varchar2 ),
  member function get_aggregate( i_delimiter varchar2 default ', ' )
    return varchar2
);
/

create or replace type body t_string_aggregator as
  constructor function t_string_aggregator return self as result
  as
    begin
      /* Lets not forget to initialize the collection */
      c_parts := new sys.odcivarchar2list();
      return;
    end;

  /* This is basically the same as the internal procedure
     we used in the other approaches
   */
  member procedure add_string( i_string varchar2 )
  as
    begin
      c_parts.extend;
      c_parts(c_parts.last) := i_string;
    end;

  /* We can even make the delimiter dynamic */
  member function get_aggregate( i_delimiter varchar2 )
    return varchar2
  as
    l_result varchar2(4000);
    begin
      /* Little tweak if we dont have any items */
      if c_parts.count < 0 then
        return null;
      end if;

      for i in c_parts.first..c_parts.last loop
	      if c_parts(i) is not null then
	        if l_result is not null then
			      l_result := l_result || i_delimiter;
		      end if;
	        l_result := l_result || c_parts(i);
	      end if;
      end loop;

      return l_result;
    end;
end;
/

The final function would then look like this:

function person_plsql_obj(
  i_name varchar2,
  i_title varchar2,
  i_alignment varchar2,
  i_comment varchar2
) return varchar2 deterministic
as
  l_aggregator t_string_aggregator := new t_string_aggregator();
  begin
    l_aggregator.add_string(i_name);
    l_aggregator.add_string(i_title);
    l_aggregator.add_string(i_alignment);
    l_aggregator.add_string(i_comment);

    return l_aggregator.get_aggregate();
  end;

Pretty small and readable once we know what t_string_aggregator is good for, right?

Performance-wise, it is about 10% slower than the PL/SQL Loop approach – which again is not noticable when using the DETERMINISTIC advantage – depending on your data.

Warning: This solution has severe problems once the sum of the string-parts exceed 4000 chars. But that’s another topic and another example.

You can run the full example on LiveSQL and get the sourcecode from my GitHub repository.

Thoughts

So, why do I like this last approach via PL/SQL Objects so much?

From my point of view, it modularizes and encapsulates my functionality pretty good. This leads to the following advantages:

  • it’s easier to understand when I use it in other places
  • it has a clearly defined scope
  • it is easier to expand (e.g. overflow-mechanics)
  • it enables DRY (Don’t repeat yourself)
  • it is very easy to test.

Looking forward to hear what you folks think about it.


0 Comments

Leave a Reply

Avatar placeholder

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.