CTE to concatenate column values in multiple rows into a single string

UPDATE:  While using PIVOT and CTEs to concatenate strings is cool, I’ve found a better way using FOR XML… see http://marcjellinek.wordpress.com/2011/08/20/concatenating-text-using-for-xml/

ORIGINAL ARTICLE:

Using SQL Server, one thing I’ve always done is used PIVOT to concatenate strings.  The one thing about that solution is you have to impose a maximum number of rows that will be concatenated.

For example, take the following:

DECLARE @list TABLE (id int, item varchar(255))
INSERT INTO @list (id, item)
VALUES (1, 'One'), (2, 'Two'), (3, 'Three'), (4, 'Four'), (5, 'Five')

SELECT
     ISNULL([1], '') + ',' +
     ISNULL([2], '') + ',' +
     ISNULL([3], '') + ',' +
     ISNULL([4], '') + ',' +  
     ISNULL([5], '') as result
FROM
     @list
          PIVOT
               (
                    MIN(item)
                    FOR id IN ([1], [2], [3], [4], [5])
               ) pvt

This returns:

 One,Two,Three,Four,Five

Now, this solution takes care of a table with less than 5 rows (that’s what the ISNULL([x], ”) is all about), but what about a table having more than 5 rows?

You can do some dynamic SQL data-fu, wrap it in an EXEC or sp_ExecuteSQL statement.  But I’m lazy.  And I hate debugging dynamic SQL.

Common Table Expressions can be used.  Just view this as an exercise in recursion.  For those of you who like to read ahead, here’s the answer:

DECLARE @list TABLE (id int, item varchar(255))
INSERT INTO @list (id, item)
VALUES (1, 'One'), (2, 'Two'), (3, 'Three'), (4, 'Four'), (5, 'Five')
;WITH
CTE_ListPrep (item_id, item) AS
(
     SELECT
          ROW_NUMBER() OVER (ORDER BY id),
          l.item
     FROM
          @list l
),
CTE_TheList (item_id, thelist) AS
(
     SELECT
          MAX(item_id),
          CAST('' as varchar(max))
     FROM
          CTE_ListPrep
     UNION ALL
     SELECT
          tl.item_id - 1,
          l.item + ',' + thelist
     FROM
          CTE_TheList tl INNER JOIN CTE_ListPrep l ON tl.item_id = l.item_id
),
CTE_TheList_Final (thelist) AS
(
     SELECT
          LEFT(thelist, LEN(thelist) - 1)
     FROM
          CTE_TheList
     WHERE
          item_id = 0
)
SELECT thelist FROM CTE_TheList_Final

This returns:

 One,Two,Three,Four,Five

So how does this work?

The list is prepared in CTE_ListPrep.  This should return the order in which the items are to be displayed ([item_id]) an the item to be displayed ([item])

The heavy lifting is done in CTE_TheList.

A CTE can be used recursively.  There are two sections to a CTE, the part above UNION ALL (the root or anchor member) and the part below UNION ALL (the recursive member or part of the query that recurses through the result sets).

So what is our root and what does it do?

;WITH CTE_TheList (item_id, thelist)
SELECT
     MAX(item_id), 
     CAST('' as varchar(max))
FROM
     CTE_ListPrep

This basically says the first [item_id] value that we’ll be dealing with is going to be MAX([item_id]). The first value for [thelist] will be a zero-length string of type varchar(max).  I know, you are thinking “Thank you Captain Obvious, of the Starship Duh”.

But this is important.  This is our starting point.  We’re starting with a zero-length string.  The magic is in the recursive member of the CTE:

     SELECT
          tl.item_id - 1,
          l.item + ',' + thelist
     FROM
          CTE_TheList tl INNER JOIN CTE_ListPrep l ON tl.item_id = l.item_id

The first line of the SELECT clause (tl.item_id – 1) is a little confusing. It says that the next row to go through the recursive execution of the query will be one less than the current ID. It’s a setup for the next iteration.

The second line does what you’d think it does.  It looks up the value of the item from CTE_ListPrep based on the [item_id] value and concatenates it to the list.  Remember the zero-length string from the anchor.  A value plus zero-length string equals the value.  A value plus NULL equals NULL.  That’s why I used a zero-length string.

The recursion continues until the recursive member doesn’t return any rows.  The inner join will find that there is no row in CTE_ListPrep with an [item_id] = 0 and quit.

So if you look at the output of CTE_TheList, you’ll see this:

item_id     thelist
5          
4           Five,
3           Four,Five,
2           Three,Four,Five,
1           Two,Three,Four,Five,
0           One,Two,Three,Four,Five,

Now, you can see, only the row with [item_id] = 0 is useful (with an extra comma at the end).  Return only the row with [item_id] = 0 and trim off the last comma and you are done.  This is the purpose of CTE_TheList_Final

LIMITATIONS:

  • The supplied list must have a contiguous set of [item_id] values.  If one is missing, the inner join doesn’t find a match, no rows are returned from the recursive member and the recursion stops.  This is why I used ROW_NUMBER().  It will start at one and present a contiguous list of [item_id] values.
  • You control the sorting of the list in CTE_ListPrep.  Set the sort order in the ORDER BY clause of the ROW_NUMBER() function.  If you wanted it sorted alphabetically, just change the ORDER BY to be l.item instead of l.id.
  • Recursion using CTEs is limited to 100.  If you are going to have more than 100 rows, you’ll need to use the MAXRECURSION query hint.  MAXRECURSION has a maximum value of 32767.  You can also set it to 0, no limit is applied.  I haven’t tested this, but I suspect it will bomb out at the 32767 limit.
  • Recursion will cause spooling to tempdb.  If you look at the query plan for the CTE, you’ll see Table Spools and Index Spools.  This means a temporary table and temporary index are being created.
  • The delimiter is hard-coded into the recursive member of the CTE.  That’s where you set it.
  • varchar(max) is limited to 2^31 – 1 bytes.  It’s a big limit.  But it’s there.
  • If you have to use UNICODE strings, change varchar(max) to nvarchar(max).  nvarchar(max) is also limited to 2^31 – 1 bytes of data.

I’ve seen other explanations on the web, but frankly, I thought they weren’t all that clear.  So using those as a starting point, I re-wrote this as an example.

About these ads

, , , ,

  1. #1 by JaimePR on March 28, 2012 - 12:38 pm

    awesome example!
    I think this is the best way to concatenate multiple rows in a single string.
    Thanks for sharing this

    • #2 by Marc Jellinek on March 28, 2012 - 3:27 pm

      Actually, I think a custom aggregation function, written using SQLCLR and the .NET language of your choice would be better. Certainly easier to implement across multiple uses. But this will do in a pinch (or where you aren’t allowed to use SQLCLR.

  2. #3 by Loren M Lowe on April 11, 2013 - 9:01 am

    Nice article. The explanation was clear and the example worked beautifully. It helped me greatly!!
    Thanks for sharing!

  1. Concatenating Text using FOR XML « rnd(stuff)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: