Delimited Strings, CTE’s and Recursion



I was looking through a client’s SSIS package and noticed that they had a C# script task that pulled out data from a column with a delimited string using the good old semi-colon ‘;’. After looking at it for a bit, I thought it was a bit over engineered and didn’t really need a C# script to do it. So I’ve came up with a way of doing it in T-SQL, using a Common Table Expression (CTE) and an interesting feature of CTE’s, that they can self-reference themselves….but how. Let’s create the table and insert some data.

-- Create a table to hold the data
CREATE TABLE #SourceData
( SomeColumnOne INT
, SomeColumnTwo INT
, StringData VARCHAR(MAX)
)
GO

-- Insert some data to use
INSERT #SourceData
VALUES
(1, 2, '100;101;102')
, (3, 4, '200;201;202;203')
, (5, 6, '300;301;302;303;304')
GO

-- Check everything is ok
SELECT SomeColumnOne
, SomeColumnTwo
, StringData
FROM
#SourceData
SomeColumnOne SomeColumnTwo StringData
1 2 100;101;102
3 4 200;201;202;203
5 6 300;301;302;303;304

It may look like a little bit of data, but it was roughly the same data size coming from the customer’s source data, hence why I did think it was a bit of overkill in the first place. Here’s the code that I used:


;WITH
CTE_Source (SomeColumnOne, SomeColumnTwo, StringExtract, StringData) AS
( SELECT SomeColumnOne
, SomeColumnTwo
, LEFT(StringData, CHARINDEX(';', StringData + ';') -1)
, STUFF(StringData, 1, CHARINDEX(';', StringData + ';'), '') FROM
#SourceData

UNION ALL
-- This table references the cte, while in the cte!
SELECT SomeColumnOne
, SomeColumnTwo
, LEFT(StringData, CHARINDEX(';' , StringData + ';') -1)
, STUFF(StringData, 1, CHARINDEX(';', StringData + ';'), '')
FROM
CTE_Source
WHERE
StringData > ''
)
SELECT
SomeColumnOne, SomeColumnTwo, String from CTE_Source
But what does it do?
Well the first SELECT statement just returns the first occurrence of the delimitated string:
SELECT SomeColumnOne
, SomeColumnTwo
, LEFT(StringData, CHARINDEX(';', StringData + ';') -1) AS StringExtract
, STUFF(StringData, 1, CHARINDEX(';', StringData + ';'), '') AS StringData
FROM
#SourceData
Which gives us
SomeColumnOne SomeColumnTwo StringExtract StringData
1 2 100 101;102
3 4 200 201;202;203
5 6 300 301;302;303;304
as we are performing a union on the CTE itself with the second SELECT statement, it is iterating thought the string, the CHARINDEX & STUFF statements reduces down the string with each pass, until it returns ”.
So the second final result is
SomeColumnOne SomeColumnTwo String StringData
1 2 100 101;102
1 2 101 102
1 2 102  
3 4 201 202;203
3 4 202 203
3 4 203  
3 4 200 201;202;203
5 6 300 301;302;303;304
5 6 301 302;303;304
5 6 302 303;304
5 6 303 304
5 6 304  
You can get through a far bit of data this way, however you can run into the default recursion limit of 100, but there is a way to override this by using setting the max recursion option at the end of the CTE, which is OPTION(MAXRECURSION 0). Careful though, this setting is normally used stop a CTE from causing an infinite loop. In your face C#, T-SQL still rules!
Further reading:
CTE’s
CHARINDEXSTUFF

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s