I was looking through a client’s SSIS package and noticed that they had a C# script task that pulled out data from a column with a delimited string using the good old semi-colon ‘;’. After looking at it for a bit, I thought it was a bit over engineered and didn’t really need a C# script to do it. So I’ve came up with a way of doing it in T-SQL, using a Common Table Expression (CTE) and an interesting feature of CTE’s, that they can self-reference themselves….but how. Let’s create the table and insert some data.
-- Create a table to hold the data
CREATE TABLE #SourceData
( SomeColumnOne INT
, SomeColumnTwo INT
, StringData VARCHAR(MAX)
)
GO
-- Insert some data to use
INSERT #SourceData
VALUES
(1, 2, '100;101;102')
, (3, 4, '200;201;202;203')
, (5, 6, '300;301;302;303;304')
GO
-- Check everything is ok
SELECT SomeColumnOne
, SomeColumnTwo
, StringData
FROM
#SourceData
SomeColumnOne | SomeColumnTwo | StringData |
1 | 2 | 100;101;102 |
3 | 4 | 200;201;202;203 |
5 | 6 | 300;301;302;303;304 |
It may look like a little bit of data, but it was roughly the same data size coming from the customer’s source data, hence why I did think it was a bit of overkill in the first place. Here’s the code that I used:
;WITH
CTE_Source (SomeColumnOne, SomeColumnTwo, StringExtract, StringData) AS
( SELECT SomeColumnOne
, SomeColumnTwo
, LEFT(StringData, CHARINDEX(';', StringData + ';') -1)
, STUFF(StringData, 1, CHARINDEX(';', StringData + ';'), '') FROM
#SourceData
UNION ALL
-- This table references the cte, while in the cte!
SELECT SomeColumnOne
, SomeColumnTwo
, LEFT(StringData, CHARINDEX(';' , StringData + ';') -1)
, STUFF(StringData, 1, CHARINDEX(';', StringData + ';'), '')
FROM
CTE_Source
WHERE
StringData > ''
)
SELECT
SomeColumnOne, SomeColumnTwo, String from CTE_Source
But what does it do?
Well the first SELECT statement just returns the first occurrence of the delimitated string:
SELECT SomeColumnOne
, SomeColumnTwo
, LEFT(StringData, CHARINDEX(';', StringData + ';') -1) AS StringExtract
, STUFF(StringData, 1, CHARINDEX(';', StringData + ';'), '') AS StringData
FROM
#SourceData
Which gives us
SomeColumnOne | SomeColumnTwo | StringExtract | StringData |
1 | 2 | 100 | 101;102 |
3 | 4 | 200 | 201;202;203 |
5 | 6 | 300 | 301;302;303;304 |
as we are performing a union on the CTE itself with the second SELECT statement, it is iterating thought the string, the CHARINDEX & STUFF statements reduces down the string with each pass, until it returns ”.
So the second final result is
SomeColumnOne | SomeColumnTwo | String | StringData |
1 | 2 | 100 | 101;102 |
1 | 2 | 101 | 102 |
1 | 2 | 102 | |
3 | 4 | 201 | 202;203 |
3 | 4 | 202 | 203 |
3 | 4 | 203 | |
3 | 4 | 200 | 201;202;203 |
5 | 6 | 300 | 301;302;303;304 |
5 | 6 | 301 | 302;303;304 |
5 | 6 | 302 | 303;304 |
5 | 6 | 303 | 304 |
5 | 6 | 304 |
You can get through a far bit of data this way, however you can run into the default recursion limit of 100, but there is a way to override this by using setting the max recursion option at the end of the CTE, which is OPTION(MAXRECURSION 0). Careful though, this setting is normally used stop a CTE from causing an infinite loop. In your face C#, T-SQL still rules!