SSIS – XML, Foreach Loop and sub folders

SSIS normally has out of the box most things you need to get the job done, however sometime a client comes along with a data source that requires a bit of out of the box thinking. In this case a it was a load of ‘csv’ files in a folder, with over 30 sub-folders, each with a file name which turned out to be a time stamp, that contains a snapshot of their infrastructure, The requirements also needed for it to scale, so the client could add other folders with the same sub folder structure for other parts of their infrastructure. So you would have some thing like this:
ABCD
Network
12345678.csv
Space
12345678.csv
Sessions
12345678.csv
And so on for 32 other folders
Right not a problem, I just set up an SSIS For Each loop container click on ‘Traverse subfolders’ option and then set out the data connections.

No, sadly the ‘Traverse sub folder doesn’t work like that. Nuts, so after a quick search I found this link at Joost van Rossum’s blog that uses a bit of C# to get the folder list and generate an XML schema with the folder list in it. You can then use the ‘Foreach NodeList Enumerator’ in the Foreach Loop to get the files.

Well sort of, the code on that website only gets the folder structure, not the full path of the file. It was however a good starting point, and looked like it could be adapted to get the full file list, that could be passed on to the data flow logic in the Foreach Container. Now I’m TSQL through and through, my C# is poor, but slowly getting better, however I did mange to hack the code so it got the file list. So here it is, please use, improve and share.

#region Namespaces
using System;
using System.Data;
using System.IO;
using System.Xml;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Collections.Generic;
#endregion

namespace ST_b87894259c434eeca3da339009a06fdf
{
///
/// ScriptMain is the entry point class of the script. Do not change the name, attributes,
/// or parent of this class.
///


// Use this for SQL Server 2012 and above
[Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
// Use the below for SQL Server 2008, comment out the above
// [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{

#region VSTA generated code
enum ScriptResults
{
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
};
#endregion

// Variables for the xml string
private XmlDocument xmldoc;
private XmlElement xmlRootElem;

public void Main()
{
// Inialize XMLdoc
xmldoc = new XmlDocument();

// Add the root element:
xmlRootElem = xmldoc.CreateElement("", "ROOT", "");

// Add Subfolders as Child elements to the root element
GetSubFolders(Dts.Variables["User::FeedsFilePath"].Value.ToString());

// Add root element to XMLdoc
xmldoc.AppendChild(xmlRootElem);

// Fill SSIS variable with XMLdoc
Dts.Variables["xmldoc"].Value = xmldoc.InnerXml.ToString();

Dts.TaskResult = (int)ScriptResults.Success;
}

// Recursive method that loops through subfolders
private void GetSubFolders(String parentFolder)
{
// Get subfolders of the parent folder
string[] subFolders = Directory.GetDirectories(parentFolder);

var allfiles = DirSearch(parentFolder);

foreach (var filePath in allfiles)
{
XmlElement xmlChildElem;
XmlText xmltext;

// var directoryInfo = new DirectoryInfo(Path.GetDirectoryName(filePath));

// Create child element "Folder":
// d:\foreachfoldertest\subfolder1\
xmlChildElem = xmldoc.CreateElement("", "File", "");
xmltext = xmldoc.CreateTextNode(filePath);
xmlChildElem.AppendChild(xmltext);

// Add child element to root element
xmlRootElem.AppendChild(xmlChildElem);
}

}
// This bit gets the file list and adds it to the subfolders
private List DirSearch(string sDir)
{
List files = new List();

foreach (string f in Directory.GetFiles(sDir))
{
files.Add(f);
}
foreach (string d in Directory.GetDirectories(sDir))
{
files.AddRange(DirSearch(d));
}
return files;
}
}
}
So that code now gets the full path for the file and adds it to the XML schema so you will end up with something like this:
E:\SomeFolder\Network\12345678.csv
E:\SomeFolder\Space\12345678.csv
D:\SomeFolder\Sessions\12345678.csv
That gets passed to the Foreach Loop. So now i have this list of files, how do I load them. As a default, each file has a data connection set up based on the example file the client submitted, once the columns and metadata was set up, the connection string could be moved to an expression based on the XML.
So the over all process looks like this, don’t worry it looks terrifying, but it is quite simple!

Here’s a bit of detail around the process
(1) – The C# script is run to feed the Foreach container
(2) – EXP – Get Feeds Filename. This takes the file path ‘E:\SomeFolder\Network\12345678.csv’ and removes the ‘E:\SomeFolder\’ just to return the ‘Network\12345678.csv’ part, as this is used in later steps to move the file to a completed or failed folder, both of which are set as project variables, so we just append the ‘Network\12345678.csv’ part with an expression to correctly move the file to the correct folder, for example ‘E:\Somefolder\Processed\Network\12345678.csv’.
(3) To correctly process the right file to the right data flow task an expression to get the right folder (FINDSTRING(@[User::SourceFolder] , “\\Network\\”, 1) > 0) and precedence constraint (Success) is used to trigger the correct data flow, and of course the data flow data source connection is controlled by the User::SourceFolder variable. The last two steps are as mentioned a move the completed csv file to a process or failed folder, using a Precedence Constraint set to a logical ‘Or’ so after each data flow it moves the file to the right folder. The process will loop and load until it reaches the last one in the XML and then stop. Its a lot more manageable than creating 35 individual packages to load the data and get around the ‘Traverse Subfolders’ issue. After testing It works very well, and the client can add folders with t he same folder structure for other parts of their infrastructure, and the process can accommodate them. The only pain was setting up all the precedance flows!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s