Sadly there hasn’t been much love shown to Data Quality Services (DQS) in the last few releases of SQL Server, and I don’t think there will be in the coming SQL Server vNext release.
Month: January 2017
Quote of the week
Sometimes developing a solution from the client’s database(s) is like putting together a jigsaw, which is picture side down, while in a pitch black room, wearing a blindfold and boxing gloves.
I want to dislike something!
Social media platforms can be an effective way of getting feedback, marketing and advertising, but like most tools they are only as effective as the way they are used, the assumptions that you make and the questions that you ask. We also have to remember that it is only in the last 6 years that Twitter was founded and Facebook opened up to everyone, and the now ubiquitous ‘Like’ button has only been around since April 2010.
A simple test also indicates something strangely amazing: actual engagement depends on the question, and users look for simple answers to simple questions. In a completely unscientific test, one of our employees posted the following two status updates within seconds of each other on their Facebook profile:
1: What shall we do about the gap between the rich and poor in the world today???
2: Someone just brought in Crispy Cream donuts into the office…Get in!!!!
Which got the first reply? Answer: 2 (within 5 minutes of posting)
Which got the most likes? Answer: 2
Which got the most comments? Answer: 2
But to be fair, is this a judgement of the subject matter or just their friends?
Take the ‘Like’ button as an example: it is far too easy for the user to click, so it is not a good measure of user engagement. Charities in particular have noticed this. A campaign can get a large number of ‘Likes’, but when the donations are averaged out per ‘Like’ the figure can be as low as 10p. Data from the ‘Just Giving’ web site showed that, over a 17 day period, about 6% of visitors who came to the site from Facebook ‘Like’ links actually donated. It indicates that it is easy to push the ‘Like’ button, but not to engage. Using the ‘Kony 2012’ example, how many people today can say what happened after they clicked ‘Like’?
But back to the original statistic at the start… in November 2012 a particular brand got over 61 million ‘Likes’.
What information can that tell us? In this case nothing of real value. You may have figured out that the 61 million ‘Likes’ refers to the 2012 United States Presidential election. It is only when you consider that Barack Obama got 61,170,405 votes (50.5% share) to Mitt Romney’s 58,163,977 (48% share) that you can actually form an informed opinion. In this case the population of America is very much polarised politically and, no matter who won, roughly half the people would not like the winner. In effect 58,163,977 people pushed the ‘Dislike’ button for Obama.
It gets even harder when you look at the effectiveness of the marketing campaign. The campaign to re-elect Obama cost about $930,000,000, or about $15 per vote. Or was it? We have to remember that the most important people were the undecided voters. Those who would have voted Democrat or Republican anyway do not count in the calculation, as their votes are a given and no positive or negative marketing would have made an impact on them. In America for the 2012 election about 23% of registered voters classed themselves as swing/floating voters, with 39% committed Democrats and 37% committed Republicans who, as mentioned before, will vote for their party regardless. So to get from 39% of voters to 50.5% actually cost the Obama campaign just over $300 per vote. A back of the envelope calculation of the combined Democratic and Republican campaigns results in a total spend of $639 for each swing voter. These figures are a very rough approximation of the data, but they indicate that in the whole population of voters, only 23% can be classed as actively engaged.
So we can draw a rough idea of the interactions of the user:
1. Expect a low engagement rate
2. Only expect simple answers to simple questions
3. Target your market selectively
Current customer and market research by Adobe has highlighted an interesting feature request: users want a ‘Dislike’ button. Why? Maybe because the user wants a choice; they want to be able to say ‘No, this doesn’t appeal to me’. Maybe just having a ‘Like’ button is a bit corporate sycophantic, with too many ‘Yes’ men driving the functions of the business to disaster. How many people would have pushed the ‘Like’ button if Captain Smith of the Titanic had posted ‘Just told the engine room full steam ahead….icebergs be damned!’?
So we can add the following to the list:
4. Users would like the choice to ‘Like’ things or not
Currently companies such as Starbucks & Amazon are having a bit of an image issue with the British public due to their low payment of corporation tax. What would be interesting to see is a comparison of the ‘Likes’ over this period. Have they gone down? Have they stayed the same? What would also be an interesting thought experiment is to see what would have happened if there was a ‘Dislike’ button. Would a user be more likely to press this option, rather than not respond? If we are measuring results and success accurately then companies need the fullest information that they can get, and ‘Dislike’ may help provide that.
MDX and Sum columns
| Current Program | Next Program | No of People | Sum of People | Percentage To Next Stage |
| --- | --- | --- | --- | --- |
| Program 1 | Program 2 | 152 | 310 | 49.0% |
| Program 1 | Program 3 | 68 | 310 | 21.9% |
| Program 1 | Program 4 | 47 | 310 | 15.2% |
| Program 1 | Program 5 | 33 | 310 | 10.6% |
| Program 1 | Program 6 | 10 | 310 | 3.2% |
| Total | | 310 | N/A | 100.0% |
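Reading the table, ‘Sum of People’ looks like the Program 1 total repeated on every row, and ‘Percentage To Next Stage’ is ‘No of People’ divided by that total. The MDX for this is not shown here, so the following is only a minimal C# sketch of the same arithmetic, not the MDX calculation itself. The class and method names (`StageShare`, `PercentOfTotal`) are made up for illustration.

```csharp
using System;
using System.Linq;

public static class StageShare
{
    // Returns each count as a percentage of the column total,
    // rounded to one decimal place (the table's display precision).
    public static double[] PercentOfTotal(int[] counts)
    {
        int total = counts.Sum();
        if (total == 0)
        {
            throw new ArgumentException("counts must sum to a non-zero total", nameof(counts));
        }
        return counts.Select(c => Math.Round(100.0 * c / total, 1)).ToArray();
    }
}
```

Calling `StageShare.PercentOfTotal(new[] { 152, 68, 47, 33, 10 })` gives the five percentages in the table. Note that the rounded values do not have to add up to exactly 100, which is why a total row is better calculated from the raw counts than by summing the rounded column.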
SSIS – XML, Foreach Loop and sub folders
Network
    12345678.csv
Space
    12345678.csv
Sessions
    12345678.csv
No, sadly the ‘Traverse subfolders’ option doesn’t work like that. Nuts. So after a quick search I found this link at Joost van Rossum’s blog that uses a bit of C# to get the folder list and generate an XML document with the folder list in it. You can then use the ‘Foreach NodeList Enumerator’ in the Foreach Loop to get the files.
Well, sort of. The code on that website only gets the folder structure, not the full path of each file. It was however a good starting point, and looked like it could be adapted to get the full file list, which could then be passed on to the data flow logic in the Foreach container. Now I’m TSQL through and through, and my C# is poor but slowly getting better, however I did manage to hack the code so it got the file list. So here it is, please use, improve and share.
#region Namespaces
using System;
using System.Data;
using System.IO;
using System.Xml;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Collections.Generic;
#endregion

namespace ST_b87894259c434eeca3da339009a06fdf
{
    /// <summary>
    /// ScriptMain is the entry point class of the script. Do not change the name, attributes,
    /// or parent of this class.
    /// </summary>
    // Use this for SQL Server 2012 and above
    [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
    // Use the below for SQL Server 2008, comment out the above
    // [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        // Variables for the xml string
        private XmlDocument xmldoc;
        private XmlElement xmlRootElem;

        public void Main()
        {
            // Initialize XMLdoc
            xmldoc = new XmlDocument();

            // Create the root element
            xmlRootElem = xmldoc.CreateElement("", "ROOT", "");

            // Add every file under the feeds folder as a child element of the root element
            GetSubFolders(Dts.Variables["User::FeedsFilePath"].Value.ToString());

            // Add root element to XMLdoc
            xmldoc.AppendChild(xmlRootElem);

            // Fill SSIS variable with XMLdoc
            Dts.Variables["xmldoc"].Value = xmldoc.InnerXml.ToString();

            Dts.TaskResult = (int)ScriptResults.Success;
        }

        // Adds one "File" element per file found in the parent folder and all of its subfolders
        private void GetSubFolders(String parentFolder)
        {
            // Full paths of every file below the parent folder
            var allfiles = DirSearch(parentFolder);

            foreach (var filePath in allfiles)
            {
                XmlElement xmlChildElem;
                XmlText xmltext;

                // Create child element "File" holding the full path,
                // e.g. d:\foreachfoldertest\subfolder1\somefile.csv
                xmlChildElem = xmldoc.CreateElement("", "File", "");
                xmltext = xmldoc.CreateTextNode(filePath);
                xmlChildElem.AppendChild(xmltext);

                // Add child element to root element
                xmlRootElem.AppendChild(xmlChildElem);
            }
        }

        // Recursive method that gets the files in a folder, then the files in each of its subfolders
        private List<string> DirSearch(string sDir)
        {
            List<string> files = new List<string>();

            foreach (string f in Directory.GetFiles(sDir))
            {
                files.Add(f);
            }

            foreach (string d in Directory.GetDirectories(sDir))
            {
                files.AddRange(DirSearch(d));
            }

            return files;
        }
    }
}
E:\SomeFolder\Space\12345678.csv
D:\SomeFolder\Sessions\12345678.csv
(1) – The C# script is run to feed the Foreach container
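To see what the Foreach NodeList Enumerator ends up looping over, here is a minimal stand-alone sketch, outside of SSIS, that builds the same `ROOT`/`File` XML and reads the paths back with XPath. The class and method names (`FileListSketch`, `BuildFileListXml`, `ReadFilePaths`) are made up for illustration. I am also assuming that `/ROOT/File` is the XPath you would give the enumerator, based on the element names in the script above. It uses the built-in `Directory.EnumerateFiles` with `SearchOption.AllDirectories` instead of the hand-rolled recursion, which should return the same set of files, though possibly in a different order.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class FileListSketch
{
    // Builds <ROOT><File>full path</File>...</ROOT> for every file
    // in rootFolder and all of its subfolders.
    public static string BuildFileListXml(string rootFolder)
    {
        var doc = new XmlDocument();
        XmlElement root = doc.CreateElement("ROOT");
        doc.AppendChild(root);

        foreach (string path in Directory.EnumerateFiles(rootFolder, "*", SearchOption.AllDirectories))
        {
            XmlElement file = doc.CreateElement("File");
            file.InnerText = path;
            root.AppendChild(file);
        }

        return doc.OuterXml;
    }

    // Reads the full paths back out, one per File node (assumed XPath: /ROOT/File).
    public static List<string> ReadFilePaths(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var paths = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("/ROOT/File"))
        {
            paths.Add(node.InnerText);
        }
        return paths;
    }
}
```

Passing the result of `BuildFileListXml` to `ReadFilePaths` gives back one full path per file, which is what each iteration of the loop would receive. Like the recursive version, this will throw if it hits a folder it does not have permission to read.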
Deploying SSIS Packages SQL Server 2008
for %I in (*.dtsx) do
Long live the PC
which has led to PCs with TV tuners and TVs with internet access. A couple of things stopped ‘Convergence’: lack of connection standards and lack of agreement between companies. Content was restricted as media creators and distributors refused to release items in different formats to keep costs down, and also wanted to supply their content through their own portals. This is changing now, with Hulu, Netflix and iTunes providing easily accessible portals across a wide range of devices.
Crime and weather a curious insight
I had a few days between projects, so spent the time getting a bit more familiar with Power Query. I have used crime data to animate a thermal map in Power Map before (here). But I was playing around with Azure Data Market to access the freely available data sets, one of which was the UK Met Office weather data, to look at animating weather using Power Map. Then I had a bit of a brain wave: I had two data sets that matched in time and space, so I had a look to see if the weather affected crime. The above graph indicates that the number of Anti-Social Behaviour incidents follows the temperature. This isn’t a deep statistical analysis, just a quick eyeball of the flow of the graphs, but it is suggestive that something is going on, for 2012 at least.
What is Big Data?
2) I can’t believe you said that
3) Don’t mention that bit of gossip about you know what to you know who
4) You mentioned that bit of gossip about you know what to you know who
The other guy was working for a marketing company and was seeing Big Data in terms of social media, and aggregations from a wide variety of websites and unstructured data.
The third was a business analyst and was talking about Big Data in terms of analytics on the volume or types of data.
But the other issue of Big Data is the mix of the types of data: you can have structured and unstructured data in the mix. Most businesses have structured data that tells them that they sold a product to this customer at this point in time and shipped it on this date. It’s the unstructured data that is the issue for a number of businesses. What is unstructured data? Well, it is quite a mix of types: photos, social media posts, documents and other random data that is not normally time and space specific.
I told the other two about it, once again proving that the MD should approve my request to change my job title from ‘Senior Consultant’ to ‘Data visionary and information guru who pushes back the boundaries of ignorance’. The glow of being awesome stopped only when I got in the car with my wife on the drive back home!
Warning Business Intelligence
Ten years have now gone by, so I can tell the story of how my IT skills may have made people lose their jobs.
It’s about 2006/07 and I was working as a Customer Service Team Engineering Gatekeeper at what was then Finning Materials Handling. Basically it was a fancy job title for controlling the field engineers’ assigned forklifts and other equipment. The details got updated, I ran reports and generally tried to chat up temps, and anything with an XX chromosome really. Despite being in such a target rich environment I got a batting average of zero.
Not one.
Nothing.
Nada.
I also sat next to the most excellent Steve Rowlands, and together we had chats and great bants.
Anyway, I should stick to the point. As we had acquired another forklift company called Lex Harvey the previous year, we were running two IT systems: one for the Caterpillar equipment (Finning stuff) and one for the Lex Harvey kit (all sorts of tat). The issue was that they were separate systems with no communication between the two, so it was a bit of a brain melter to log even a breakdown for a customer.
Reporting from it was a right pain as it normally took about a day and a half to produce stuff, however I had a few tricks up my sleeve, mainly to automate the process with use of a VBA type script that could record me doing the reports once. Then all I had to do was a search and replace on the dates every week to run the reports. The process then took me about 2 mins, I just left the script to run on both systems, and boom, leaving me more time to check out the temps, ‘Hi how you doing, would you like a coffee? No it’s no hassle, I like my women how I like my coffee… thrown over me’
So one of the days I had nothing much to do, and realised that I could work out the total equipment numbers of all the teams in the UK, and do some analytics around it. Total numbers, ratio of engineers to forklifts stuff like that.
So I exported all the data into Excel 2003 (feels like old school stuff), quickly built up some numbers, and sent it on to the Service Area Managers. It went down a treat. The two from the north, whose names I forget but one of the guys looked like the ‘Cigarette Smoking Man’ out of the X-Files, went nuts over it. The North West one had done something similar; the North East guy (Mulder knows too much) had also done something similar, but one for the Finning side and one for the Lex side. So kudos all round for Mr Lunn for displaying initiative, technical skill and being awesome. I’d been promoted 3 times in 3 years, maybe this would be the path to the next one. We all got together and sorted out a few issues with the numbers. There were a few items not on the systems, which one of our customers owned, but we sorted out the servicing.
They then presented it to some other people, then tried to take the credit. It didn’t work…. Hahahaha fuckers!
Any hoo… the report got kicked up to higher management, then even higher management, then to the Chiefs, COO, CIO, CFO, and the big man, the CEO.
They loved it, they went nuts for it, I had some senior people come up to me, and say stuff like ‘Great piece of work Jon’, pats on the back and handshakes all round.
It then got serious.
Really fucking serious.
They started asking questions about the fleet numbers, detailed reports on it, and how I came to those numbers. Sure, OK, here’s how I did it. I showed them my working out, and got more comments like ‘It’s a great driver for the business’ and ‘It’s really important that the numbers reflect the reality on the ground’. So I showed them everything. They all nodded and said stuff like ‘Great, we’re very happy with the numbers now’.
Behind the scenes some meetings started going on, serious meetings, with serious questions, about serious decisions.
They dropped the bombshell, they were cutting back the number of engineers in the field and making redundancies.
Shit.
Fuck.
Shit fuck.
I felt like a right massive C word. I’d worked running engineers in the south, the midlands, but mostly the north east. They liked me, I was chatty, funny and I used to drive a forklift so knew what drivers did to piss them off when it comes to repairing them. So when they all came down for the meetings and HR stuff, they came and saw me, handed me their paperwork, and I shook their hands and said the usual platitudes.
Felt like an even more super massive C.
Anyway, it turned out I wasn’t the trigger for it. Little did I know that Finning were in talks with another company called Briggs, which would then purchase the forklift division, and as I understand it Finning wanted to make the company a bit leaner and improve the books. I, however, to some degree had helped.
So the company got taken over, and I got promoted to IT and Technical Analyst – Business Intelligence. However, I was slightly more careful about what reports I ran.