What is Big Data?

The party wasn’t a very good one. 75% of the guests were teachers for some reason, and one person I sort of knew from other gatherings worked for HMRC, so I was trying to avoid his gaze. Not for any financial wrongdoing, I may add. It’s just that governments need cash, and the next thing you know they have found a mistake in my great-grandfather’s PAYE stuff and, as governments really need the money, rather than go after international tech companies they’d come after me, and I’d get it in the wallet for the compound interest on a shilling’s mistake 120 years ago. The other person I was avoiding was my lovely wife, who was giving me that look, which was one (or a combination) of a few options:
1) So I guess I’m driving
2) I can’t believe you said that
3) Don’t mention that bit of gossip about you know what to you know who
4) You mentioned that bit of gossip about you know what to you know who
Somehow I had managed to end up talking to the only other people at the party who also worked in IT, so there was some casual shop talk for a bit. The subject of ‘Big Data’ came up and, strangely, the three of us had three different definitions of it. I tend to think of Big Data in terms of volume: tables with billions of rows, or objects that need some sort of business intelligence around them.
The other guy was working for a marketing company and saw Big Data in terms of social media: aggregations from a wide variety of websites, and unstructured data.
The third was a business analyst and was talking about Big Data in terms of analytics on the volume or types of data.
Weirdly, looking at the definitions of Big Data, we were all correct, but me being me, I like to think I was more correct than the others. One of the issues the marketing guy had was the strain on development and hardware resources when processing their customer data to find new insights. It’s not a new problem. When starting the project that would end up driving their industry-changing store card, Tesco found that their hardware systems were struggling to process the large amount of customer data. The data volumes were just too big, but they hired some data experts who told them they could throw away 95% of the data and use the remaining 5% to find out the shopping habits of the rest of the population. That has not changed in many business areas, but with the drive for targeted promotions and deeper analysis there is also a need to process all the data and, excuse the metaphor, squeeze as much juice as possible out of the customer orange.
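For the curious, here is a rough sketch of why the 5% trick can work. Everything in it is made up for illustration: the basket values are random numbers, not Tesco data, and the names `all_baskets` and `sample` are my own. It just shows that the average of a random 5% sample lands close to the average of the whole lot.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Made-up basket values in pounds, one per customer; purely illustrative.
all_baskets = [rng.uniform(1, 100) for _ in range(100_000)]

# Keep a random 5% of customers and ignore the other 95%.
sample_size = len(all_baskets) * 5 // 100
sample = rng.sample(all_baskets, sample_size)

full_average = mean(all_baskets)
sample_average = mean(sample)
print(f"all customers: {full_average:.2f}, 5% sample: {sample_average:.2f}")
```

The catch is that this only holds for broad questions such as averages. A promotion targeted at one particular customer needs that customer's own rows, which is where processing everything comes in.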
But the other issue with Big Data is the mix of data types: you can have structured and unstructured data together. Most businesses have structured data that tells them they sold a product to this customer at this point in time and shipped it on this date. It’s the unstructured data that is the issue for a number of businesses. What is unstructured data? Well, it is quite a mix: photos, social media posts, documents and other random data that is not normally tied to a specific time and place.
Mixing those two types of data has always been tricky. Structured data is the home ground of SQL-based technology: relational tables for your related data. Unstructured data has been labelled ‘No-SQL’, getting rid of the SQL concept to some degree and using general-purpose languages such as Java to query the data sources. With the normal marketing hype, ‘No-SQL’ was the SQL killer and all databases would soon be No-SQL, but this hasn’t quite happened, as each type of database is being used for the job it suits. Some companies have even found that, after implementing a ‘No-SQL’ solution, they went back to a regular SQL database model. Another thing that has happened with a number of tool sets, such as Microsoft SQL Server and Teradata, is that unstructured data has now been brought into the SQL tool sets, so you can query unstructured data just like structured data. (Technically, unstructured data is a bit of a myth, as it all has structure. More on that in a later post.)
Strangely, companies don’t always use the right tool for the right job. I was once in a meeting where the Chief Information Officer had been talking to a social media expert and was looking to get a brand new social media platform implemented. It had enabled companies like HP, BT and other organisations to reduce their support costs by offering a social platform where users could post their issues and get help from the online community, thus leveraging the fact that 5% of the users on social media platforms are pathologically helpful. However, the question put to him stopped the vision from becoming a reality, as it questioned the use of this technology for this company’s customer engagement and market sector. That question was ‘Erm…. Exactly how do you support a sandwich?’
Anyway, back to the party. After some lamentations by the other two, I saw the answer to their problems in three words: Parallel Data Warehouse (PDW). Oh, and the Cloud; that’s four words. Normally when something is chewing through some data, it’s one computer querying the database hosted on it. With PDW the data is split among lots of computers (nodes), each working on the bit of data that it holds, and then, after a bit of aggregation, the combined dataset is returned. Or, to put it another way, it’s like the old question: if it takes one man 3 hours to build a wall, how long will it take 12 men?
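As a toy illustration of that split, work, then aggregate idea, here is a minimal sketch. It is emphatically not how PDW is implemented internally: threads stand in for nodes, the round-robin split is my own simplification, and the function names (`split_across_nodes`, `node_total`, `parallel_totals`) and the sales figures are invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_across_nodes(rows, node_count):
    """Deal rows out round-robin so each 'node' holds a slice of the table."""
    return [rows[i::node_count] for i in range(node_count)]


def node_total(rows):
    """Work done on one node: total sales per customer, for its slice only."""
    totals = Counter()
    for customer, amount in rows:
        totals[customer] += amount
    return totals


def parallel_totals(rows, node_count=4):
    """Scatter the rows, let each node aggregate, then merge the partial results."""
    slices = split_across_nodes(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(node_total, slices))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the partial sums together
    return merged


# Hypothetical sales rows: (customer, amount)
sales = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3), ("bob", 1), ("alice", 2)]
result = parallel_totals(sales, node_count=3)
print(dict(result))
```

The point to notice is that no single node ever sees the whole table. Each one returns a small partial answer, and only those partial answers are combined at the end.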
PDW has been called Microsoft’s best-kept secret in terms of analytics, and it is hosted in the cloud, which gives it access to loads of nodes on server farms around the world. PDW can take tables of billions and billions of rows of data and return results a lot quicker than normal SQL Server. It also deals with unstructured data the same way, delivering insight from all types of data. That is just one of the benefits; it also has the usual Microsoft cloud option of only paying for what you need when you use it.
I told the other two about it, once again proving that the MD should approve my request to change my job title from ‘Senior Consultant’ to ‘Data visionary and information guru who pushes back the boundaries of ignorance’. The glow of being awesome stopped only when I got in the car with my wife for the drive back home!
