Cosmos DB joins are a no no a go go

Freeing myself from the shackles of standard T-SQL, I’ve been getting to know Cosmos DB (formerly DocumentDB), the NoSQL platform as a service (PaaS), as part of a number of projects. Being the data guy, I was asked to create some queries and add any value I could on the data side of it. So I’ve been finding out the limits of the platform and the best approach to doing a bunch of things with the JSON documents.

So for starters, the SQL support in Cosmos DB doesn’t cover all the cool things that SQL Server can do, partly because this is a service focused on a different approach. It has only been around for a few years, so the level of maturity isn’t quite there for some of the functions you would expect basic SQL to provide. However, you can create stored procedures and user-defined functions that can fill some of that gap.
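One gap worth calling out, and the reason for the title: the JOIN keyword does exist in Cosmos DB’s SQL dialect, but it is a self-join over an array inside a single document, not a join across documents as in T-SQL. Here is a minimal pure-Python sketch of what that flattening produces; the document shape and field names are made up for illustration:

```python
# Cosmos DB's JOIN is a self-join over an array *inside* one document,
# not a join across documents as in T-SQL. This sketch mimics what a
# query like
#   SELECT c.name, t.tag FROM c JOIN t IN c.tags
# produces for a single (made-up) document.

doc = {"id": "1", "name": "widget", "tags": [{"tag": "red"}, {"tag": "sale"}]}

def intra_document_join(document, array_field):
    """Yield one flattened row per element of the document's array."""
    for element in document.get(array_field, []):
        yield {"name": document["name"], "tag": element["tag"]}

rows = list(intra_document_join(doc, "tags"))
# Two rows, one per tag, both carrying the parent document's name.
```

So a JOIN returns one row per array element of each document, and there is no way to join document A to document B; that reshaping has to happen in your application or a stored procedure.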


Windows Phone is Dead

Well, it looks like the Windows Phone platform is dead and gone, which will no doubt please a number of clients I know who only this year rolled it out to their users.

What next, now that there are only two real players in the phone space, Android and iOS? If Microsoft want to control both hardware and software, like Apple does and Google does with its Pixel phones, well, they can still do that (sort of) with Android. Samsung does it: they take the stock Android OS and add their TouchWiz layer to it. A Microsoft version of Android? It could be nice with the new UI design they are rolling out to Windows 10.

Update: They are doing that… great minds, eh!

Power BI – License changes fallout

On the 3rd of May, Microsoft announced a few changes to the Power BI Service, and some new features (see here). Since then we’ve fielded a large number of queries about it, and after talking to some of the Microsoft guys, so have they. One mentioned that some of their customers had been quite angry.

Power BI Desktop – No changes here, still free to use

Power BI Service Free – This has been the most annoying change for a lot of people, as you can no longer share dashboards; for any sharing you will need the Pro service. Before, you could buy a small number of Pro licences for your report developers and have your report consumers use the free version. No longer, sorry: everyone has to have Pro. On the plus side, if you have Pro your consumers will now be able to use items that rely on the Data Gateway and Row-Level Security, but for a number of small companies the £72 per year, per user may be a bit too much.

Power BI Pro – No big changes here, just a few tweaks: content packs will become Power BI Apps (not to be confused with PowerApps) and app workspaces. To do anything useful with Power BI you’ll need a Pro licence.

Power BI Premium – The big change! From our own experience, a blocker for large organisations was how to scale Power BI up and how best to use it. Another issue was the demand from some organisations to have Power BI on premise; this addresses both in a few ways. Rather than being based on licences, the expected use is based on a metric of frequent and infrequent users, to ensure that they have a consistent level of service. The more users, the more processing nodes/RAM you’ll get. These nodes are also dedicated hardware, not the shared service that you get with the Free and Pro tiers, eliminating the noisy neighbour: if one user of the service is giving it a good hammering, you’ll not be affected. Base prices are £3,100 per node. You’ll also get Power BI Report Server to deploy on premise, which gives you the ability to move workloads between the cloud and on-premise parts. Maybe you have a busy reporting month end: move the relevant reports to the best-suited service.

Power BI On Premise – This is now Power BI Report Server (PBRS), an extension to SSRS. Power BI on premise is currently in technical preview (TP) in SSRS 2016; the full release will come in SQL Server 2017, due around the end of Q2 or early Q3 2017. The TP is limited to SSAS data sources for now; others will be added over time. One of the interesting features is that it will be updated at a quicker rate than, say, SSRS. Even though it is a superset of SSRS, it is not that dependent on SSRS, so updates to PBRS will not affect your SSRS installation. Over the years some SSRS updates have changed the Report Server databases that are used in the background, but it looks like (for now) PBRS will not impact those.

You will need either Power BI Premium to run it, or SQL Server Enterprise Edition, licensed on a per-core basis, with Software Assurance. If you do deploy it via the SA route, you will still need a Power BI Pro licence to deploy reports. Consuming the deployed reports will not require a Power BI Pro licence.

Power BI Embedded – This is also changing significantly: the Power BI Embedded API is being brought into the normal Power BI API, and the old service will stop in June 2018. You’ll also need Power BI Premium to be able to use it. There is a lower pricing tier for just the new Embedded, around £2,500 per month, but for the number of customers who have been using the 3p-per-session model to create reporting portals and applications, that model is coming to an end.

So to recap: to do anything you’ll need some form of Power BI Pro licence. Depending on the number of users, standard of service and budget, you’ll need Pro or Premium. However, it is still priced competitively against Qlikview and Tableau on a per-user basis. I think some organisations will be disappointed with the cost of hosting on-premise reporting and the current limitations, but hopefully the service will get better quickly as long as there is demand.
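As a rough illustration of the Pro-versus-Premium decision, here is a back-of-the-envelope break-even using the prices quoted above (£72 per user per year for Pro, and assuming the £3,100 node price is per month). It ignores the fact that report authors still need Pro licences under Premium, so treat it as a sketch only:

```python
# Rough break-even between per-user Pro and a single Premium node,
# using the prices quoted in this post. Assumes £3,100 per node is a
# monthly price; purely illustrative.
pro_per_user_year = 72            # £ per user per year
premium_node_year = 3_100 * 12    # £37,200 per node per year

break_even_users = premium_node_year / pro_per_user_year
# One node costs about the same per year as roughly 517 Pro users.
```

Below that sort of head count, per-user Pro is the cheaper route; above it, the dedicated-capacity model starts to pay for itself even before you count the consistency and on-premise benefits.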

One thing that will hit some people hard is the move of Power BI Embedded to Premium; a number of projects we’ve done have used the service to create an application or portal that uses the Embedded API. Now you have to have Premium. What gets me is that MS have been showcasing Embedded on the Power BI Blog and other places, only to change it with no commercial alternative. £3,100 a month is a bit too much for those users. MS have had feedback from ISVs and Partners and have said that they are looking into a more reasonable pricing tier; hopefully they can get that out soon, as people need to decide whether to rewrite their applications on a different technology.

My Old PC’s

The brief history of my computing


I upgraded my father-in-law’s old PC with my desktop PC, as I wasn’t using it anymore, so in return I got my old, old PC back. I took it apart; I recognised the motherboard and the heat sink, but could not for the life of me recall the chip. One quick heat-sink removal later: an AMD Athlon 2800 chip, 2.8 GHz clock speed, one core. Oh yeah, that old beast, from about 2002 or so. It got me thinking of the others in my computing history, so I started listing what I’ve had.
Commodore VIC-20 – games and some early animation moving letters about
Amiga A1200, with a 60 meg hard drive – couldn’t believe I would use 60 meg. Used a very early animation program on it, Imagine 3D v2, that was free on the cover of a magazine. Loved playing around with Deluxe Paint IV
Advent (maybe) 486DX2-66, with about 8 meg of RAM. My first PC, running Windows 3.11, later updated to Windows 95. This was my first, and last, off-the-shelf desktop PC; every desktop after that was custom built, as I wanted better control of the hardware.
Pentium 2 equivalent – Novatech PC – might have had an ATI All-in-Wonder Pro card, which captured video.
P100 equivalent Toshiba laptop – a heavy thing; used it for university work.
Some custom build based around an AMD chip, maybe a K6-2 or 3, with a serious (at the time) graphics card, a GeForce 2 GTS. Started doing a bit of 3D animation work, using at first a cracked copy of Lightwave 3D v5.6, then an actual version when I purchased v6.5, as some features didn’t work correctly. Did some flyers and logos. I also created my favourite university project on it, Terraforming Mars: as part of one of the ecosystems modules I created a series of images showing the progress of the forestation of Mars. Some good science went into it; I got an A15 for it, and would have got an A16 grade but for a few typos.
AMD Athlon 2800, 1 GB of RAM, 60 GB hard drive, with my first DVD drive! My first 64-bit chip.
HP laptop – DV2000, Intel Core Duo maybe?? 13 inch – piano-black case looked smart, XP Media edition; it came with a little remote control that you could slide in and out of the side.
AMD K7 3-core PC, 8 GB of RAM, a few 320 GB and a 640 GB hard drive, later updated with a 120 GB SSD.
HP laptop – DV6000, 15 inch; can’t recall much else.
Intel i5, 4 cores, 16 GB RAM, 240 GB SSD, 1 TB drive, 640 GB drive, 1 GB AMD graphics card.
MSI Steel Series – i7 laptop, 4 cores, 17 inch, 16 GB RAM, 120 GB SSD, 1 TB drive, Nvidia 1 GB graphics.
Updated 2020 – 

MacBook Pro – 16 inch, i9, 16 GB RAM, 1 TB SSD. Very nice; I have been loving the Mac since I started using a Mac Mini at work to do some things. macOS has really come on since I last used one, about 2005?? It is a slightly better user experience, but there is not that much in it compared with Windows 10 these days.

So that’s a total of 12 computers, but I also have to account for an iPad 2, a Motorola Xoom, a Samsung A6 tablet, 6 Raspberry Pis, a PlayStation 3, an 8 inch Fire tablet and a 7 inch Lenovo tablet.

So I’ve gone from a 3 KB VIC-20 to a 16 GB i7 monster over the last 33 years or so. One other thing: USB sticks are the new Bic pen tops and drill chuck keys, as I seem to keep losing them!
As for OSes, I’ve been through Windows 3.11, 95, 98, 2000, XP Pro, XP Media Centre, Vista, 7, 8 and 10, plus Linux: Debian, Ubuntu and Mint.


Confession time

During my first true IT role, rather than just being the guy good at IT in the office, I was tasked with the backup tape duties for the AS400 while the regular guy was off on holiday.
I was shown how to load the AS400 backup tapes into the tape hopper, and to take the latest backup out to be stored at an offsite location, which was the security hut about 50 metres away. It looked simple enough. So on the Monday I took out the Sunday backup and loaded a fresh set of seven tapes for the rest of the week. Nice, easy, job done.
However, I got in on Tuesday morning for the early support shift, and as soon as the clock ticked over to 7:30am the helpdesk started getting a load of calls about the main ERP system being down. Soon about 350 people could not do any work. A quick call to one of the AS400 guys sorted out the issue: it seems I had loaded one of the tapes wrong and it had jammed. The backup process had failed, and the ERP system did not start up if the backup failed. The AS400 guy started the ERP system back up, and people got back on with work. I checked the tapes; they looked OK. The next day the same thing happened: Tape Jam, the sequel. I had put it in wrong again. The AS400 person finally updated the start-up routines so that if the backup failed, the ERP would still start. Thankfully, they never connected the backup issue with me loading the tapes, and as the system had gone live within the last few weeks as part of a company takeover, they put it down to teething issues.

I see dead people, in my database

‘Millions, millions of dead people are voting!’ said Trump. Well, for starters, they aren’t voting, despite the claims; however, they will be in a database.

As a consultant I see data sources that contain all sorts of stuff, and they have one thing in common: they are full of dirty data. In fact, when I’m talking to a customer, I have to tell them that it is a common problem; a lot of clients think the issue is unique to them. It’s not, don’t worry, we know how to handle it. I normally tell them this:

‘The only time you’ll see a clean database, is when it is empty and no users are entering data into it’

Think about it: there are most likely millions of dead customers in the Amazon customer database, or in any database where users register. Facebook will become a massive virtual cemetery over the coming years. Having dirty data in your database can be a problem. During my time consulting for a bank on Payment Protection Insurance (PPI) claims and building their management information, we came across the issue of people being contacted about making a claim because they had PPI in the past when, in fact, they had died, which can upset (and did upset) their remaining relatives.

Every year I get an Electoral Register form to confirm who the registered voters living at the address are. The rough data latency (the time between updates) could be a year, or even more depending on when I move address. Dirty data isn’t the issue; it is normally your process for updating data that is.

The trouble with twins

Sadly there hasn’t been much love shown to Data Quality Services (DQS) in the last few releases of SQL Server, and I don’t think there will be in the coming SQL Server vNext release.

DQS is slow, it’s cumbersome, its interface is terrible, it does not scale well, and it takes a number of headache-inducing workarounds to get it to a decent level of performance.

However, it works, and can work well when it does. But as with all things data, when it involves people it hurts. Some background to the project: DQS is matching people from separate data sources, each without a common identifier or reference across the systems, so we are running some matching on the data. We can add synonyms, so a name like David can also be referred to as Dave, or John as Jon and Jonathan; in the event that one system uses one variant, it will still match the data from the other.
Each match gets a level of confidence, expressed as a percentage; in fact the process uses not just the name but a bunch of other factors: date of birth, address and so on. But one thing it breaks down on is a certain type of twins.
What sort? Twins with identical, or nearly identical, names.
Identically named twins? Surely no one does that? Sorry, yes they do. I have encountered and confirmed (at least four sets in one area) twins named the same, without a middle name to differentiate them.
As for nearly identical names, again yes: for example Jane and Jade, only one letter different. So when matching them the process reports a high level of confidence, in the high 90s. What to do? The confidence threshold can’t simply be changed, as it will then ignore or create people when it shouldn’t.
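To see why Jane and Jade score so highly, here is a minimal sketch of weighted field matching in Python. The field names, equal weights and use of difflib are made up for illustration; DQS’s actual scoring is more sophisticated, but the shape of the problem is the same:

```python
# Minimal sketch of record-matching confidence, illustrating why
# near-identical twins score so highly. Fields and weighting are
# invented for this example; DQS does not work exactly like this.
from difflib import SequenceMatcher

def similarity(a, b):
    """String similarity in [0, 1] via difflib's ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_confidence(rec_a, rec_b):
    # Equal weight per field, for illustration only.
    fields = ["first_name", "surname", "dob", "address"]
    return sum(similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)

jane = {"first_name": "Jane", "surname": "Smith", "dob": "2001-04-01", "address": "1 High St"}
jade = {"first_name": "Jade", "surname": "Smith", "dob": "2001-04-01", "address": "1 High St"}

confidence = match_confidence(jane, jade)
# First names are 75% similar, every other field is identical,
# so the overall confidence lands around 0.94.
```

With every field except the first name identical, a 75% first-name similarity still yields roughly 94% overall, well above any sensible auto-match threshold, which is exactly the trap described above.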
Ahhhh people!

I want to dislike something!

Update: Since this post was created back in 2014, Facebook have added some new ‘Like’ options

Businesses need data. Data becomes information, and from that, decisions are made. Whether you make a good or bad decision depends on the data you have.
In November 2012 a particular brand got over 61 million ‘Likes’, but as high as that figure is, what information and data can we get from it, and can we make an informed decision from it?
Social media platforms can be an effective way of getting feedback, marketing and advertising, but like most such tools they are only as effective as the way they are used, the assumptions that you make and the questions that you ask. We also have to remember that it is only in the last six years that Twitter was founded and Facebook opened up to everyone, and the now ubiquitous ‘Like’ button has only been around since April 2010.
The effect of social media on marketing and user engagement has profoundly altered the landscape between the corporation/company and the consumer. Feedback can be virtually instantaneous and hard for the company to manage. This year the Olympic coverage by NBC in America was advertised as ‘as good as being there’, but it was decried by those watching it, with time delays, little live coverage and pre-recorded segments. The Twitter hashtag #nbcfail went viral, used to refer not just to the Olympics but to other programs on the network, and its brand was tarnished. Also in 2012, the short film ‘Kony 2012’ went viral in a more positive way and raised awareness of the child soldiers in a militia led by Joseph Kony in parts of Uganda, Congo and Sudan.
There have been many high-profile instances of this sort of positive and negative viral feedback, made possible by the ease with which social networks let their users communicate, but that ease of use is also their fundamental weakness.
A simple test also indicates something strangely amazing: actual engagement depends on the question, and looks for simple answers to simple questions. In a completely unscientific test, one of our employees posted the following two status updates within seconds of each other on their Facebook profile:

1: What shall we do about the gap between the rich and poor in the world today???
2: Someone just brought Krispy Kreme donuts into the office…Get in!!!!

Which got the first reply? Answer: 2 (within 5 minutes of posting)
Which got the most likes? Answer: 2
Which got the most comments? Answer: 2

But to be fair, is this a judgement of the subject matter or just their friends?
Using the ‘Like’ button as an example: it is far too easy for the user to click, and not a good measure of user engagement. Charities in particular have noticed this; a campaign can get a large number of ‘Likes’, but when averaging out the donation-to-‘Like’ ratio it can be as low as 10p per like. Data from the ‘Just Giving’ website found that, over a 17-day period, about 6% of visitors who came to the site from Facebook ‘Like’ links actually donated. It indicates that it is easy to push the ‘Like’ button, but not to engage. Using the ‘Kony 2012’ example, how many people today can say what happened after they clicked ‘Like’?
But back to the original statistic at the start… in November 2012 a particular brand got over 61 million ‘Likes’.
What information can that tell us? In this case, nothing of real value. You may have figured out that the 61 million ‘Likes’ refers to the 2012 United States presidential election. It is only when you consider that Barack Obama got 61,170,405 votes (a 50.5% share) to Mitt Romney’s 58,163,977 (48%) that you can actually form an informed opinion. In this case the population of America is very much polarised politically, and no matter who won, roughly half the people would not like the winner. In effect, 58,163,977 people pushed the ‘Dislike’ button for Obama.
It gets even harder when you look at the effectiveness of the marketing campaign. The campaign to re-elect Obama cost about $930,000,000, or about $15 per vote. Or was it? We have to remember that the most important people were the undecided voters. Those who would have voted Democrat or Republican anyway do not count in the calculation, as their votes are a given, and no positive or negative marketing would have made an impact on them. In America, for the 2012 election, about 23% of registered voters classed themselves as swing/floating voters, with 39% committed Democrats and 37% committed Republicans who, as mentioned before, will vote for their party regardless. So getting from 39% of voters to 50.5% actually cost the Obama campaign just over $300 per vote. A back-of-the-envelope calculation of the combined Democratic and Republican campaigns results in a total spend of $639 for each swing voter. These figures are a very rough approximation, but they indicate that in the whole population of voters, only 23% can be classed as actively engaged.
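The headline figure is simple to reproduce; the swing-voter numbers depend on further assumptions about registered-voter counts, so only the $15 calculation is sketched here:

```python
# Back-of-the-envelope cost per vote for the 2012 Obama campaign,
# using the figures quoted above. The per-swing-voter estimates in
# the text rest on extra assumptions about voter counts, so this
# reproduces only the headline number.
campaign_spend = 930_000_000  # dollars
votes_won = 61_170_405

cost_per_vote = campaign_spend / votes_won
# Roughly $15 per vote.
```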
So we can draw a rough idea of the interactions of the user:

1. Expect a low engagement rate
2. Only expect simple answers to simple questions
3. Target your market selectively

Current customer and market research by Adobe has highlighted an interesting feature request: users want a ‘Dislike’ button. Why? Maybe because the user wants a choice; they want to be able to say ‘No, this doesn’t appeal to me’. Maybe just having a ‘Like’ button is a bit corporately sycophantic, too many ‘Yes’ men driving the functions of the business to disaster. How many people would have pushed the ‘Like’ button if Captain Smith of the Titanic had posted ‘Just told the engine room full steam ahead….icebergs be damned!’?
So we can add the following to the list

4. Users would like the choice to ‘Like’ things or not

Currently, companies such as Starbucks and Amazon are having a bit of an image issue with the British public due to their low payment of corporation tax. What would be interesting to see is a comparison of their ‘Likes’ over this period. Have they gone down? Have they stayed the same? An interesting thought experiment is what would have happened if there were a ‘Dislike’ button. Would a user be more likely to press this option rather than not respond? If we are measuring results and success accurately, then companies need the fullest information they can get, and ‘Dislike’ may help provide that.