Data Culture – First steps in creating it in your organisation

These are the seventeen reasons why IT will not let me send my 250MB Excel file that we use to run the business via email

At Ricoh Digital Services (AKA the day job) we do a lot of Data & Analytics projects for a diverse range of customers. When the project is completed and handed over, we can let the customer run it, or we can support it. What we don’t get involved with is the changes that it can bring to an organisation. Sure, during the sales process, we talk about the single source of the truth, the outcomes, what the customer wants, but not how they use data internally. We assume that the Data Warehouse is going to be used, but what we don’t do is see, is if the organisation is using it effectively, we assume that it will be. However, what we see is a number of organisations reacting to information, rather that being lead by it. So here are some ideas I’ve gathered on turning that around, and moving from a reactive driven to a more proactive data driven and some of the benefits that it should bring.

The Problem

The hand loom circa 1850

The above image is a picture of a hand loom, weaving thread into cloth by hand, a laborious time intensive manual process, prone to mistakes. Replace that hand loom with Excel, and that is a chuck of your organisation today. That is the problem, there is normally a small cottage industry of Excel users creating reports, running teams, departments, entire companies, and COVID-19 track and trace databases using Excel. In the wrong, and even the right hands, Excel can be dangerous. Did you know that Excel played its part in a $6 Billion trading loss at JP Morgan as one of the copied and pasted quantitative models had some basic Excel issues, as noted in the internal investigation:

Read More »

PBIDS – Power BI Data Sources Files

The great thing about Power BI is the monthly updates and how much Microsoft are churning out features, updates, and all sorts of cool stuff. The worst thing about Power BI, is trying to keep up with all of it. In this post I’m going to look at Power BI Data Source files. PBIDS was announced back in October 2019, I must have been busy, but I didn’t pick up on it. But the other day I was using Azure Synapse, and its integration with Power BI on a project. It download a blank connection file, and noticed it had different icon and file extension. So have to figure out what is was.

Here is the announcement from Oct 2019, for those like me that missed it.

So what is a PBIDS file? Well it creates a basic file that holds a data source connection. At the moment you can only store one connection and not multiple connections. I looked at the posting and the documentation for it, but I didn’t quite get it at first, as I could not see how to define and save the file, just bits of JSON to set a connection. But that bit of JSON is all it is. It just holds the connection settings, so it is not a full fat PBIX Power BI file. It supports the following connection types:
Azure Analysis Service
SAP Hana
SharePoint List
SQL Server
Text file

Here is the basic JSON for a connection to a folder:

“version”: “0.1”,
“connections”: [
“details”: {
“protocol”: “folder”,
“address”: {
"path": "C:\\Users\\jon\\Documents\\Some Folder"

That is all there is to it. Create the JSON in notepad, save it as with the extension of ‘.pbids’ and there you have it. The Power BI Tips site has a connections builder that you can use that will generate the PBIDS file for you if you want.

If you load it, it fires up Power BI, and starts the ‘Get Data’ process for you.

Power BI PBIDS file opening

So you can save a set of predefined data connections and store them in the for users to quickly get access to data. It will not support storing usernames and passwords, as that would not be secure and just give access to anyone who has the file. I the moment it is at version 0.1, so there maybe some updates coming in the future, for example adding multiple connections. But it is a nice quick way to save and store connection templates.

Animate flight data in Power BI

“Have you seen the framework from Uber? It’s designed to show large scale big data visualization?” said one of the software engineering team. I had a look at it, and it does look amazing. The below image is from the framework, and shows car accidents across the UK, or at least in the screen grab, the south east and London.

Uber – Visual amazeballs

We went through some of the examples, and while looking at the flight data one, they mentioned that ‘You can’t do that in Power BI’.

Heathrow flights – example

I retorted, ‘You can do something a bit similar’ I then had to go prove I could do it. To be fair they had said that a few times while going through the examples, I had to pick one I knew I could do. There was no way you let the front end peeps lord it over us data geeks, what sort of a world would that be!

So first of all, I had to get some data. Thankfully the lovely people at Uber data visualisation, had provided the source of the data. One quick hyperlink later and I was downloading flight path data from the website The OpenSky Network. It’s a service that collects data from a network of stations that read the ADS-B and MODE-S signals transmitted from the aircraft for air traffic control.

It’s a big data set, it is split by day, and then by hour, and as it can be downloaded in CSV format, so I loaded the one file for the 07:00 to 07:59 data, that was about 200 mb.

Nice and easy so far, its a very easy formatted data set (check the readme.txt on the site), apart from the time columns. They are in the Unix format. For those not familiar with it, it is not a date time, it is the number of seconds that have passed since 01/01/1970. So 1574349827 is 11/21/2019 @ 3:23pm (UTC). That isn’t an issue with a bit of M we can convert it on load in Power BI to a nicely formatted date time format.

#datetime(1970, 1, 1, 0, 0, 0) + #duration(0, 0, 0, [time])

Once that was done, I mapped the points on the map visual which gave me the following output:

Aircraft Points

One problem, there was to many points, so I’ve filtered the data on the Latitude and Longitude columns just to show a range for the UK. Looks good, but we need to animate it. Does Power BI do animation? No… well not by default, but there is a custom visual called Play Axis that can enable you to do it.

Play Axis!

This allows you to add a column to it and play it in order, in this case my date time. In fact I didn’t need to convert the Unix time, as it would have played it back in numerical order! Once that was added to the report we have animated visual awesomeness!

It’s alive!

I’ve loaded the sample file on my GitHub here enjoy!

Power Query – Combining Columns in M

In my Power BI courses I always recommend some books and sites that will help them learn DAX or M. Most of the time i’m presenting to new users to Power BI, and having to get them to move from the Excel muscle memory that they have and also to show them that sometimes the code-less approach is not always the best way.

One example presented itself the other day. A user was combining some Excel sheets and wanted to create a rough dimension based on the values in one of the columns, then use it as a filter or axis. They hit the internet and came across a few post on how to do this. A few recommend this approach:

1 – Append the two tables together as new query
2 – Remove the columns you don’t need
3 – Remove duplicates

Well that is sort of fine, as you can’t specify a single column in the table append, and it can all be done in the interface. However is has a bit of overhead in the 1 and 2 stages, so can you do it so you only reference the relevant columns and remove some of the overhead. The answer is ‘Yes’, but you need to code it in M. Using the Blank Query function i created this M code:

   Source = Table.Combine(
                                Table.SelectColumns(#"table name 1",{"column name"})
                            ,   Table.SelectColumns(#"table name 2",{"column name"})
                            ,   Table.SelectColumns(#"table name 3",{"column name"})
    #"Removed Duplicates" = Table.Distinct(Source)
    #"Removed Duplicates"

So lets look at the M code behind it.
So rather that use a query as a reference, you can declare the table and column that you want with the following

 Table.SelectColumns(#"table name 1",{"column name"})

You wrap these up with a table combine to a SQL Union the columns together using the following


than run a remove duplicates to do a SQL Union All. You don’t have to duplicate the whole tables/queries just to select the data, then trim it down to the columns that you want. As far as I can see you can’t do this in the interface, it only via M.

Power BI – Parameters & Data Sources

This blog post comes from a question raised at a Power BI training session, and deals with updating data source information, for example moving from development to production, or folder or file locations moving. The question was asked as I was showing the attendees the ‘Load from Folder’ function in Power BI, so you can dump monthly files into a folders and load them together. The question was  ‘What if the location changes? do you have to rebuild the report?’. I mentioned that you can edit the M code to change the location, but the users was a bit worried that it was a bit to ‘techy’ to do that, so  I suggested a parameter that you can use. They where more than happy with that approach it  turns out.

So how to use parameters? Well lets fire up Power BI and enter Power Query, but clicking on ‘Edit Queries’

And then select the New Parameter option, which should bring up the following…

Now, there are a few options here, you can have a list of values, or items based on a query, but for this example, we just need the basic version.

Type = Text
Current Values = The folder location that we are going to use to pull the files in

Click OK and how do we use it when loading a folder?

Select the folder import then select the little drop down for the folder path.

Select the parameter (Or start the process from there, by creating a new one).

Click OK, and away it goes as if you typed in the folder location normally.

So if you need to change the location, you don’t have to edit the M code, just update the parameter, by clicking on the manage parameter selection, where we created the new parameter. Nice and easy, well I think so.

So parameters can be used in a number of data sources, for example SQL Server connections

So switching around DEV, UAT and Live servers should be a slightly less painful process, than editing the M code.

Power BI – How to clear all slicers


I was delivering my standard Power BI training course, and there was a question from one of the users which was ‘Is there a way of clearing all the slicers you have selected, so if you are doing something you can go back to the start’, my answer was ‘No… wait, yes… you could use a bookmark to do it’

So how to clear all slicers? First of all in Power BI Desktop you need to go to view and add the Bookmarks Pane. Then make sure that all the slicers are clear, or in the default state that you want them. In this case I made sure that everything was deselected. Once that is done, Click ‘Add’ on the bookmarks pane, that will create the Bookmark, I’ve renamed mine to ‘Default’


Next, add a button to your report, and allocate the ‘Default’ Bookmark to the Action setting. I’ve selected the ‘Reset’ button and added the text ‘Click to reset’ on mine. Next select some items, once you are done and want to go back to the start, click on the button (Ctrl-Click in Power BI Desktop), that will apply the default Bookmark, and in my case clear all the slicers! Nice quick and easy solution.





Your Brain & Report Design Part 2 – Reports and Cognitive Burden

Not done a blog post for a few months, sorry work getting in the way a bit.

In Part One I talked about the Intuitive (System 1) and the Attentive (System 2) processes, and the separation and focus of these mental processes. Here is a quick reminder of the functions of both parts.

System 1 (Intuitive) System 2 (Attentive)
Fast Slow
Unconscious Conscious
Automatic Effortful
Everyday Decisions Complex Decisions
Error Prone Reliable

Cognitive Burden

In an number of texts about report design, the phase Cognitive Burden is used. With the outline of System 1 and 2, cognitive burden is impact of switching between the two functions and making it harder for the System 2 Attentive process to work out what is going on. So in some regards you can use bits of System 1 to help with System 2, and also focus System 2 get to the underlying understanding of the information a bit quicker.

The main outline of this approach is that of simplification… reduce distraction, reduce clutter, remove anything that is not needed, and this is done in a number of ways:

  • Layout
  • Ordering
  • Colours
  • Visualisation Selection

Also to consider will be ‘Framing the Answer’, how best to present a value, with less abstraction of the how you view it, (which sounds trickier than it is) which I’ll cover  in another post. So in this post I’ll be covering the first two, Layout and Ordering.

Layout – Arranging the space

How to leverage System 1 in layout of reports? Well you use some thing that is easy, intuitive and you are doing it right now… reading. The process of reading is from left to right, top to bottom. However this is approach is biased to the English language in my case, so if you are creating reports in Japanese, you’ll be doing it right to left.

When designing reports, I don’t tend to add the logo of the company that it is for. This is purely a personal approach, I don’t see the need to remind the reader of the company that they work for and it also takes up space on the report, that could be best served for information. So if I do have to add a logo or other graphics or even a title, I tend to put them in the top right. Some times if I can get away with it, the logo goes down the bottom right, nicely out of the way.

So in layout terms I would place what you want the viewer to see first on the left side or in the top left, so they get that first information hit, if it is a particularity important value or report that you want them to see first.

Stop and Search - Power BI

In the above case the three card visuals are the first thing I want the viewer to see, then the Map to supply context, then the rest of the charts.

The next technique I use is to layer the report into three sections based on the granularity of the data, so cards would be at the top, high level data, then charts, then table data.

Stop and Search 2 - Power BI

So as the user moves down the report there is a reveal of information, and you can see the break down of the details as you move though the report.

So, in the above reports, you have used the learned System 1 process of reading and used it on a report. Now you can layout a report, next I’ll be talking about the importance of selecting the right visual.


Power Query & finding the Nth instance of a letter in a string for manipulation

Power BI LogoPower Query and the M language is a bit of a pain, not a lot of people do it (Or they do, and they don’t write about it), and the knowledge base on MSDN is not real world example friendly. So trying to figure out how to do something you know how to do in say SQL or DAX is not that easy some times. Where possible it is best to do as much as you can in the Power Query loading, otherwise you’ll be burning CPU time in the data model. So things like data cleansing should be done in Power Query, not in the DAX after it has been loaded into memory.

So find the Nth instance of a item in string I can do in SQL, but in Power Query it is a little bit different. So lets start with an example that contains the following.


So I had to remove the ‘XYZ’ bit. Easy right?

Text.Start("ABC-01-02-03-XYZ", 12)

No, turns out the data contain items like




So no quick fix by just using basic string manipulation. So what can we use? In this case you can use the Text.PositionOfAny function, and with this you can set it to find the first, last or all occurrences. So in this case, looking for the first instance of ‘-‘, you would get:


Text.PositionOfAny("ABC-01-02-03-XYZ", {"-"}, Occurrence.First)

Would return 3. Wait, not 4? No as it starts from ‘0’. It’s like a JSON array or Python dictionary index. This will be important later.

Text.PositionOfAny("ABC-01-02-03-XYZ", {"-"}, Occurrence.Last)

Returns 12.

Text.PositionOfAny("ABC-01-02-03-XYZ", {"-"}, Occurrence.Any)

Would in a custom column, just return ‘List’, or internally {‘3’, ‘6’, ‘9’, ’12’}. You can expand the list out if needed and create extra rows, but you should not in this case.

So to strip out the ‘XYZ’ part of ‘ABC-01-02-03-XYZ’

So you can use Text.Start, so it will read from the start of the string to the end, which can be set from a list.

Text.Start("ABC-01-02-03-XYZ", (Text.PositionOfAny("ABC-01-02-03-XYZ" , {"-"}, Occurrence.All){3}))

So in the above code the {3} is get the value from list in that location. Remember lists start from 0, and {3} would return the value 12 in the list {‘3’, ‘6’, ‘9’, ’12’}

But for the other instances in the data, {3} might not exist. However you can problematically strip stuff out, with for example an ‘if’ clause, with List.Count and drive the extraction of the data, based in the number of items in the list.

So for example in the data I have in the column [SortID] the following



if List.Count(Text.PositionOfAny([SortID] , {"-"}, Occurrence.All)) <= 3
then [SortID]
else Text.Start([SortID], (Text.PositionOfAny([SortID] , {"-"}, Occurrence.All){3}))

So in the 'if' clause List.Count on ABC-01-02-09 will return 3, and then do nothing, as I don't need to trim the end off it. Otherwise it will take AB-012-02-03-XYZ and find the position of the 3rd list value, and take the string length from it found in the Text.PositionOfAny, and trim the end off.

If you want here is some example M code to recreate the above. Enjoy


Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnRy1jUw1DUw0jUw1o2IjFKK1UERtFSKjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [SortID = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"SortID", type text}}),
#"Basic String Manipulation" = Table.AddColumn(#"Changed Type", "Basic String Manipulation", each Text.Start([SortID], 12)),
#"First Occurrance" = Table.AddColumn(#"Basic String Manipulation", "First Occurrance", each Text.PositionOfAny([SortID], {"-"}, Occurrence.First)),
#"Last Occurrance" = Table.AddColumn(#"First Occurrance", "Last Occurrance", each Text.PositionOfAny([SortID], {"-"}, Occurrence.Last)),
#"All Occurance" = Table.AddColumn(#"Last Occurrance", "All Occurrance", each Text.PositionOfAny([SortID], {"-"}, Occurrence.All)),
#"If Clause" = Table.AddColumn(#"All Occurance", "If Clause", each if List.Count(Text.PositionOfAny([SortID] , {"-"},
Occurrence.All)) <= 3
then [SortID]
else Text.Start([SortID], (Text.PositionOfAny([SortID] , {"-"}, Occurrence.All){3})))
#"If Clause"





Power Query – M Calendar Function

Power BI LogoI’ve downloaded a few M language based calendars, for use in Power BI recently and have been frustrated as they don’t quite meet my needs. So I’ve created my own.

The main difference to some I’ve seen online is that the time category columns, Month, Quarter, Year etc. columns are only dependent on the date column, nothing is dependant on anything else, so if you don;t need them you can delete them. The other option, is the use of a key column for sorting. So in Power BI you can use the sort by other column, and stop the month names being sorted alphabetically in charts/slicers and so on. You can define the start if needed and creates the calendar to the current now date.

Also I now have a GitHub!

Code is here enjoy

Or copy and paste from belowRead More »

Power BI Custom Visuals – My top picks part two

If you missed part one of ‘Power BI Custom Visuals – my top picks’, which was basically a love letter to the OK Viz team, check it out now.

So this is the second part of my top picks, looking at the visuals I tend to use when creating Power BI reports. As mentioned in part one, there are about 12 ones that I use, I do tend to be fairly conservative in using custom visuals, there are a quite a number now, and tend to side with functional, and less flashy/gimmicky choices, as those tend to  kick in the attentive process and not be as intuitive to use, and ultimately distract from the information that you wish to show. Anyway here’s the final few.Read More »