Abstraction and the problem with percentages

How do you relate to the numbers on your report? Can you visualise what they are trying to tell you? I’m going to suggest that for most of you, the answer will be no.

How big does a number have to be before it becomes abstract? One thousand, one million, more? Well, it is a lot lower than you think. Let’s try a simple exercise to show you what I mean. Close your eyes and imagine a black background, and then a white dot. (You can use whatever colours you prefer.) That dot is now sitting in your imagination. Now try adding one more dot at a time, and work out at what point your ability to hold those dots in your imagination stops.

It should be around 15 dots. That is one version of abstraction; there are other types. In the real world, using your normal vision, at what point can you look at a number of items, for example boxes, and know how many there are without stopping to count them? It is about 5 to 6.

Did you count or do you see 3 chairs?

So there are limits to our cognitive processes for visualising and counting numbers. As mentioned in my other blog posts, for example Your Brain & Report Design Part 1 – The Attentive vs the Intuitive Process, you have two pathways: the intuitive and the attentive process. Looking at some items and simply seeing the number, without counting, comes under the intuitive process (don’t forget it can be error prone), whereas stopping and counting the items comes under the attentive process.

One good example of the movement from the attentive process to the intuitive process is the recognition of dice patterns. Do you count the dots any more, or just see the pattern? If you have very young children, watch them as they grow up to see the point at which they stop counting the dots on the dice and start seeing the pattern.

Reducing Abstraction

So that is one side of abstraction: our limited ability to visualise objects and quickly count things. The other side is our ability to relate to numbers, to turn them from an abstract concept into a number that has meaning to us. The number one offender is the percentage, but why?

In the book ‘Thinking, Fast and Slow’, Daniel Kahneman shows the impact of something called frequency format and how it affects the way we relate to a number. For example:

  • 5%

It is a nice, simple, low value: 5 out of 100. But as mentioned above, we would find it hard to imagine 100 items; 5 should not be an issue. We just see the small number. Let’s change it to something more relatable:

  • 1 in 20

Which do you find that you relate to more? The percentage or the ratio?

There is a body of evidence that when parts manufacturers moved from expressing part failure rates as percentages to ratios, a lot more attention was paid to the issues in the manufacturing process to drive those numbers down. But why?

Materialising the number as a ratio moves it from the abstract concept of ‘how likely’ to the intuitive ‘how many’. You see this in adverts that make things personal to you. Does the poster say 16% of men may get prostate cancer? No, it will say 1 in 6 men in their lifetime.

So moving from a ‘how likely’ to a ‘how many’ number enables the person to understand, relate to and visualise it far more easily than an abstract percentage. However, it can have its drawbacks.
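If you generate report labels programmatically, the ‘how likely’ to ‘how many’ conversion is a one-liner. A minimal sketch in Python (the function name and the rounding choice are mine, not from any reporting tool):

```python
def percent_to_one_in_n(percent):
    """Convert a percentage (e.g. 5) to a '1 in N' frequency label."""
    if percent <= 0 or percent > 100:
        raise ValueError("percent must be in (0, 100]")
    n = round(100 / percent)
    return f"1 in {n}"

print(percent_to_one_in_n(5))   # 5%  -> "1 in 20"
print(percent_to_one_in_n(16))  # 16% -> "1 in 6"
```

Note that 16% is really 1 in 6.25; rounding to a whole number is exactly the kind of simplification the adverts make.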

As part of the research into the impact of frequency format, psychologists conducted a survey in which people were asked which is the more dangerous disease:

1 – A disease that kills 1,286 people out of 10,000

2 – A disease that kills 24.91% of the population

The majority of people answered 1: the one that kills 1,286 people is the more deadly. If you stop to work it out, it is actually about half as deadly, at around 12.9%. So what happened? People found it easier to relate to the number and make it personal.
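The arithmetic behind that trick is worth making explicit. A quick check, using the figures from the survey question above:

```python
# Disease 1: an absolute count out of a stated population
deaths, population = 1286, 10_000
rate_1 = deaths / population      # 0.1286, i.e. 12.86%

# Disease 2: stated directly as a percentage
rate_2 = 24.91 / 100              # 0.2491

print(f"Disease 1 kills {rate_1:.2%} of the population")
print(f"Disease 2 kills {rate_2:.2%} of the population")
print(f"Disease 2 is about {rate_2 / rate_1:.1f}x as deadly")
```

The frequency format makes 1,286 feel larger than it is; the percentage hides that Disease 2 is nearly twice as lethal.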

Cat Food & Perfect Ratios

Mr Schrodinger, I think the box is too small!

If you are of a certain age and from the UK, you may recall the adverts for Whiskas cat food, which had the marketing line ‘8 out of 10 owners said their cats preferred it’.

Let’s break it down with three examples that all have the same outcome:

  • 8 out of 10 owners said their cats preferred it
  • 4 out of 5 owners said their cats preferred it
  • 80% of owners said their cats preferred it

So we have expressed it in three ways: two ratios and a percentage. We can ignore the percentage; we have already discounted our ability to relate to it. Which leaves us with two options:

  • 8 out of 10 owners said their cats preferred it
  • 4 out of 5 owners said their cats preferred it

Which do you relate to more? Is there a good ratio to use? Yes: the power of 10s. Use ratios in multiples of 10 where you can, and round numbers where you can. For example:

  • 3.75 in 15

3.75 is a quarter of 15, but it should be expressed as:

  • 3 in 10

where we have rounded 2.5 up to 3. It makes a bit more sense, doesn’t it!
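If you want to automate that ‘power of 10s’ rewrite, one thing to watch is that Python’s built-in round() uses round-half-to-even, so round(2.5) gives 2, not the 3 the example above wants. A sketch that rounds half up explicitly (the function name is mine):

```python
import math

def to_n_in_10(numerator, denominator):
    """Re-express a ratio like '3.75 in 15' in the more relatable 'n in 10' form."""
    scaled = 10 * numerator / denominator
    n = math.floor(scaled + 0.5)  # round half UP, matching the 2.5 -> 3 example
    return f"{n} in 10"

print(to_n_in_10(3.75, 15))  # 2.5 in 10, rounded up -> "3 in 10"
print(to_n_in_10(8, 10))     # already in base 10 -> "8 in 10"
```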

Framing the number – Positive or Negative?

This is to make you think about the context in which you present the result. So, let’s say you go to the doctor, and you have to have an operation, an injection or something else that may have some sort of impact on you. Does the doctor say:

  • 9 out of 10 people have no complications after this surgery
  • 1 in 10 people have complications after this surgery

Which would you prefer to hear? So framing is presenting the value in a way that makes it a bit more, or less, personal. For the example of manufacturing defects, would you use:

  • 9 in 10 success rate
  • 1 in 10 failure rate

Which should we be focusing on? Well, in this use case the 1 in 10 failure rate, as that makes it more relatable to the situation.

As mentioned above on abstraction and relating numbers to you, if the doctor is really good they’ll say ‘90% of people have no complications after surgery’, using the abstract percentage to keep the risk at arm’s length.

When to use a percentage?

As every good IT consultant will say, ‘It depends’. Some things to consider:

5% cash back on purchases sounds a lot better than 5p back on every £1 spent! You know it is a small amount, and it is easy to compare across other offers: the higher the percentage, the better the deal. For cloud services, 99.999% uptime sounds a lot better than 0.9 seconds of downtime per day, and the more nines, the better the service level. So when measuring a service level you are comparing a difference, not ‘how many’: you can see whether it has reached the agreed, acceptable service level or not.
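To sanity-check that uptime claim: five nines over a day works out at about 0.86 seconds, which rounds to the 0.9 above. A quick calculation, assuming the allowance is measured per day:

```python
def daily_downtime_seconds(uptime_percent):
    """Seconds of allowed downtime per day for a given uptime percentage."""
    seconds_per_day = 24 * 60 * 60  # 86,400
    return (1 - uptime_percent / 100) * seconds_per_day

print(f"{daily_downtime_seconds(99.999):.3f} s/day")  # five nines
print(f"{daily_downtime_seconds(99.9):.1f} s/day")    # three nines
```

Each extra nine cuts the allowance by a factor of ten, which is why percentages work so well for comparing service levels.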

Other uses of percentages

Sometime around 2014, there was a survey of 10,000 secondary school children. They were asked what drugs they had tried. A few years later the same survey was run again, asking the same questions. The media was then full of news items like ‘100% increase in cocaine usage in school children’ and ‘A recent survey has shown that drug usage has doubled in a few years’, and so on. Looking at the data, usage had increased from 1 student to 2. So technically it was correct. However, not quite the sensation the newspapers would lead you to believe.
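The headline trick here is reporting a relative change while hiding the tiny base rate. The two views side by side, taking the survey story above at face value:

```python
surveyed = 10_000
before, after = 1, 2  # students reporting cocaine use

relative_change = (after - before) / before * 100  # the headline number
rate_before = before / surveyed * 100              # base rate, as a percentage
rate_after = after / surveyed * 100

print(f"Relative: {relative_change:.0f}% increase")
print(f"Absolute: {rate_before}% -> {rate_after}% of pupils surveyed")
```

A ‘100% increase’ and ‘0.01% to 0.02%’ describe exactly the same data; only one of them sells newspapers.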

In ‘Thinking, Fast and Slow’, Kahneman notes that there is some evidence that in the US legal system, defense and prosecution lawyers will use one form or the other:

  • Defense – DNA evidence in capital cases is wrong in 1 in 1,000 cases
  • Prosecution – The chance that the DNA evidence is wrong is only 0.1%

You can now see what they are doing to relate, or disassociate, the numbers for the jury.

Rounding up

You should now have some insight into abstract numbers, and how by using ratios you can turn numbers into something a bit more intuitive and a lot more relatable. I always say, when I present talks about subjects like this, that of the intuitive and the attentive processes, one is not better than the other; we just want to help people process the information quicker. The same goes for ratios and percentages: one is not better than the other, just consider which fits your use case.


Cat Image: https://unsplash.com/photos/mrTydVjg04o

Cashback: https://www.vectorstock.com/royalty-free-vector/cash-back-5-percent-money-refound-concept-badge-vector-28061818

Chair Picture: Jon Lunn (Me)

Thinking, Fast and Slow – Daniel Kahneman – Chapter ‘Rare Events’

Data Culture – First steps in creating it in your organisation

These are the seventeen reasons why IT will not let me send my 250MB Excel file that we use to run the business via email

At Ricoh Digital Services (AKA the day job) we do a lot of Data & Analytics projects for a diverse range of customers. When a project is completed and handed over, we can let the customer run it, or we can support it. What we don’t get involved with is the changes that it can bring to an organisation. Sure, during the sales process we talk about the single source of the truth, the outcomes, what the customer wants, but not how they use data internally. We assume that the Data Warehouse is going to be used, but what we don’t see is whether the organisation is using it effectively; we assume that it will be. However, what we actually see is a number of organisations reacting to information, rather than being led by it. So here are some ideas I’ve gathered on turning that around, moving from reactive to a more proactive, data-driven culture, and some of the benefits that it should bring.

The Problem

The hand loom circa 1850

The above image is a picture of a hand loom, weaving thread into cloth by hand: a laborious, time-intensive manual process, prone to mistakes. Replace that hand loom with Excel, and that is a chunk of your organisation today. That is the problem: there is normally a small cottage industry of Excel users creating reports, running teams, departments, entire companies, and COVID-19 track and trace databases using Excel. In the wrong, and even the right, hands, Excel can be dangerous. Did you know that Excel played its part in a $6 billion trading loss at JP Morgan, as one of the copied and pasted quantitative models had some basic Excel issues, as noted in the internal investigation:


What is Power BI Premium Per User, and what is it for?

I love Power BI

On the 23rd of September at the Microsoft Ignite conference, Microsoft announced changes to Power BI Premium and introduced a Power BI Premium Per User license. This sounded a bit strange, as the aim of Power BI Premium was to move away from the license-per-user model when deploying reports across an organisation.

So what is the aim of this Premium Per User and how will it affect users and customers?

With standard Power BI Premium (which I’ll call Premium Capacity to avoid confusion), you get dedicated Power BI hardware, starting at 8 cores and 25GB of memory. It also removes the need for Power BI Pro licenses to consume shared reports. Premium Capacity also had some advantages in capability and features compared to Pro:

· Data model size increased from 1GB to 10GB
· Paginated Reports
· Advanced Dataflows allowing Direct Query on those datasets
· Increase from 8 dataset refreshes per day to 48
· AI features such as AutoML and Cognitive Services
· XMLA Endpoints
· Power BI Report Server — On Premise version of Power BI

The main barrier to entry for Premium Capacity was the price, which starts at £3,700 per month, about the same as 500 Power BI Pro licenses. So, if you had more than 1,000 users that needed access to reports, you got Premium Capacity for a lower cost than Pro, along with some features that medium to large organisations needed. Premium Per User is a middle ground between full Premium Capacity and Power BI Pro, and should suit certain organisation sizes and workloads. You don’t get the dedicated hardware of Premium Capacity, but you do get a generous upper limit on model sizes: the new Per User license comes with 100GB of memory, the same as a Premium Capacity P3 node, which starts at £15,000 per month. So it is shared capacity, like Power BI Pro, but with defined limits.

Premium Per User unlocks some features that the user community have been asking for, for example higher dataset/dataflow refresh rates and enhanced automatic page refresh. You also get the advanced dataflow features, where you can direct query a dataflow rather than having to import data from it.

Another thing the Power BI user community has been asking for is paginated reports in the service, so it is nice to see that this is now possible with Premium Per User. This should help some customers move from two reporting platforms, SQL Server Reporting Services and Power BI, to a single user experience.

So what are the restrictions? In terms of sharing reports, it seems that Pro users can share workspaces with both Pro and Premium Per User users, but Premium Per User can only share with Premium Per User. However, you can set up a subscription to a Premium Per User report that can be consumed by a Pro user. Neither Pro nor Premium Per User can share with free users; for that you’ll still need Premium Capacity.

Premium Per User will be in private preview in November, and pricing is yet to be announced, but comments from the program manager say it will be a ‘fair’ price. It is nice to see that Microsoft have listened to the user community; the features of Premium Per User will help some workloads and make Power BI more attractive to smaller customers.

PBIDS – Power BI Data Sources Files

The great thing about Power BI is the monthly updates and how much Microsoft are churning out features, updates, and all sorts of cool stuff. The worst thing about Power BI is trying to keep up with all of it. In this post I’m going to look at Power BI Data Source files. PBIDS was announced back in October 2019; I must have been busy, as I didn’t pick up on it. But the other day I was using Azure Synapse, and its integration with Power BI, on a project. It downloaded a blank connection file, and I noticed it had a different icon and file extension, so I had to figure out what it was.

Here is the announcement from Oct 2019, for those like me that missed it.

So what is a PBIDS file? It is a basic file that holds a data source connection. At the moment you can only store one connection, not multiple connections. I looked at the announcement and the documentation, but I didn’t quite get it at first, as I could not see how to define and save the file, just bits of JSON to set a connection. But that bit of JSON is all it is. It just holds the connection settings, so it is not a full-fat PBIX Power BI file. It supports the following connection types:

  • Azure Analysis Services
  • SAP HANA
  • SharePoint List
  • SQL Server
  • Text file

Here is the basic JSON for a connection to a folder:

{
  "version": "0.1",
  "connections": [
    {
      "details": {
        "protocol": "folder",
        "address": {
          "path": "C:\\Users\\jon\\Documents\\Some Folder"
        }
      }
    }
  ]
}
That is all there is to it. Create the JSON in Notepad, save it with the extension ‘.pbids’, and there you have it. The Power BI Tips site has a connections builder that will generate the PBIDS file for you if you want.
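Since a PBIDS file is just JSON, you could also generate one from a script rather than Notepad. A sketch in Python, using the folder-connection structure shown above (the output file name and folder path are placeholders of mine):

```python
import json

# Same shape as the example above; only one connection is supported at v0.1.
pbids = {
    "version": "0.1",
    "connections": [
        {
            "details": {
                "protocol": "folder",
                "address": {"path": "C:\\Data\\Monthly Files"},
            }
        }
    ],
}

# The .pbids extension is what makes Power BI Desktop pick the file up.
with open("monthly_files.pbids", "w") as f:
    json.dump(pbids, f, indent=2)
```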

If you load it, it fires up Power BI, and starts the ‘Get Data’ process for you.

Power BI PBIDS file opening

So you can save a set of predefined data connections and store them in a shared location for users to quickly get access to data. It does not support storing usernames and passwords, as that would not be secure and would give access to anyone who has the file. At the moment it is at version 0.1, so there may be some updates coming in the future, for example adding multiple connections. But it is a nice, quick way to save and store connection templates.

Animate flight data in Power BI

“Have you seen the deck.gl framework from Uber? It’s designed for large-scale big data visualisation,” said one of the software engineering team. I had a look at it, and it does look amazing. The image below is from the deck.gl framework and shows car accidents across the UK, or at least, in the screen grab, the south east and London.

Uber deck.gl – Visual amazeballs

We went through some of the examples, and while looking at the flight data one, they mentioned that ‘You can’t do that in Power BI’.

Heathrow flights – deck.gl example

I retorted, ‘You can do something a bit similar’, and then had to go and prove I could do it. To be fair, they had said that a few times while going through the examples, so I had to pick one I knew I could do. There was no way we’d let the front-end peeps lord it over us data geeks; what sort of a world would that be!

So first of all, I had to get some data. Thankfully the lovely people at Uber data visualisation had provided the source of the data. One quick hyperlink later and I was downloading flight path data from The OpenSky Network, a service that collects data from a network of stations that read the ADS-B and Mode S signals transmitted by aircraft for air traffic control.

It’s a big data set, split by day and then by hour, and it can be downloaded in CSV format. I loaded the single file for the 07:00 to 07:59 data, which was about 200 MB.

Nice and easy so far. It’s a simply formatted data set (check the readme.txt on the site), apart from the time columns, which are in Unix format. For those not familiar with it, it is not a date time; it is the number of seconds that have passed since 01/01/1970. So 1574349827 is 11/21/2019 @ 3:23pm (UTC). That isn’t an issue: with a bit of M we can convert it on load in Power BI to a nicely formatted date time.

#datetime(1970, 1, 1, 0, 0, 0) + #duration(0, 0, 0, [time])
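The same conversion in Python, handy for spot-checking a value before trusting the M step (timestamp taken from the example above):

```python
from datetime import datetime, timezone

ts = 1574349827  # seconds since 1970-01-01 00:00:00 UTC
dt = datetime.fromtimestamp(ts, tz=timezone.utc)
print(dt)  # 2019-11-21 15:23:47+00:00
```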

Once that was done, I mapped the points on the map visual which gave me the following output:

Aircraft Points

One problem: there were too many points. So I filtered the data on the Latitude and Longitude columns to show just a range covering the UK. Looks good, but we need to animate it. Does Power BI do animation? No… well, not by default, but there is a custom visual called Play Axis that enables you to do it.

Play Axis!

This allows you to add a column to it and play it in order, in this case my date time. In fact, I didn’t need to convert the Unix time, as it would have played back in numerical order anyway! Once that was added to the report, we have animated visual awesomeness!

It’s alive!

I’ve loaded the sample file on my GitHub here. Enjoy!

Your Brain & Report Design Part 3 – 10% or 1 in 10?

This is Part 3 in my (fairly infrequent) series of blog posts about report design options, based around the cognitive process. In this post I’ll be looking at framing the answer. What is framing the answer? It is how best to show the result in the context of the question, and in some cases influence the user into taking action on it.

Framing the answer

For example have a look at the following.


Both are showing the same thing, just from a different perspective. One great example of how to frame the answer is how a doctor would talk to you about an operation:


Framing the result leads the user through a positive or negative emotional context. ‘We missed the target by 3%’ and ‘We achieved 97% of the target’ have different contexts around them. I would suggest that the second statement is perceived as more positive than the first. So the first step is to define that context in the data you want to display.

Is it abstract?

The next thing to consider: is the value abstract? Can the user identify or associate with the value? For example:


Again, both show the same thing, but at a cognitive level a failure rate of 10% is an abstract number, hard to visualise, whereas 1 in 10 isn’t. Some parts manufacturers moved from showing percentage values to ‘in something’ values, as it focused the report consumer more directly. As a result, there was a lot more attention paid to the failure rates, which then led to action and a reduction in the part failure rate.

In the book Thinking, Fast and Slow, Daniel Kahneman (a psychologist who won a Nobel Prize in economics) talks about this effect. He also indicates that prosecution lawyers will use one form or the other to try to convince a jury or parole board: ‘This person has a one in ten chance of re-offending’ or ‘This person has a ten percent chance of re-offending’. Subtle differences, but priming the jury towards the outcome that they want.

So when next creating a report, think about how to frame the result and how best to display it. I’ve used this recently with sickness rates, moving from a percentage to a proportion of the workforce, with positive results.

The Prime Number

One more thing about numbers, and this is something that you can try yourself. You can prime people with a value to affect their next response.

Ask one set of people the following:

Have you seen the film Around the World in 80 Days?

The next set:

Have you seen the film, Snow White and the Seven Dwarfs?

The next question you ask to both groups:

How many countries are in Africa?

You, hopefully, should get higher estimated values from the first set and lower estimates from the second. Why? You have already primed them with a value: 80 for the first set and 7 for the second. Basically, you have passed an unconscious stimulus to them to provoke a response to a later, unrelated question. I’ve run a number of these tests. For example, I asked groups to give me an estimate of the price of a laptop. One group had in the question ‘do not give a value below £800’, another ‘do not give a value above £1,500’, and a final group had no mention of a limit. For the groups that had a ceiling or floor on the estimate, the responses fell in a small range: for the ‘do not give a value below £800’ group, most were in a small range around £1,100; for the ‘do not give a value above £1,500’ group, a small range around £1,300. For the group without a ‘prime’, the range was quite wide, from £600 to £2,000. It is an interesting example of how you can influence people without them knowing!

I highly recommend the book Thinking, Fast and Slow; it shows how you understand and perceive numbers. It is focused on estimating and risk, but it goes through a wide range of topics that are eye-opening in how your brain works.

Azure Databricks and CosmosDB: Tips on playing nice together

DBrick love Cosmos

I’ve been working on a project using CosmosDB as a data source, extracting the data from it into Azure Databricks, where I apply some data and analytics magic to it. So this post is about some of the assumptions that I had about it, and some of the issues that I came across.

For setting up Databricks to get data from CosmosDB, the place to go is the Azure CosmosDB Spark connector site. I’m not going to go through installing it, as the readme and guidance on GitHub do a good job and it is straightforward to do. There is also a tutorial here that deals with ‘On-time flight performance’; it covers how to connect and then process data, but doesn’t cover any of the issues you may have, as it’s using a nice, conformed data structure. In the real world, we have all sorts of weird design options, taken over the lifetime of a product and/or service.

Theme ggplot2 to Power BIs visual style

For a recent project, I’ve had to hit R scripting and use R visuals to plug a gap in Power BI (PBI). Even though PBI is very capable, it does not have the full range of statistical formulas that Excel has. So I’ve had to build linear regression formulas in DAX, and also calculate some coefficients in Power Query using R. For the visuals, again, I hit some limits with Power BI: I needed to use a measure as an axis, and I also needed to show a polynomial trend line, so I had to use the R visual and the ggplot2 library to display the data.

I’ve not used R much; I’ve been on a SQLBits training day about it, and one of my colleagues was quite good at it (however, they have since moved on), so it was a nice move out of my comfort zone and I got to learn something too!

Note: If you want to follow this blog post you’ll need Microsoft R Open and install the ggplot2 library. Also useful is the ggplot2 reference web site

In this example I’ve started with a blank PBI file and used the ‘Enter Data’ function to create a column with a single value in it. You don’t need to do this if you are using your own data; I just need something to drag into the R visual to use as a data set. I’ll actually be using one of the built-in R data sets for the visual. You can download the example PBI file from my GitHub.

So let’s start with the basic set-up in the R visual.


#Base Chart uses the iris dataset
chart <-    ggplot(iris, aes(x = Petal.Width, y = Sepal.Length)) + 
            geom_point() + 
            stat_smooth()
chart


Which renders the following chart. (Note the rounded corners in the visual, thanks to the Feb PBI Desktop update)

Let’s break this down. First it calls ggplot, using the ‘iris’ data set, and assigns Petal.Width and Sepal.Length to the relevant axes.

ggplot(iris, aes(x = Petal.Width, y = Sepal.Length))

Plots the points on the chart. Don’t miss this bit out; I did, and could not understand why I wasn’t seeing any data.

geom_point()

Add the trend line and the shading.

stat_smooth()

So far so good. But it does not fit the Power BI style, and will look a bit out of place alongside the base PBI visuals. However, we can fix that. First we are going to add two variables called BaseColour and LightGrey and assign them hex colour values, so they can be used without having to recall the hex values. The base code will look like:


BaseColour = "#777777"
LightGrey = "#01B8AA"

#Base Chart
chart <-    ggplot(iris, aes(x = Petal.Width, y = Sepal.Length)) + 
            geom_point() + 
            stat_smooth()
chart


Nothing should change in the visual. Next I’m going to update the stat_smooth function to remove the shaded area and change the colour of the line.

stat_smooth(col = LightGrey, se=FALSE)

‘col’ assigns the colour, ‘se’ removes the shaded area.

For the next set of updates to the visual, we are going to update the theme setting, specifically the following:

  • Axis Text
  • Axis Ticks
  • Panel Grid
  • Panel Background

So let’s start with the text. If you poke around the PBI formatting settings you’ll come across the default font family, size and colour used in the standard PBI visuals. So next we are going to add the following.

First set the ‘axis.text’ elements to the size, font and use the variable ‘BaseColour’

axis.text = element_text(size=11, family="Segoe UI", colour=BaseColour)

remove the axis ticks with

axis.ticks = element_blank()

and set the axis title text, setting the colour using the BaseColour variable

axis.title = element_text(size=11, family="Segoe UI", colour=BaseColour)

but we will need to wrap it all up in a ‘theme’ call, so the code now looks like:


BaseColour = "#777777"
LightGrey = "#01B8AA"

chart <-    ggplot(iris, aes(x = Petal.Width, y = Sepal.Length)) + 
            geom_point() + 
            stat_smooth(col = LightGrey, se=FALSE) 

#Build theme
chart <- chart + theme(      axis.text = element_text(size=11, family="Segoe UI", colour=BaseColour)
                         ,   axis.ticks = element_blank()
                         ,   axis.title = element_text(size=11, family="Segoe UI", colour=BaseColour)
                         )
chart


Which hopefully should be getting close to the PBI look:

So just the grid lines and backdrop to sort out, which will be added to the theme set-up as follows.

Set the grid for the ‘y’ axis.

panel.grid.major.y = element_line( size=.1, color=BaseColour )

Set the grid for the ‘x’ axis, basically getting rid of it using element_blank()

panel.grid.major.x = element_blank()

And set the backdrop to blank as well

panel.background = element_blank()

which should mean that the code should now look like:


BaseColour = "#777777"
LightGrey = "#01B8AA"

chart <-    ggplot(iris, aes(x = Petal.Width, y = Sepal.Length)) + 
            geom_point() + 
            stat_smooth(col = LightGrey, se=FALSE) 

#Build theme
chart <- chart + theme(      axis.text = element_text(size=11, family="Segoe UI", colour=BaseColour)
                         ,   axis.ticks = element_blank()
                         ,   axis.title = element_text(size=11, family="Segoe UI", colour=BaseColour)
                         ,   panel.grid.major.y = element_line( size=.1, color=BaseColour )
                         ,   panel.grid.major.x = element_blank()
                         ,   panel.background = element_blank()
                         )
chart


And it should match the look of the default Power BI style.

So with a few bits of code added to the theme, you can change the look of an R visual to match the default PBI theme, or whatever you want it to look like. I’m not an R expert by any measure, and there may be a better way of doing this, but it’s got me started, and hopefully you too.

Update: The BBC released a theme for R, along with a cookbook on how to do visuals; I may use this as a base for my next project.

Power Query – Combining Columns in M

In my Power BI courses I always recommend some books and sites that will help attendees learn DAX or M. Most of the time I’m presenting to users new to Power BI, having to get them to move away from the Excel muscle memory that they have, and also to show them that the code-less approach is not always the best way.

One example presented itself the other day. A user was combining some Excel sheets and wanted to create a rough dimension based on the values in one of the columns, then use it as a filter or axis. They hit the internet and came across a few posts on how to do this. A few recommended this approach:

1 – Append the two tables together as new query
2 – Remove the columns you don’t need
3 – Remove duplicates

Well, that is sort of fine, as you can’t specify a single column in the table append, and it can all be done in the interface. However, it has a bit of overhead in steps 1 and 2, so can you do it so that you only reference the relevant columns and remove some of the overhead? The answer is ‘Yes’, but you need to code it in M. Using the Blank Query function I created this M code:

let
    Source = Table.Combine({
                                Table.SelectColumns(#"table name 1",{"column name"})
                            ,   Table.SelectColumns(#"table name 2",{"column name"})
                            ,   Table.SelectColumns(#"table name 3",{"column name"})
                            }),
    #"Removed Duplicates" = Table.Distinct(Source)
in
    #"Removed Duplicates"

So let’s look at the M code behind it. Rather than using a query as a reference, you can declare the table and column that you want with the following:

 Table.SelectColumns(#"table name 1",{"column name"})

You wrap these up with a Table.Combine to union the columns together, then run a remove duplicates (Table.Distinct) on the result. Table.Combine on its own behaves like a SQL UNION ALL; the remove duplicates turns it into a UNION. You don’t have to duplicate the whole tables/queries just to select the data and then trim them down to the columns that you want. As far as I can see you can’t do this in the interface; it is only possible via M.

Power BI – Parameters & Data Sources

This blog post comes from a question raised at a Power BI training session, and deals with updating data source information, for example when moving from development to production, or when folder or file locations move. The question was asked as I was showing the attendees the ‘Load from Folder’ function in Power BI, where you can dump monthly files into a folder and load them together. The question was: ‘What if the location changes? Do you have to rebuild the report?’. I mentioned that you can edit the M code to change the location, but the user was a bit worried that this was too ‘techy’, so I suggested a parameter that you can use. They were more than happy with that approach, it turns out.

So how do you use parameters? Well, let’s fire up Power BI and enter Power Query by clicking on ‘Edit Queries’.

And then select the New Parameter option, which should bring up the following…

Now, there are a few options here; you can have a list of values, or items based on a query, but for this example we just need the basic version.

Type = Text
Current Value = The folder location that we are going to use to pull the files in

Click OK. So how do we use it when loading a folder?

Select the folder import, then select the little drop-down for the folder path.

Select the parameter (Or start the process from there, by creating a new one).

Click OK, and away it goes as if you typed in the folder location normally.

So if you need to change the location, you don’t have to edit the M code; just update the parameter by clicking on the Manage Parameters selection, where we created the new parameter. Nice and easy, well, I think so.

Parameters can be used in a number of data sources, for example SQL Server connections.

So switching between DEV, UAT and Live servers should be a slightly less painful process than editing the M code.