Power Query & finding the Nth instance of a letter in a string for manipulation

Power BI LogoPower Query and the M language is a bit of a pain, not a lot of people do it (Or they do, and they don’t write about it), and the knowledge base on MSDN is not real world example friendly. So trying to figure out how to do something you know how to do in say SQL or DAX is not that easy some times. Where possible it is best to do as much as you can in the Power Query loading, otherwise you’ll be burning CPU time in the data model. So things like data cleansing should be done in Power Query, not in the DAX after it has been loaded into memory.

So find the Nth instance of a item in string I can do in SQL, but in Power Query it is a little bit different. So lets start with an example that contains the following.

ABC-01-02-03-XYZ

So I had to remove the ‘XYZ’ bit. Easy right?


Text.Start("ABC-01-02-03-XYZ", 12)

No, turns out the data contain items like

AB-012-02-03-XYZ

ABC-01-02-09

AB-012-032-03-XGYZ

So no quick fix by just using basic string manipulation. So what can we use? In this case you can use the Text.PositionOfAny function, and with this you can set it to find the first, last or all occurrences. So in this case, looking for the first instance of ‘-‘, you would get:

 


Text.PositionOfAny("ABC-01-02-03-XYZ", {"-"}, Occurrence.First)

Would return 3. Wait, not 4? No as it starts from ‘0’. It’s like a JSON array or Python dictionary index. This will be important later.


Text.PositionOfAny("ABC-01-02-03-XYZ", {"-"}, Occurrence.Last)

Returns 12.


Text.PositionOfAny("ABC-01-02-03-XYZ", {"-"}, Occurrence.Any)

Would in a custom column, just return ‘List’, or internally {‘3’, ‘6’, ‘9’, ’12’}. You can expand the list out if needed and create extra rows, but you should not in this case.

So to strip out the ‘XYZ’ part of ‘ABC-01-02-03-XYZ’

So you can use Text.Start, so it will read from the start of the string to the end, which can be set from a list.


Text.Start("ABC-01-02-03-XYZ", (Text.PositionOfAny("ABC-01-02-03-XYZ" , {"-"}, Occurrence.All){3}))

So in the above code the {3} is get the value from list in that location. Remember lists start from 0, and {3} would return the value 12 in the list {‘3’, ‘6’, ‘9’, ’12’}

But for the other instances in the data, {3} might not exist. However you can problematically strip stuff out, with for example an ‘if’ clause, with List.Count and drive the extraction of the data, based in the number of items in the list.

So for example in the data I have in the column [SortID] the following

AB-012-02-03-XYZ

ABC-01-02-09

if List.Count(Text.PositionOfAny([SortID] , {"-"}, Occurrence.All)) <= 3
then [SortID]
else Text.Start([SortID], (Text.PositionOfAny([SortID] , {"-"}, Occurrence.All){3}))

So in the 'if' clause List.Count on ABC-01-02-09 will return 3, and then do nothing, as I don't need to trim the end off it. Otherwise it will take AB-012-02-03-XYZ and find the position of the 3rd list value, and take the string length from it found in the Text.PositionOfAny, and trim the end off.

If you want here is some example M code to recreate the above. Enjoy

 

let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcnRy1jUw1DUw0jUw1o2IjFKK1UERtFSKjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [SortID = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"SortID", type text}}),
#"Basic String Manipulation" = Table.AddColumn(#"Changed Type", "Basic String Manipulation", each Text.Start([SortID], 12)),
#"First Occurrance" = Table.AddColumn(#"Basic String Manipulation", "First Occurrance", each Text.PositionOfAny([SortID], {"-"}, Occurrence.First)),
#"Last Occurrance" = Table.AddColumn(#"First Occurrance", "Last Occurrance", each Text.PositionOfAny([SortID], {"-"}, Occurrence.Last)),
#"All Occurance" = Table.AddColumn(#"Last Occurrance", "All Occurrance", each Text.PositionOfAny([SortID], {"-"}, Occurrence.All)),
#"If Clause" = Table.AddColumn(#"All Occurance", "If Clause", each if List.Count(Text.PositionOfAny([SortID] , {"-"},
Occurrence.All)) <= 3
then [SortID]
else Text.Start([SortID], (Text.PositionOfAny([SortID] , {"-"}, Occurrence.All){3})))
in
#"If Clause"