Wednesday, September 1, 2021

How to filter for the most recent version of each day in Power query, like SQL: ROW_NUMBER() OVER(PARTITION BY ORDER BY)

   Check list of all posts

Please see the sample below; We need to load only files from a folder with the latest version for each day.  In this case, there are two files for day 20210624, or File_20210624.csv and File_20210624_v1.0.csv; we only need to load File_20210624_v1.0.csv, while File_20210624.csv is filtered oout. Similarly, there are two files for day 20210630, or File_20210630_v1.0.csv and File_20210630_v2.0.csv; we only need to load File_20210630_v2.0.csv, while File_20210630_v1.0.csv is filtered out. 

We can filter for the most recent version in Power BI query with 3 options

Option 1: Use the latest modified timestamp

If we can be sure that the modified timestamp is unique and the latest version is with the latest modified timestamp, we can use this last-modified timestamp to filter files from the folder.

Step 1: Extract Date as new column File

= Table.AddColumn(#"Filtered Rows1", "File", each Text.Range([Name],5,8), Int64.Type)

Step 2: Group by this new column












Change M script as below

= Table.Group(#"Added Custom", {"File"}, {{"Count", each Table.RowCount(_), Int64.Type}, ......

= Table.Group(#"Added Custom", {"File"}, {{"LastDateModified", each List.Max([Date modified]), type datetime}, ......








Step 3: expand table with modify timestamp and then filter

= Table.SelectRows(#"Expanded All", each ([LastDateModified] = [Date modified])) 

Option 2: Use file name with top version

We can also use the file name to derive the version to determine the latest version for each day and then filter. It is the same as the first option, but we use the file name instead.

= Table.Group(#"Added Custom", {"File"}, {{"LastFile", each List.Max([Name]), type text}, ....

= Table.SelectRows(#"Expanded All", each ([LastFile] = [Name]))


Option 3: Use list to filter main query

Suppose the current implementation is complicated in Power queries. In that case, we may need to turn the file names with the latest version as a list and then add a single step to filter the existing main Power BI query; this solution will make the logic much more straightforward. In addition, it will be easy to debug and maintain.











= Table.SelectRows(Source, each (List.Contains (FilesTobeLoaded,[Name] ) 

No comments:

Post a Comment