Wednesday, February 1, 2023

What are better ways to combine files in Power BI ?

                                                                                                                  Check list of all posts

1.  Connect to folders and setup Gateway

1.1 Use Folder with Folder.Files, sample below

Source = Folder.Files(FolderParameter)

1.2 Use SharePoint Folder with SharePoint.Files, sample below

Source=SharePoint.Files("https://ABC.sharepoint.com/sites/Folder/", [ApiVersion = 15])

1.3 Use the Folder to get a single file. The sample ( if the file is excel ) below

 Source=Excel.Workbook(SharePoint.Files("https://ABC.sharepoint.com/sites/Folder/", [ApiVersion = 15]){[Name="Sample1.xlsx",#"Folder Path"="https://ABC.sharepoint.com/sites/Folder/Subfolder1/Subfolder2/"]}[Content]),

First, we need to validate if the gateway works before doing development depth. Note that if gateway folder doesn't need to be the same Folder as required. Instead, if the gateway folder is the parent Folder, all subfolders will be supported. Furthermore, you don't need to have a SharePoint gateway, but you need to have a generic user credential to access SharePoint.


2. Identify files

As many files are listed, you must add a subfolder to limit all files. Furthermore, you probably need modifiy logic to filter out all files needed with the cases below:

2.1. Only the latest version from the same day is needed

2.2. Only for past N period data are needed, therefore to limit data loaded into the dataset

2.3. Only specific dates data are needed based on the same kinds of logic with another table filer; you probably need to use Merge join to filter out all files

2.4. If the file is enormous, it can use incremental loading with RangeStart and RangeEnd. Believe or not, we can do an incremental refresh with files, the same as the database, but not in real-time.


3. Retrieve data from files, along with overall columns

Default Solution: We expend the Content binary column with Invoke Custom Function1. The problem with this solution is that we will have many fx {Transform File} functions, and it is difficult to maintain and have many steps in many queries. Image that how complicated could it be if you have many files to load.  

Newly proposed solution: The solution is to add a new column ( called as  File column) as below

Csv.Document([Content]) or 

Excel.Workbook([Content])

Normally, we can Add overall parameters, such as Snapshot Date at this step.


4. Extract data

Default solution: expend table, then promote header, rename columns as needed

The newly proposed solution is to expand the table only and then remove the header. Please note that the code below can be generated by a map if there are many columns, 

Table.ExpandTableColumn(#"Last Step", "File", 

{"Column1", "Column2", "Column3"}, 

{"RealcolumnName1", "RealcolumnName2", "RealcolumnName3"})

Then remove headers

Table.SelectRows(#"Expanded File", each ([RealcolumnName1] <> "{HeaderColumn1}"))

this solution gives us an opportunity to well organize data maps from technical specifications.


Overall, this solution will make the query well structured as a single query if we want to combine all steps, making the flow much easier to debug. 




No comments:

Post a Comment