Tuesday, June 1, 2021

How to achieve the Power BI shared datasets architecture (Thin Report, Golden Dataset)

   Check list of all posts

This document is a real example from my consulting experience, which is also based on two references below

How to segregate data model from reports using shared dataset

New Power BI Reports from a Golden Dataset


1 Current development and deployment processes

1.1 Development process and problems

Normally, there are two kinds of reports/Dashboards:

Self-serve Power BI

We create generic Generic data model Vx.pbix. Users can then make their own dashboards based on this data model . If with embedded Power BI, System need to clone the report (Power BI API: Reports - Clone Report In Group), and then repoint dataset to Company generic dataset (Power BI API: Reports - Rebind Report In Group.)   The bottom line is that it will end up with a lot datasets to refresh.

Enterprise reports/Dashboards

We have developed all interactive dashboards. End users will directly use these built-in reports. The system behind the scene needs to ensure that the web interface will point to the correct version of the report. 

These dashboards are saved as pbix files, including both visuals and data model. Each pbix file contains data as well once refreshed based on data source. We save all this information locally, as in the illtreated diagram below.


 Difficult to keep models consistent

Dataset in Power BI template represents data model, including tables. Columns, measures, and all relationships behind the scene.  Models are continuing changed, such as 

1. add a new table

2. add a new column

3. add a new measure

4. change existing measure logic

5. add RLS, and more.

Currently, each report has its model; we need to check all models and copy and paste. 

Now, If we want the Report 1 and Report 2 share the same dataset, then we need to enhance the both reports with a lot of copy and pastes. There are also different versions, which will make it very difficult maintain these models consistently.

1.2 Deployment process and problems

If we want to make sure datasets shared, then there would be a lot of complexities with Power BI API.

It is involved in many processes behind the scene, a considerable action when releasing a new version. we need to make sure

1. Deploy the new version

2. Remove old version

3. New version report point to the current shared dataset

4. Make all customer-created dashboards point to the current version of the shared dataset.

2 Solution Overview

We need to change the way we develop Power BI reports by decoupling data from the report.  The data refers to data models, while the report refers to visuals. Instead of building dashboards with both visuals and data models together, we need to develop visuals and data models separately. Instead of deploying dashboards with visuals and data models together, we need to deploy visuals and data models separately. Instead of maintaining report both visuals and data models together, we need to maintain visuals and data models separately. 

We can call this concept as “Thin Report, Golden Dataset”. A dataset refers to a data model. The diagram below illustrates the concept.


 Please note that the golden dataset is actually located remotely, which is a master Power BI Data Model that is loaded to PowerBI.com with or without a report attached. (currently we must have at least a single page in your report, even if it is blank, we can document the dataset if needed).  Once you have your Golden Dataset loaded to Power BI services, you can build as many new ‘thin’ reports as you need in Power BI Desktop by connecting Power BI Desktop directly to Power BI services with Get Data -> Power BI Service.

With this architecture, we can develop and deploy the golden dataset to the Power BI services, but without any report changes. Of cause, we can develop and deploy any reports without any changes of data models.

3 The concept of Power BI dataset

We can implement this concept by using Power BI dataset as source. The golden dataset can be published to workspace as normal.  When add a new report, we can simply point the data source from workspace as screenshot below


 When using Power BI datasets as source, we won’t have data view (s. screenshot below). However, we still can add new measures. We can’t add new columns. Please note that even with new measures, the best practise is to add measures to the Power BI dataset side, or model side.


 We still can switch reports to a different golden dataset too, by clicking transform data, then data source setting.


 Please note that you could add new data into the Power BI dataset below, which is a new release of Power BI data modeling, or Using DirectQuery for Power BI datasets and Azure Analysis Services. (https://docs.microsoft.com/en-us/power-bi/connect-data/desktop-directquery-datasets-azure-analysis-services) This is the most updated data model concept; we can use it for advanced cases, such as add Benchmark data to Company.  


 4 New development and deployment processes with Thin Reports and Golden Dataset

This new concept will change the way as develop the golden data model and develop thin reports. As the model is shared, the consistency will be naturally enforced.

We can change the data model with .pbix file for the golden data model development, as we are doing right now.  The big difference is that we get rid of data model versions in Power BI services. We should not have any versions of both datasets and reports on the Power BI services side or server-side. When promoting a new version of a data model, we should REPLACE the old data model on Power BI services. When upgrading a new version of the report, we should REPLACE the old data report on Power BI services. We should physically only have one copy on the service side, while all data model and report versions on can be saved under source control, such as SharePoint or DevOps.

This new concept will dramatically simplify the current deployment process. Instead of many different steps, described before, we can apply single step, or simply just publish the golden data models, or thin reports. We REPLACE golden datasets, or thin reports when release. If not correctly released, then we can simply rollback by republishing old versions.

5 Implement the shared datasets architecture

Assume that we have one golden dataset called GenericDataset, and two thin reports, called Report1_GenericDataset and Report2_GenericDataset. The below diagram illustrates the architecture, the process of releasing the updated Golden dataset GenericDataset,  and the process of releasing the updated report  Report1_GenericDataset.


6 Implement the shared datasets architecture manually

Setup


Develop/deploy new version of the golden dataset

Check RESULT of new promoted golden dataset

Check RESULT of Impacted thin reports

Develop/Deploy new version of thin reports


 Check RESULT of the thin report Report1_GenericDataset

7 Impact of new Power BI composite models

7.1 Desktop – Internal promotion

7.1.1 Not touch “Transform Data”

Change report as below

R1:  Use shared dataset, no report change.

R2:  Use shared dataset, Add visuals.

R3:  Use shared dataset, upgrade model display with new version.

R4:  Use shared dataset, Change model layout.

R5:  Use shared dataset, Add  new measures, which is only thing we can do.

R6:  Use shared dataset, we can’t change model, but we can do hide/unhide, and expand/collapse




 7.1.2 Touch “Transform Data”

Click “Transform Data”, enter transform concept,  the share dataset is still shared, but new model will be created, and inherited from shared dataset.


 R7:  Use shared dataset, click “Transform Data”, Apply and save, you can see that you can add new column. Then publish.


 R8:  Refresh, and then publish.

R9:  Change model, add new table, and publish.


7.2 Power BI services – Self-serve promotion 

Change report as below

PSR1:  Open CYD, and save as PSR1.

PSR2:  Open CYD, and add new visuals and save as PSR2

With edit model under Power BI workspace, we can’t add any measures, columns, transform data. The only thing we can do is to author reports.


8.  Powershell
we can use Powershell script to publish both datasets and reports, samples are listed below

Install-Module -name MicrosoftPowerBIMgmt 

connect-PowerBIServiceAccount

New-PowerBIWorkspace -Name "WorkspaceName"

New-PowerBIReport -Path C:\Report\GenericDataset.pbix -Name 'GenericDataset' -WorkspaceId <token> -ConflictAction CreateOrOverwrite

New-PowerBIReport -Path C:\Report\Report1_GenericDataset.pbix -Name 'Report1_GenericDataset' -WorkspaceId <token> -ConflictAction CreateOrOverwrite

New-PowerBIReport -Path C:\Report\Report2_GenericDataset.pbix -Name 'Report2_GenericDataset' -WorkspaceId <token> -ConflictAction CreateOrOverwrite

9.  API
we can use API to publish both datasets and reports, samples are listed below

System.Net.ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12;

AccessToken  ="<token>;

client = new PowerBIClient(new TokenCredentials(AccessToken));

client.BaseUri = new Uri("https://api.powerbi.com/");

var file = File.Open(@"C:\Report1\Report1_APIGenericDataset.pbix", FileMode.Open);
            client.Imports.PostImportWithFileInGroup("<token>", file, "Report1_APIGenericDataset",
                        ImportConflictHandlerMode.CreateOrOverwrite);

file.Close();

No comments:

Post a Comment