Monday, February 1, 2021

How to mashup data with DirectQuery for Power BI datasets and Analysis

Context

Microsoft has just released a new functionality called DirectQuery for Power BI datasets and Analysis Services, a fundamental change for Power BI model design, significantly impacting processing and dataflow. We can apply this concept for mushing up data.  https://docs.microsoft.com/en-us/power-bi/connect-data/desktop-directquery-datasets-azure-analysis-services. 

Power BI with import mode is mush up data that can bring all different data into Power BI memory. It provides us all Report with excellent performance. However, this concept could end up a mess, as we usually create silos of .pbix files. It results in many inconsistencies about measures, also puts significant stress on Power BI services capacity.  

Power BI does provide a live connection to analysis services to take advantage of the shared model to resolve such inconsistencies. At the same time, we can also use the shared datasets. However, there are three problems: we can't add any new data and logic, no self-service, need a regression test when there are any model changes.

This new feature has resolved all these problems, and bring a lot of advantages as listed below:

1.Leveraged the existing central model

2.Ensured more consistency

3.Reused existing models instead of re-do

Below is a real sample we have used this concept to resolve a business problem.

Requirement

There are two sets of data: data source 1 is company data, which is imported into Power BI, along with a lot of developed reports. Data source 2 is Global benchmark data, which is much bigger than company data. We can't use import mode, as it is simply too big to bring it into memory. The requirement is to get the data displayed together with the single visual, as illustrated below

  

Identified conformed dimensions and facts

We need to create a model to handle multiple fact tables with conformed dimensions. We have developed GB data model as a well-formed star scheme. Given that we can't and shouldn't change the Company data model, we want to try our best effort to ensure that the Power BI engine generates the most efficient DAX and predictable SQL query to achieve a good performance. Another thing to consider is to create relationships between data from Company and GB. The conformed dimensions should be relatively small for such a mixed model. 

Option 1 – Created consolidated Power BI model 

Based on all identified conformed dimensions and facts, we make a composite model. While the Company model applies the import mode, Global benchmark facts use direct queries. All benchmark facts are retrieved from fact tables designed for industrial reports. The concept is illustrated below

  

We need to pay attention to the following items:

1. The consolidated model refresh time should be about the same as the Company data model, as no additional import data is needed, and no refresh is needed for DirectQuery.

2.There are two design options to use this consolidated dataset:  Treat the new consolidated model as an independent dataset. We will use this model to create Report with Global benchmark data.  Treat the new consolidated model as a shared dataset. We will use the model for all current Report and new dashboard with Global benchmark data

3.The advantage of using a new consolidated model as an independent dataset is that we don't need to touch any developed Report at all. We only publish this new dataset for new Report with Global benchmark data. However, there are a lot of disadvantages listed below: We need to maintain both models, the Company data model, and this new consolidated model. Keep both models in sync is not an easy task. We must refresh the consolidated model, which will put more stress on Power BI capacity.

3.The advantage of using the new consolidated model as a shared dataset is to overcome two drawbacks mentioned above. However, there are disadvantages: We need to perform a regression test whenever we need to change this model.  If only data items are selected from the import mode area, then the Power BI engine should only use memories data without initialing any DirectQuery to the GB database. In other words, we can't segregate data from the Global benchmark database.


Option 2 – Created consolidated Power BI model 

The idea is illustrated below


As displayed, this new implementation will overcome all disadvantages and keep all advantages from tow deployment options of the model above, Listed below:

1.We don't need to refresh the model at all. We apply both data sources with DirectQuery: From the Consolidated model point of view, we use data from Company with DirectQuery. We use data from the Global Benchmark using DirectQuery.

2.We don't need to touch any developed Report at all. We don't need to perform a regression test whenever we need to Consolidated model.  

3.We don't need to worry about keep both models in sync. Whatever Company data model changes, this new consolidated model will automatically reflect the change in real-time.

This new functionality enables us to make the system much more reliable and efficient. 


No comments:

Post a Comment