Saturday, May 1, 2021

How to achieve excellent Power BI performance: a practical project


Performance problem

We needed to develop Power BI reports on top of a large Azure SQL data warehouse, about 200 GB. Due to our limited Power BI capacity (A3), we decided not to use import mode, as the data is far too big to import. Instead, we applied DirectQuery. We put clustered columnstore indexes on all primary fact tables, which range from 10 to 100 million records, to achieve good performance. We thoroughly tested the whole solution in non-production, and it worked fine.
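For context, the columnstore setup looks like the sketch below; the table name is illustrative, not from the actual project. A clustered columnstore index compresses the fact table and speeds up the aggregate-style scans that Power BI DirectQuery generates.

```sql
-- Hypothetical example: convert a large fact table to a clustered
-- columnstore index so DirectQuery aggregations scan compressed
-- column segments instead of row pages.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
    ON dbo.FactSales;
```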

However, when we promoted the solution to the production environment, where the Azure SQL pricing tier is even more powerful (Gen5 with 4 vCores, versus S3 in non-production), the report performance was sometimes unacceptable from the end users' point of view.

Identify performance problem

Using Performance Analyzer along with DAX Studio, we identified that the DirectQuery queries in production take much longer than in non-production. There is no problem with the visuals or the data model, as the queries generated by Power BI are very efficient. Running the same query in both production and non-production, performance is not consistent: sometimes quick, sometimes slow. Based on the performance insights from Azure, we finally pinned it down: the query is slow because other heavy-duty queries run against the database at the same time as the reports. We simulated that workload in non-production and reproduced the same problem.
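One way to confirm this kind of contention, assuming you have VIEW DATABASE STATE permission on the Azure SQL database, is to snapshot the currently running requests while a report is slow; a minimal sketch:

```sql
-- List other requests running right now, heaviest first, to spot
-- the concurrent workload competing with the report queries.
SELECT
    r.session_id,
    r.status,
    r.cpu_time,             -- ms of CPU consumed so far
    r.total_elapsed_time,   -- ms since the request started
    r.wait_type,
    SUBSTRING(t.text, 1, 200) AS query_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID
ORDER BY r.total_elapsed_time DESC;
```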

Analyze performance problem

This problem is not easy to resolve, as we can't stop other applications from running against the database. The reports are customer-facing and very performance-sensitive: users are accustomed to the usual lightning speed of Power BI, and any query running over one second is considered slow. Two solutions could resolve the issue, but neither is practical.

Change the report layout so that queries are sent to the database only when users click Apply on slicers and filters, which would reduce the number of queries hitting the database. However, from the end user's point of view, this is not acceptable.

Bring all report-related tables into a separate database so that queries from other applications can't impact report performance. This would make report performance consistent. However, it would require a significant change to the architecture and comes at a high cost.

Besides, the application must be available 24 hours a day, seven days a week, so we can't guarantee performance while the ETL is running.

Resolve performance problem

The ultimate solution is to use import mode, which resolves both the performance problem and the availability requirement. But how, given the size of the data? We applied the following actions:

1. Reduce overall data amount

From the requirements point of view, we only need four years of data. In our tests, that cut the data down to 25% of its original volume, a surprising number. Suddenly, the whole data set is under 100 GB.
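In practice this can be done at the source, for example by importing from a view that filters on the date; a hypothetical sketch with illustrative names:

```sql
-- Expose only the last four years of the fact table through a view,
-- and point the Power BI import at the view instead of the full table.
CREATE VIEW dbo.vFactSales_Last4Years
AS
SELECT *
FROM dbo.FactSales
WHERE OrderDate >= DATEADD(YEAR, -4, CAST(GETDATE() AS date));
```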

2. Make a pure star schema by removing all snowflakes

This lets the Power BI VertiPaq engine compress the data much more efficiently and use much less memory. We also removed all big dimensions that are not directly used by reports.
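One way to remove a snowflake, assuming illustrative Product/Subcategory/Category tables, is to flatten the chain into a single dimension view and import that:

```sql
-- Flatten a snowflaked Product -> Subcategory -> Category chain into
-- one dimension, so the model keeps a pure star schema.
CREATE VIEW dbo.vDimProduct
AS
SELECT
    p.ProductKey,
    p.ProductName,
    s.SubcategoryName,
    c.CategoryName
FROM dbo.DimProduct     AS p
JOIN dbo.DimSubcategory AS s ON s.SubcategoryKey = p.SubcategoryKey
JOIN dbo.DimCategory    AS c ON c.CategoryKey = s.CategoryKey;
```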

3. Remove all bridge tables (many-to-many) if not needed

For the best design, we sometimes need a bridge table to create the relationships that filter data. However, if these tables are huge, performance can become an issue; instead, we can resolve the many-to-many directly in the fact table.
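As a hypothetical sketch (an Account-to-Customer bridge, with illustrative names): the bridge is joined into the fact in SQL, so the report can filter the fact by customer through an ordinary one-to-many relationship. Note this changes the grain, so a balance shared by two customers appears twice and measures must allow for that:

```sql
-- Resolve the many-to-many in SQL instead of loading a huge bridge
-- table into the model: each fact row is repeated per related customer.
CREATE VIEW dbo.vFactBalanceByCustomer
AS
SELECT
    f.DateKey,
    f.AccountKey,
    b.CustomerKey,
    f.Balance
FROM dbo.FactBalance           AS f
JOIN dbo.BridgeAccountCustomer AS b ON b.AccountKey = f.AccountKey;
```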

4. Remove all columns with enormous numbers of distinct values if not needed

From the Power BI VertiPaq engine's point of view, such high-cardinality columns take a lot of space because they barely compress; even worse, any relationship built on them takes a lot of space as well. For example, the unique row IDs in the fact tables, and composite IDs built from several columns, should be removed if the reports never use them. When detailed data is needed, we use DirectQuery to fetch it.
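Again this is easiest at the source: import a narrow projection that leaves the offending columns out. A hypothetical sketch with illustrative names:

```sql
-- Import a slim projection of the fact table, deliberately excluding
-- high-cardinality columns (e.g. a unique SalesRowID) the reports
-- never group or filter by.
CREATE VIEW dbo.vFactSales_Slim
AS
SELECT
    DateKey,
    ProductKey,
    CustomerKey,
    SalesAmount,
    Quantity
    -- excluded: SalesRowID, CompositeOrderID
FROM dbo.FactSales;
```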

5. Disable auto date/time in Power BI

Believe it or not, when a model has many date columns, Power BI by default creates a hidden date table behind each one. Removing these auto-generated tables can have a big impact on refresh time and memory size. We can use DAX Studio to make all these auto date tables visible.
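One way to see them, assuming DAX Studio is connected to the open pbix file, is to query the model's table list through a DMV; the auto-generated tables show up with names like DateTableTemplate_<GUID> and LocalDateTable_<GUID>:

```sql
-- Run in DAX Studio against the Power BI model: lists every table,
-- including the hidden auto date/time tables.
SELECT [ID], [Name]
FROM $SYSTEM.TMSCHEMA_TABLES;
```

The feature itself is switched off under Options > Data Load > Time intelligence > Auto date/time.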

Result

After all these actions, the Power BI pbix file is about 0.5 GB. Given that memory usage can be roughly double the pbix size (for example during refresh), we can use any Power BI capacity, even A1 with its 3 GB of memory. The solution resolves both the performance issue and the availability issue, as Power BI import mode addresses these two issues by nature.
