Thursday, October 1, 2020

How to understand Power BI incremental refresh

  Check list of all posts

The basic concept of the incremental refresh:

1.The connection model must be import, while direct Query is not applied.

2.Data source coming from a data source must support query folding. If you use a native query, the incremental refresh is not working, as it doesn't support native queries. You can turn the native queries into the views to resolve this issue.

3.The two parameters of RangeStart and RangeEnd must be setup.

4.Incremental refresh applies for an individual table but shares with RangeStart and RangeEnd.

5.The initial value for RangeStart and RangeEnd are used in desktop only, while the values of RangeStart and RangeEnd are dynamically determined with Power bi services.

6.Users cannot download the PBIX file, if dataset is loaded with the incremental refresh.


The settings:

1.Identify Date column called OrderDate and filter data with RangeStart and RangeEnd, and then enable incremental refresh.

2.Specify stored rows with 5 years, and refresh rows with 10 days. Suppose that today Oct 1, 2020, then 

2.1. Any data before Oct 1, 2015 will be removed.

2.2 Any data change from Oct 1, 2015 to Sep 21, 2020, will NOT be refreshed, even there are any changes

2.3. Any data change from Sep 21, 2020 to Oct 1, 2020 will be refreshed.

2.4 Behind the scene, Power BI issues 10 queries to database as 

where OrderDate >='2020-09-30' and OrderDate < '2020-10-01' 

where OrderDate >='2020-09-29' and OrderDate < '2020-09-30' 

3.Enable Data Change detection: 

3.1. In traditional ETL, if you have a modified Date, then the process of incremental refresh can monitor that field, and only get rows that their date/time is after the latest date/time in that field in the previous refresh. This approach is still valid for a data warehouse as Power BI data source. 

3.2. We need to specify the modified date from the table. BTW, This modified date can be derived from another a few columns. Notice that this is different from the OrderDate.

3.3 The logic is to check each day as a block, day by day. If the maximum last modified date from source is greater than the maximum last modified date from power BI cache, then all records will be refreshed, regardless of whether the last modified date is greater than the last modified date from power BI cache or not

4.Enable Only Refresh Complete Period: it depends on the period you selected when you set up the incremental refresh, you can only refresh when the period is complete. In this case, it is by day. Date time will be ignored.

5. More notes:

5.1.When data in the past has been deleted, there could be different solutions, such as using the soft delete or simply refresh the whole dataset.

5.2.When data model has been changed, a full refresh is an easy option to proceed. However, if the data amount is huge, then we need to figure out a better way to promote the dataset.

6. Further study:

Keep The Existing Data In Your Power BI Dataset And Add New Data To It Using Incremental Refresh


No comments:

Post a Comment