Wednesday, March 25, 2026

Solving Power BI Drillthrough Limitations for Multi-Period Daily Analysis



Background

In many Power BI reports, the main page is filtered at a period level—for example, a selected month such as February 2025.

When users drill through to a detail page, the requirement often expands to:

  • display all daily records for the selected period, and
  • at the same time display all daily records for a comparison period

The comparison period can be:

  • previous month
  • next month
  • prior year
  • previous quarter
  • or any custom offset

The Core Challenge

The drillthrough page receives a filter from DimDate, which defines the selected period.

This creates a fundamental limitation:

  • visuals using DimDate only show the selected period
  • another period cannot be shown simultaneously using the same relationship

👉 A single connected date dimension can only propagate one active filter context.


Initial Approach: Connected DimDate (RolePlay)

A common first attempt is:

  • create a second date table: DimDate (RolePlay)
  • connect it to the fact table
  • use it for the comparison period

Design

  • DimDate → drives drillthrough (selected period)
  • DimDate (RolePlay) → drives comparison period

Why This Approach Fails (Performance)

Connecting DimDate (RolePlay) directly to the fact table introduces significant performance overhead.

  • More complex filter propagation
    Multiple date relationships force the engine to evaluate more complex filter paths, reducing optimization efficiency.
  • Increased formula engine workload
    The engine spends more time resolving competing date contexts instead of pushing simple queries to the storage engine.
  • Reduced cache reuse
    Additional date paths create more unique filter combinations, which limits cache effectiveness.
  • Slower queries at scale
    For large models with daily granularity and complex measures like [MeasureX], this results in noticeably slower report performance.

👉 In short, multiple connected date tables increase computation cost and reduce performance efficiency.


Final Solution: Disconnected DimDate (RolePlay) + TREATAS

A better design is:

  • keep only one connected DimDate
  • convert DimDate (RolePlay) into a disconnected helper table
  • use TREATAS inside measures to control the date context

Key Idea

👉 Do not rely on relationships — control date context explicitly in the measure.


Implementation Pattern

1. Connected date table

  • DimDate remains the only table connected to the fact table
  • handles drillthrough and standard filtering

2. Disconnected helper table

  • DimDate (RolePlay) has no relationship
  • used only to generate comparison-period rows

3. Measure logic

Comparison Period MeasureX =
VAR _RolePlayDate =
    SELECTEDVALUE ( 'DimDate (RolePlay)'[Date] )
RETURN
    CALCULATE (
        [MeasureX],
        REMOVEFILTERS ( 'DimDate' ),                     -- remove drillthrough filter
        TREATAS ( { _RolePlayDate }, 'DimDate'[Date] )   -- apply desired date
    )
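
When many helper dates are in context at once (for example, a month-level total instead of one visual row per day), SELECTEDVALUE returns BLANK and the pattern above produces no result for that cell. A minimal sketch of a variant that passes the whole set of visible helper dates, reusing the same table and measure names:

Comparison Period MeasureX (All Dates) =
CALCULATE (
    [MeasureX],
    REMOVEFILTERS ( 'DimDate' ),                                          -- remove drillthrough filter
    TREATAS ( VALUES ( 'DimDate (RolePlay)'[Date] ), 'DimDate'[Date] )    -- apply every helper date in context
)

Either way, 'DimDate (RolePlay)' stays disconnected; only the measure decides which dates reach the fact table.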

How It Works

Step 1 — Remove original filter

  • clears the drillthrough period (e.g., February)

Step 2 — Apply new date

  • injects comparison period date (e.g., January)

Result

  • no conflicting filters
  • precise control of date context
  • correct results for both periods

Benefits

  • Better performance → simpler filter paths
  • Cleaner model → single source of truth
  • More flexible → supports any time offset
  • Easier to maintain → explicit logic

Recommended Pattern

  • ✅ One connected DimDate
  • ✅ One disconnected DimDate (RolePlay)
  • ✅ Use TREATAS in [MeasureX]
  • ❌ Avoid multiple connected date tables

Conclusion

The limitation comes from the fact that a connected date dimension can only support one active period at a time.

Instead of introducing additional connected date tables, the optimal solution is:

  • use a disconnected helper table
  • explicitly override date context using TREATAS

Executive Summary

Use a single connected DimDate, a disconnected helper date table, and override date context in [MeasureX] using REMOVEFILTERS and TREATAS. This avoids performance issues and enables flexible multi-period analysis in drillthrough pages.

Tuesday, February 3, 2026

Power BI File-Based Incremental Refresh Failure at Month Change — Root Cause & A Robust Fix

 How to Prevent Power BI File-Based Incremental Refresh from Failing When a Partition Has No Files (Month Change Issue)

1. Background

Incremental refresh in Power BI is one of the most important performance and scalability features when dealing with large datasets. Instead of reloading the entire dataset every time, incremental refresh:

  • splits data into partitions (typically monthly or daily)

  • refreshes only the newest partitions

  • archives older partitions

This works extremely well for database sources, but teams often run into unexpected failures when applying incremental refresh to file-based sources (Folder / SharePoint / SFTP folders) such as:

  • daily CSV transaction extracts

  • Excel files exported by upstream processes

  • flat-file snapshots of operational systems

  • vendor feeds delivered into network folders


2. Problem Statement

Symptom

Power BI dataset refresh fails in the Power BI Service / Gateway with an error similar to:

There weren't enough elements in the enumeration to complete the operation.
(MashupDataAccessError / Microsoft.Data.Mashup.ValueError / Expression.Error)

In practice, this frequently occurs at month change, for example:

  • Today is Feb 1

  • incremental policy refreshes Feb partition

  • Power BI applies the partition filter 2026-02-01 (inclusive) to 2026-03-01 (exclusive)

  • but the folder still contains files only up to Jan 31

  • result: folder query returns 0 files

  • Power Query tries to combine / expand data anyway and crashes


3. Why This Happens (Root Cause Analysis)

3.1 Incremental refresh with files behaves differently

When incremental refresh is enabled for a table, Power BI automatically generates partitions using the RangeStart and RangeEnd parameters.

In a database incremental refresh scenario, this is easy:

WHERE TxnDate >= RangeStart AND TxnDate < RangeEnd

If the database returns 0 rows, that’s valid:
✅ refresh succeeds, no data loaded.

But for file-based sources, the refresh flow often looks like:

  1. List files (Folder.Files)

  2. Filter only needed files (extension, naming convention)

  3. Derive file date (from name or metadata)

  4. Apply incremental filter to files

  5. Combine files (invoke custom function / expand table columns)

  6. Type conversion / rename / select columns

The key issue:

Folder-based queries frequently assume at least one file exists.

3.2 The hidden “{0} first row / sample file” dependency

Power BI’s “Combine Files” pattern relies on:

  • #"Sample File ...", and

  • Table.ColumnNames(#"Transform File..."(#"Sample File ..."))

When there are zero files, Power Query cannot reference:

  • first file

  • sample file

  • transform function output schema

This produces the gateway error:

not enough elements in the enumeration
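
The failure is easy to reproduce outside incremental refresh as well. A minimal illustration — the folder path is hypothetical, and the filter deliberately returns no rows to simulate an empty partition window:

let
    Source        = Folder.Files("\\server\share\SourceData"),   // hypothetical path
    FilteredFiles = Table.SelectRows(Source, each false),        // simulate a partition window with no matching files
    FirstContent  = FilteredFiles{0}[Content]                    // fails: "There weren't enough elements in the enumeration..."
in
    FirstContent

Positional access ({0}) on an empty table is exactly what the auto-generated "Combine Files" helpers do internally, which is why the error surfaces the moment a partition has no files.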


4. Why the Problem Does NOT Occur for Database-Based Incremental Refresh

Database sources naturally support:

  • empty result sets

  • schema stability

  • query folding into SQL

  • predictable evaluation paths

File sources often do not:

  • schema is inferred from sample file

  • file list may become empty

  • Power Query expand steps require at least one row/file

Therefore:
✅ Database incremental refresh → stable
⚠️ File incremental refresh → fragile during “no file” partitions


5. The Correct Expected Behavior

From a business and refresh perspective, when a partition has no file:

✅ Refresh should succeed
✅ Dataset should remain valid
✅ Incremental pipeline should continue
✅ New partitions should be ready to accept future files

Power BI refresh should treat “no files” the same way it treats “no rows”.


6. The Solution: Handle “No Files” Case Explicitly

High-level design

We introduce a clean pattern:

  1. Define an empty output table with correct schema

  2. Wrap the “normal processing” section into a variable RealDataOutput = let ... in ...

  3. Use a final switch:

Output = if Table.IsEmpty(#"Incremental loading") then EmptyDataOutput else RealDataOutput

This is a safe, minimal-change approach that allows developers to:

✅ build normal query as usual
✅ add only 3 reusable sections
✅ avoid breaking incremental refresh
✅ stop the gateway crash


7. Sample Implementation Template (Recommended Pattern)

Below is a reusable template.
Developers can keep their normal ETL steps unchanged.


✅ Power Query Script Template: “3 Sections No-Files Handling”

let
    //====================================================
    // (1) Handle no files: EMPTY OUTPUT (schema only)
    //====================================================
    EmptyDataOutput =
        #table(
            type table [
                Column1 = text,
                #"Column 2" = number,
                #"Column 3" = text,
                FileDateTime = datetime
            ],
            {}
        ),

    //====================================================
    // Normal steps (develop as usual)
    //====================================================
    Source = Folder.Files("\\...\Source Data"),
    #"Filtered Rows - Cleanup" =
        Table.SelectRows(
            Source,
            each Text.StartsWith([Name], "xyz_") and [Extension] = ".csv"
        ),
    #"Added FileDateTime" =
        Table.AddColumn(
            #"Filtered Rows - Cleanup",
            "FileDateTime",
            each DateTime.From(Date.FromText(Text.Select([Name], {"0".."9"}))),
            type datetime
        ),
    #"Incremental loading" =
        Table.SelectRows(
            #"Added FileDateTime",
            each [FileDateTime] >= #"RangeStart" and [FileDateTime] < #"RangeEnd"
        ),

    //====================================================
    // (2) Handle no files: REAL OUTPUT (normal logic)
    //====================================================
    RealDataOutput =
        let
            #"Invoke Custom Function" =
                Table.AddColumn(
                    #"Incremental loading",
                    "Transform File",
                    each #"Transform File xyz"([Content])
                ),
            #"Removed Other Columns" =
                Table.SelectColumns(#"Invoke Custom Function", {"FileDateTime", "Transform File"}),
            #"Expanded Table Column" =
                Table.ExpandTableColumn(
                    #"Removed Other Columns",
                    "Transform File",
                    Table.ColumnNames(#"Transform File xyz"(#"Sample File xyz"))
                ),
            #"Changed Type" =
                Table.TransformColumnTypes(
                    #"Expanded Table Column",
                    {
                        {"Column1", type text},
                        {"Column 2", type number},
                        {"Column 3", type text},
                        {"FileDateTime", type datetime}
                    }
                ),
            QueryOutput =
                Table.SelectColumns(
                    #"Changed Type",
                    {"Column1", "Column 2", "Column 3", "FileDateTime"}
                )
        in
            QueryOutput,

    //====================================================
    // (3) Handle no files: FINAL SWITCH
    //====================================================
    Output =
        if Table.IsEmpty(#"Incremental loading") then EmptyDataOutput else RealDataOutput
in
    Output

8. Why This Works (Technical Explanation)

When there are no files

  • #"Incremental loading" is empty

  • Table.IsEmpty(#"Incremental loading") returns true

  • query immediately returns EmptyDataOutput

  • “combine files” logic never executes

When there are files

  • normal block runs as expected

  • output schema matches EmptyDataOutput

  • incremental refresh can generate partitions without failure


9. Best Practices (Strongly Recommended)

✅ 9.1 Make empty schema exactly match final output

Your EmptyDataOutput must match:

  • column names

  • data types

  • column count/order (preferably)

If not, Power BI can throw “schema mismatch” errors.


✅ 9.2 Derive FileDateTime from file name carefully

This pattern is common but risky:

Text.Select([Name], {"0".."9"})

Because it might collect digits from:

  • file version number

  • sequence number

  • other metadata

Better (see the sketch below):

  • enforce naming convention

  • parse exact substring

  • validate with try ... otherwise null
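
A minimal sketch of this safer parsing, assuming file names that start with a fixed prefix followed by an eight-digit yyyyMMdd date (for example xyz_20260201_v2.csv); names that do not match yield null instead of breaking the refresh:

#"Added FileDateTime" =
    Table.AddColumn(
        #"Filtered Rows - Cleanup",
        "FileDateTime",
        each try
                DateTime.From(
                    #date(
                        Number.FromText(Text.Middle([Name], 4, 4)),    // yyyy from "xyz_YYYYMMDD..."
                        Number.FromText(Text.Middle([Name], 8, 2)),    // MM
                        Number.FromText(Text.Middle([Name], 10, 2))    // dd
                    )
                )
             otherwise null,
        type datetime
    )

Rows left with a null FileDateTime can then be excluded (or routed to a data-quality report) before the incremental filter step.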


✅ 9.3 Keep the incremental filter step simple

Ensure the partition filter is applied only on the file-date column:

each [FileDateTime] >= RangeStart and [FileDateTime] < RangeEnd

✅ 9.4 Don’t use {0} / first file logic in incremental refresh

Avoid:

FilteredFiles{0}[Content]

Because it will always crash on empty partitions.


10. Final Summary

File-based incremental refresh in Power BI can fail during month transitions because Power Query’s “Combine Files” logic assumes the folder contains at least one file.

✅ The fix is simple and scalable:

  • Define an empty output table

  • wrap normal processing in a RealDataOutput let block

  • return empty table when file list is empty

This approach:
✅ prevents refresh failure
✅ preserves incremental refresh behavior
✅ keeps query design clean
✅ requires minimal change to existing scripts



Saturday, December 13, 2025

Standardizing Entity-Based Row-Level Security in Power BI

 From Architecture Vision to a Production-Ready Implementation Template


Introduction

As Power BI adoption grows across business domains, security quickly becomes one of the hardest problems to scale. Many organizations start with good intentions but end up with duplicated datasets, inconsistent access rules, manual fixes, and performance bottlenecks—especially when hierarchical access is involved.

This blog presents a complete, ready-to-publish reference architecture and implementation template for Entity-based Row-Level Security (RLS) in Power BI. It combines conceptual design, governance principles, and a hands-on implementation blueprint that teams can reuse across domains such as revenue, cost, risk, and beyond.

The goal is simple:
👉 One conformed security dimension, one semantic model, many audiences—securely and at scale.


Executive Summary

The proposed solution standardizes on Entity as the single, conformed security dimension (SCD Type 1) driving dynamic RLS across Power BI datasets.

User access is centrally managed in the Business entity interface application, synchronized through ETL, and enforced in Power BI using native RLS. Where higher-level access is required, Department-based static roles complement the dynamic model. A FullData role supports approved no-security use cases—without duplicating datasets.

The result is a secure, scalable, auditable, and performance-friendly Power BI security framework that is production-ready.


Why Entity-Based Security?

Traditional Power BI security implementations often suffer from:

  • Dataset duplication for different audiences

  • Hard-coded filters and brittle DAX

  • Poor performance with deep hierarchies

  • Limited auditability and governance

An Entity-based security model solves these problems by:

  • Introducing a single conformed dimension reused across all facts

  • Separating entitlements from data modeling

  • Supporting both granular and coarse-grained access

  • Scaling naturally as new domains and fact tables are added

Entity becomes the language of access control across analytics.


End-to-End Security Flow

At a high level, the solution works as follows:

  1. Business entity interface application maintains user ↔ Entity entitlements

  2. ETL processes refresh:

    • Dim_Entity (SCD1)

    • Entity_Hier (hierarchy bridge)

    • User_Permission (effective access)

  3. Power BI binds the signed-in user via USERPRINCIPALNAME()

  4. Dynamic RLS filters data by Entity and all descendants

  5. Department-based static roles provide coarse-grained access

  6. RLS_FullData supports approved no-security audiences

All of this is achieved within a single semantic model.


Scope and Objectives

This framework applies to:

  • Power BI datasets and shared semantic models

  • Fact tables such as:

    • Fact_Revenue

    • Fact_Cost

    • Fact_Risk

    • Fact_[Domain] (extensible)

  • Entity-based and Department-based security

  • Centralized entitlement management and governance

Out of scope:

  • Database-level RLS

  • Report-level or app-level access configuration

The focus is dataset-level security governance.


Canonical Data Model

Core Tables

Dimensions & Security

  • Dim_Entity – conformed Entity dimension (SCD1)

  • Entity_Hier – parent-child hierarchy bridge

  • User_Permission – user ↔ Entity entitlement mapping

  • Dim_User (optional) – identity normalization

Facts

  • Fact_Revenue

  • Fact_Cost

  • Fact_Risk

  • Fact_[Domain]


Dim_Entity (SCD Type 1)

Columns: Entity_Key (PK), Entity_Code, Entity_Name, Parent_Entity_Code, Entity_Level, Is_Active
  • Current-state only

  • One row per Entity

  • No historical tracking


Entity_Hier (Bridge)

Columns: Parent_Entity_Key, Child_Entity_Key, LevelFromTop
  • Pre-computed in ETL

  • Includes self-to-self rows

  • Optimized for hierarchical security expansion


User_Permission

Columns: User_Email, Entity_Key, Granted_By, Granted_Date
  • Source of truth: Business entity interface application

  • No calculated columns

  • Fully auditable


Fact Tables (Standard Pattern)

Example: Fact_Revenue

Columns: Entity_Key, Date_Key, Amount, Currency, plus other domain attributes

Rule:
Every fact table must carry a resolvable Entity_Key.


Relationship Design (Strict by Design)

User_Permission → Entity_Hier → Dim_Entity → Facts
  • Single-direction relationships only

  • No bi-directional filters by default

  • Security propagation handled exclusively by RLS

This avoids accidental over-filtering and performance regressions.


Row-Level Security Design

1. Dynamic Entity-Based Role

Role name: RLS_Entity_Dynamic

User_Permission[User_Email] = USERPRINCIPALNAME()
Entity_Hier[Parent_Entity_Key] IN VALUES ( User_Permission[Entity_Key] )

Grants

  • Explicit Entity access

  • All descendant Entities automatically
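
In the role editor, the two expressions above are entered as separate table-level filters (one on User_Permission, one on Entity_Hier). An equivalent way to express the rule as a single filter on Entity_Hier — a sketch only, using the columns above, and most useful when User_Permission is kept as a disconnected entitlement table:

Entity_Hier[Parent_Entity_Key]
    IN CALCULATETABLE (
        VALUES ( User_Permission[Entity_Key] ),                 -- entities granted...
        User_Permission[User_Email] = USERPRINCIPALNAME ()      -- ...to the signed-in user
    )

Because Entity_Hier contains self-to-self rows, this single filter still grants both the explicit Entities and all of their descendants.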


2. Department-Based Static Role

Role name: RLS_Department_Static

Dim_Department[Department_Code] IN { "FIN", "OPS", "ENG" }

Used for:

  • Executive access

  • Oversight and aggregated reporting


3. No-Security Role

Role name: RLS_FullData

TRUE()
  • Applied only to approved security groups

  • Uses disconnected security dimensions

  • No dataset duplication


ETL Responsibilities and Governance

ETL is responsible for:

  • Maintaining Dim_Entity as SCD1

  • Regenerating Entity_Hier on Entity changes

  • Synchronizing User_Permission entitlements

  • Capturing freshness timestamps

Mandatory data quality checks

  • Orphan Entity keys

  • Missing Entity mapping in facts

  • Stale entitlement data

Governance dashboards in Power BI surface:

  • Users without access

  • Orphaned permissions

  • Entity coverage by fact table


Handling Non-Conforming Data

Not all datasets are perfect. This framework addresses reality by:

  • Cataloguing fact tables lacking Entity keys

  • Introducing mapping/bridge tables via ETL

  • Excluding unresolved rows from secured views

  • Enforcing coverage targets (e.g., ≥ 99.5%)

Security integrity is preserved without blocking delivery.


Deployment & Rollout Checklist

Dataset

  • RLS roles created and tested

  • Relationships validated

  • No calculated security tables

Security

  • All Viewer/App users assigned to a role

  • Dataset owners restricted

  • FullData role explicitly approved

Testing

  • Parent Entity sees children

  • Multiple Entity grants = union

  • No entitlement = no data

  • Deep hierarchy performance validated


Benefits Realized

This combined architecture and implementation delivers:

  • One conformed security dimension

  • One semantic model for all audiences

  • Strong auditability and governance

  • Predictable performance at scale

  • A future-proof template for new domains

Security moves from a tactical concern to a strategic platform capability.


Closing Thoughts

Entity-based Row-Level Security is not just a Power BI technique—it is a modeling discipline. By separating entitlements from facts, pre-computing hierarchies, and enforcing consistent patterns, organizations can scale analytics securely without sacrificing agility or performance.

This reference architecture and implementation template is ready for rollout, ready for reuse, and ready for governance.


Monday, December 8, 2025

Avoiding the Performance Trap: Files-Based Incremental Loading for Large Folder Power Query Sources

 Executive Summary

Organizations increasingly rely on Power BI to analyze data stored in shared folders containing hundreds or even thousands of flat files produced by modern Big Data platforms. While this file-based architecture supports strong data management practices—such as lineage, auditability, and replayability—it creates a major refresh performance challenge: Power BI tries to read every file during every refresh unless incremental refresh is implemented correctly.

This blog provides:

  • A complete, practical blueprint for files-based incremental loading

  • Clear explanations of the three types of refresh behaviors

  • Deep insight into why performance issues occur

  • A comprehensive list of common pitfalls, with reasons and consequences

When implemented properly, incremental refresh transforms a slow, hour-long refresh into a fast, stable, 5–10 minute refresh—even with years of historical files.


1. Why Do Big Data Platforms Produce So Many Files? 

Modern data platforms are designed around append-only ingestion. Instead of updating existing data, they continuously write small, time-stamped files—daily, hourly, or by batch. This pattern:

  • Ensures auditability

  • Preserves full historical lineage

  • Allows efficient reprocessing and rollback

  • Aligns with cloud data lake best practices

The result is inevitable: massive file counts in shared locations accessed by Power BI.

To make Power BI perform well against such sources, incremental loading is essential.


2. What Is Files-Based Incremental Loading?

Databases support incremental refresh naturally because they include date columns and optimized partition indexes. Flat files do not.
To mimic database-like behavior for folder sources, Power BI must:

  1. Extract a date value from each file name (FileDateTime).

  2. Use RangeStart and RangeEnd parameters to select only the relevant files.

  3. Apply incremental refresh policies that freeze historical data and refresh only recent partitions.

This reduces unnecessary work and provides consistent, efficient refresh cycles.


3. High-Level Architecture

The architecture follows a clean, layered design:

  1. Folder Source: Power BI connects to a shared folder containing many files.

  2. Staging Layer: Extracts a date value from each file name, then filters files using RangeStart and RangeEnd—before expanding file contents.

  3. Fact Layer: Expands and transforms only the filtered files.

  4. Incremental Refresh Engine: Power BI Service partitions and manages historical vs. incremental data.

This separation ensures Power BI reads only what is needed.


4. Core Implementation Pattern 

4.1 Create RangeStart and RangeEnd parameters

These DateTime parameters define the boundaries of the incremental refresh window. Desktop uses them literally; the Service replaces them dynamically.
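
For reference, a sketch of what the two parameter queries typically look like in M once created through Manage Parameters (the initial values below are development placeholders):

// RangeStart (its own query)
#datetime(2023, 1, 1, 0, 0, 0) meta [IsParameterQuery = true, Type = "DateTime", IsParameterQueryRequired = true]

// RangeEnd (its own query)
#datetime(2023, 1, 4, 0, 0, 0) meta [IsParameterQuery = true, Type = "DateTime", IsParameterQueryRequired = true]

Keeping the Desktop values a few days apart also matches the small-window best practice described in Scenario 1 below.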

4.2 Derive FileDateTime from the file name

For example, if a file name begins with a date such as “20230102,” interpret it as January 2, 2023. This becomes the date used for filtering and partitioning.

4.3 Filter early, before expanding file contents

Filter the list of files so only those whose FileDateTime is within the desired window are expanded and processed.
The correct pattern (see the M sketch after this list) is:

  • FileDateTime greater than or equal to RangeStart

  • FileDateTime strictly less than RangeEnd
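
A minimal M sketch of this early-filter pattern — the folder path and the yyyyMMdd-prefixed file naming are assumptions — where only the files that survive the date filter ever have their contents opened:

let
    Source   = Folder.Files("\\server\share\DailyExtracts"),                        // hypothetical folder
    WithDate = Table.AddColumn(
                   Source,
                   "FileDateTime",
                   each DateTime.From(Date.FromText(Text.Start([Name], 8), [Format = "yyyyMMdd"])),
                   type datetime
               ),
    InWindow = Table.SelectRows(
                   WithDate,
                   each [FileDateTime] >= RangeStart and [FileDateTime] < RangeEnd  -- left-inclusive, right-exclusive
               ),
    Combined = Table.AddColumn(InWindow, "Data", each Csv.Document([Content]))      // contents read only here
in
    Combined

The key point is the ordering: the date is derived and filtered on file metadata first, and binary contents are touched only afterwards.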

4.4 Apply incremental refresh

Define how many years of history to keep and how many days/weeks/months to refresh.
Power BI then creates partitions and ensures only recent partitions are refreshed going forward.


5. Understanding the Three Refresh Scenarios

Power BI handles refreshes differently in Desktop and in the Service. Misunderstanding these differences often leads to wrong expectations or incorrect troubleshooting.


Scenario 1 — Power BI Desktop: Development Mode

What actually happens

Desktop loads exactly the files that fall between the RangeStart and RangeEnd parameters you define.
It does not create partitions.
It does not perform dynamic date substitution.
It does not optimize file scanning.

Reason

Desktop is built for development and debugging—not for processing large historical datasets.

What to expect

  • Small date window → fast and smooth

  • Large date window → very slow, memory-heavy, or crashes

Result if misused

Developers mistakenly think incremental refresh “doesn’t work” because Desktop behaves like a full refresh when too many files are included.

Best practice

Use only a small date testing window (1–3 days).


Scenario 2 — Power BI Service: Initial Full Load (First Refresh After Publishing)

What actually happens

Power BI Service completely ignores the Desktop parameter values and instead loads:

  • The entire historical dataset required by your incremental refresh policy

  • All partitions needed for historical storage

This is the only time Power BI performs a complete historical load.

Reason

Power BI must build the entire partition structure for incremental refresh to work.

What to expect

  • This refresh may take a long time, depending on data volume.

  • It prepares the model for efficient future incremental refresh cycles.

Result if misunderstood

Teams think incremental refresh is “slow,” not realizing this is a required setup step.
Republishing the dataset multiple times resets the partition structure and forces another full load each time.

Best practice

Allow the first refresh to complete and avoid unnecessary republishing.


Scenario 3 — Power BI Service: Ongoing Incremental Refresh (Normal Operation)

What actually happens

After partitions are built:

  • Only recent partitions (for example, the last 7 days) are refreshed

  • Older partitions remain frozen

  • Only new files are processed

Reason

Incremental refresh is designed to process only new data (“delta” refresh).

What to expect

  • Large performance improvement

  • Stable refresh durations

  • Minimal system load

Result if implemented incorrectly

If filtering is incorrect, file structures are bad, or partitions are too granular, Power BI may still scan all files or reprocess excessive partitions.

Best practice

Optimize filtering, folder structure, and partition granularity to maximize efficiency.


6. Benefits of Files-Based Incremental Loading

  • Massively reduced refresh time

  • Stable, predictable refresh performance

  • Historical data becomes immutable and reliable

  • Reduced workload on gateways and file systems

  • More efficient troubleshooting and data governance


7. Common Pitfalls & How to Avoid Them

Pitfall 1 — Incorrect Filter Pattern (Not Left-Inclusive, Right-Exclusive)

Reason

If the filter allows FileDateTime values on both boundaries, partitions overlap.

Result

  • Duplicate rows

  • Incorrect totals

  • Reprocessed files

  • Wrong business results

Fix

Always include the lower boundary and exclude the upper boundary.


Pitfall 2 — Too Many Partitions (Daily/Weekly/Monthly)

Reason

Power BI creates one partition per time unit.
If you define sixty daily partitions, Power BI scans the entire folder sixty times.

Result

  • Extremely long refresh

  • Heavy load on gateway and file server

  • Incremental refresh becomes slower than full refresh

Fix

Use weekly or monthly partitions for large file volumes, and keep the refresh window short.


Pitfall 3 — Keeping All Files in One Large Flat Folder

Reason

File systems do not support partition-aware scanning.
Power BI must enumerate every file repeatedly across partitions.

Result

  • Very slow metadata scanning

  • Refresh performance degrades as the folder grows

  • Gateway and file-server load climbs with every additional file

  • Incremental refresh loses effectiveness

Fix

Organize files into year/month subfolders so Power BI can skip entire folders.
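
One way to exploit such a layout — a sketch only, with a hypothetical root path and folder names — is to step into the relevant subfolders with Folder.Contents instead of enumerating everything with Folder.Files:

let
    Root     = Folder.Contents("\\server\share\DailyExtracts"),   // lists immediate children only
    Year2026 = Root{[Name = "2026"]}[Content],                     // step into one year folder
    Month03  = Year2026{[Name = "03"]}[Content]                    // step into one month folder
in
    Month03                                                        // other folders are never enumerated

Selecting the folders dynamically from RangeStart/RangeEnd takes a little more logic, but the principle is the same: decide at the folder level first, then at the file level.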


Pitfall 4 — Filtering Files Too Late in the Query

Reason

If filtering is applied after expanding file contents, Power BI must load every file before deciding which ones to discard.

Result

  • Essentially becomes a full refresh

  • Very slow processing

  • High memory usage

Fix

Apply FileDateTime filtering immediately after reading folder metadata.


Pitfall 5 — Large Parameter Window in Desktop

Reason

Desktop uses the parameter values literally and loads all files in the range.

Result

  • Desktop freezes or crashes

  • Development slows dramatically

Fix

Keep Desktop ranges small (1–3 days).


8. Summary

Folder-based incremental refresh is powerful—but delicate.
When implemented correctly, it resolves one of the biggest performance barriers in Power BI: efficiently handling very large collections of flat files.

Success depends on:

  • Using the correct filtering logic

  • Managing partition granularity

  • Designing a smart folder structure

  • Filtering early

  • Using small development windows

  • Understanding Desktop vs. Service refresh behavior

With these principles, even large multi-year file archives refresh efficiently, reliably, and predictably.


----------------------------------------------------
Journal

March 1: the dataset refresh did not fail, but the data for Feb 27 was not yet ready.

Refresh window = March 2026: Power BI drops partition 2026-03 and re-queries the source for WHERE Date >= 2026-03-01 AND Date < 2026-04-01.

Historical partitions 2025-01 through 2026-02 stay frozen and are not touched.

Friday, December 5, 2025

Avoid Reusing Large Folder-Based Power Query Sources — Clarifying a Common Misunderstanding

 Executive Summary

Power BI makes it easy to ingest data from folders containing many files, but when the volume reaches thousands of files and multiple gigabytes, refresh performance becomes a serious concern. A common misunderstanding is assuming that when multiple Power Query objects reference the same base folder query, Power BI automatically “reuses” that data. Unfortunately, this is not how the Power BI refresh engine works.

When downstream queries depend on a Power Query folder source—and all are enabled for load—Power BI evaluates each query independently. This means the folder is scanned multiple times during refresh, even though the queries appear to share the same base logic. The impact can be dramatic: refreshes become slow, unstable, and unnecessarily heavy on the underlying storage system.

This document explains why this happens, what the risks are, and how to redesign the dataset so the folder is loaded once, while all other logic is rebuilt efficiently inside the model using DAX calculated tables. This approach eliminates redundant IO, improves refresh performance, and clarifies a widely misunderstood aspect of Power Query behavior.


Requirements

The dataset contains three Power Query components:

  1. Query 1

    • Reads from a folder containing ~2000 files totaling ~4 GB.

    • Represents the detailed transactional dataset.

  2. Query 2

    • Referenced from Query 1, applying additional transformations.

  3. Query 3

    • Also referenced from Query 1, extracting a different view of the same data.

All three queries are currently enabled for load into Power BI.

The main requirement is:

Refresh the dataset efficiently and ensure the large folder is not scanned multiple times.


Analysis

Although Queries 2 and 3 are visually created by referencing Query 1 within Power Query, this does not imply reuse or caching during refresh. Power BI treats every loaded query as an independent evaluation path.

Behind the scenes:

  • Each loaded query is evaluated separately

  • Each query’s full dependency chain is executed

  • Folder connectors do not share cached results

This means:

  • Query 1 triggers a full folder scan

  • Query 2 triggers another full folder scan

  • Query 3 triggers yet another full folder scan

Total effect:

  • Three full scans of ~2000 files

  • Three ingestion cycles of ~4 GB each

  • ~12 GB of IO per refresh instead of 4 GB

This leads to:

  • Slow refresh duration

  • Higher chance of refresh timeout

  • Increased load on upstream storage

  • Poor scalability as logic grows

This is a common misunderstanding: referencing a query in Power Query does not guarantee shared evaluation.


Solution

To eliminate redundant folder scans, the dataset should be redesigned so that:

✔ Only one Power Query object loads the large folder

This ensures the folder is scanned once.

✔ Downstream logic is removed from Power Query

Queries derived from the folder source should not be loaded, preventing multiple evaluations.

✔ All additional “views” are rebuilt inside the model using DAX calculated tables

These tables mimic the original Power Query outputs but are computed in-memory, without touching the folder source again.

This design achieves:

  • A single ingestion point

  • Faster refresh performance

  • Reduced infrastructure load

  • A cleaner, more maintainable model


Implementation

1. Keep Query 1 as the only loaded Power Query object

Query 1 remains the sole location where the folder is read:

  • Query 1 → Enable Load = On

This becomes the single ingestion pipeline for the dataset.


2. Disable load for Query 2 and Query 3

These downstream queries should not trigger additional refresh paths.

In Power Query:

  • Right-click Query 2 → uncheck Enable Load

  • Right-click Query 3 → uncheck Enable Load

They remain as reference definitions, but no longer evaluate during refresh.


3. Rebuild the logic of Query 2 and Query 3 using DAX calculated tables

After Query 1 loads into the model, replicate the logic of the original Query 2 and Query 3 transformations using DAX calculated tables.

This includes:

  • Summaries

  • Filters

  • “Latest record” logic

  • Grouping or aggregation

  • Business-specific views

These DAX tables are:

  • Computed once after refresh

  • Built from the in-memory data of Query 1

  • Functionally identical to the old Power Query outputs

  • Much faster and more reliable

  • Not dependent on folder scans

Once created, they replace the old Query 2 and Query 3 in the model.
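
For illustration, a minimal sketch of such a calculated table — the table name 'Query 1' and the columns CustomerID and SnapshotDate are hypothetical — reproducing a typical "latest record per key" view entirely in memory:

Latest Records =
FILTER (
    'Query 1',
    'Query 1'[SnapshotDate]
        = CALCULATE (
            MAX ( 'Query 1'[SnapshotDate] ),                      -- latest snapshot...
            ALLEXCEPT ( 'Query 1', 'Query 1'[CustomerID] )        -- ...per CustomerID
        )
)

Aggregated views can be built the same way with SUMMARIZECOLUMNS or GROUPBY; none of these calculated tables touch the folder source during refresh.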


4. Remap the data model and reports

With the new DAX tables in place:

  • Update relationships to point to the new calculated tables

  • Adjust measures and calculated columns if needed

  • Update report visuals to reference the new tables

  • Remove Query 2 and Query 3 from the model after migration is complete

From the user’s perspective, all visuals and logic behave exactly the same.

From the engine’s perspective, refresh is now far more efficient.


5. Validate the refresh and reporting behavior

After implementation:

  • Refresh the dataset

  • Confirm that refresh time is significantly reduced

  • Verify that the folder is scanned only once

  • Ensure all reports continue to function correctly

  • Finalize model cleanup

This completes the migration from a multi-scan design to a single-scan, optimized architecture.


Conclusion

A major misunderstanding in Power BI development is assuming that referencing a Power Query object automatically reuses previously loaded data. When dealing with large folder-based sources, this misunderstanding can lead to multiple unnecessary full-folder scans and severe performance degradation.

The correct architectural approach is:

  • Load the folder once

  • Disable downstream Power Query loads

  • Rebuild the additional views as DAX calculated tables

  • Remap the model and reports accordingly

This ensures a scalable, efficient, and maintainable dataset that refreshes quickly and avoids redundant IO.