Executive Summary
Power BI makes it easy to ingest data from folders containing many files, but when the volume reaches thousands of files and multiple gigabytes, refresh performance becomes a serious concern. A common misunderstanding is assuming that when multiple Power Query objects reference the same base folder query, Power BI automatically “reuses” that data. Unfortunately, this is not how the Power BI refresh engine works.
When downstream queries depend on a Power Query folder source—and all are enabled for load—Power BI evaluates each query independently. This means the folder is scanned multiple times during refresh, even though the queries appear to share the same base logic. The impact can be dramatic: refreshes become slow, unstable, and unnecessarily heavy on the underlying storage system.
This document explains why this happens, what the risks are, and how to redesign the dataset so the folder is loaded once, while all other logic is rebuilt efficiently inside the model using DAX calculated tables. This approach eliminates redundant IO, improves refresh performance, and clarifies a widely misunderstood aspect of Power Query behavior.
Requirements
The dataset contains three Power Query components:
- Query 1
  - Reads from a folder containing ~2,000 files totaling ~4 GB.
  - Represents the detailed transactional dataset.
- Query 2
  - References Query 1, applying additional transformations.
- Query 3
  - Also references Query 1, extracting a different view of the same data.
All three queries are currently enabled for load into Power BI.
The main requirement is:
Refresh the dataset efficiently and ensure the large folder is not scanned multiple times.
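As a sketch, the current setup looks roughly like this in M. The source does not show the actual transformation steps, so the filter and column selection below are illustrative assumptions only:

```
// Query 2 – references Query 1; the filter step is hypothetical
let
    Source = #"Query 1",
    Transformed = Table.SelectRows(Source, each [Amount] > 0)
in
    Transformed

// Query 3 – also references Query 1; the column selection is hypothetical
let
    Source = #"Query 1",
    View = Table.SelectColumns(Source, {"Date", "Customer", "Amount"})
in
    View
```

Both queries start from `#"Query 1"`, which is exactly the pattern that looks like reuse but, as discussed below, is not.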
Analysis
Although Queries 2 and 3 are created by referencing Query 1 in the Power Query editor, referencing does not imply reuse or caching during refresh. Power BI treats every loaded query as an independent evaluation path.
Behind the scenes:
- Each loaded query is evaluated separately
- Each query's full dependency chain is executed
- Folder connectors do not share cached results
This means:
- Query 1 triggers a full folder scan
- Query 2 triggers another full folder scan
- Query 3 triggers yet another full folder scan
Total effect:
- Three full scans of ~2,000 files
- Three ingestion cycles of ~4 GB each
- ~12 GB of IO per refresh instead of ~4 GB
This leads to:
- Slow refresh durations
- Higher chance of refresh timeouts
- Increased load on upstream storage
- Poor scalability as logic grows
This is a common misunderstanding: referencing a query in Power Query does not guarantee shared evaluation.
Solution
To eliminate redundant folder scans, the dataset should be redesigned so that:
✔ Only one Power Query object loads the large folder
This ensures the folder is scanned once.
✔ Downstream logic is removed from Power Query
Queries derived from the folder source should not be loaded, preventing multiple evaluations.
✔ All additional “views” are rebuilt inside the model using DAX calculated tables
These tables mimic the original Power Query outputs but are computed in-memory, without touching the folder source again.
This design achieves:
- A single ingestion point
- Faster refresh performance
- Reduced infrastructure load
- A cleaner, more maintainable model
Implementation
1. Keep Query 1 as the only loaded Power Query object
Query 1 remains the sole location where the folder is read:
- Query 1 → Enable Load = On
This becomes the single ingestion pipeline for the dataset.
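A minimal sketch of such a folder query, assuming CSV content and a hypothetical path (the actual connector settings and file format are not given in the source):

```
// Query 1 – the only query with Enable Load = On
let
    // Hypothetical folder path; ~2,000 files, ~4 GB in total
    Source = Folder.Files("C:\Data\Transactions"),
    // Parse each file's binary content as CSV (headers not yet promoted)
    Parsed = List.Transform(Source[Content], each Csv.Document(_)),
    // Append all file tables into one transactional table
    Combined = Table.Combine(Parsed)
in
    Combined
```

Because this is the only loaded query, the folder is enumerated and read exactly once per refresh.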
2. Disable load for Query 2 and Query 3
These downstream queries should not trigger additional refresh paths.
In Power Query:
- Right-click Query 2 → uncheck Enable Load
- Right-click Query 3 → uncheck Enable Load
They remain as reference definitions, but no longer evaluate during refresh.
3. Rebuild the logic of Query 2 and Query 3 using DAX calculated tables
After Query 1 loads into the model, replicate the logic of the original Power Query transformations using DAX calculated tables.
This includes:
- Summaries
- Filters
- “Latest record” logic
- Grouping or aggregation
- Business-specific views
These DAX tables are:
- Computed once after refresh
- Built from the in-memory data of Query 1
- Functionally identical to the old Power Query outputs
- Much faster and more reliable
- Not dependent on folder scans
Once created, they replace the old Query 2 and Query 3 in the model.
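Two hedged examples of what such calculated tables might look like. The table name 'Query 1' matches the loaded query; the column names (Customer, Amount, Date) are assumptions, since the source does not describe the schema:

```
-- Summary view (e.g., replacing a grouping step from Query 2)
Sales Summary =
SUMMARIZECOLUMNS (
    'Query 1'[Customer],                          -- hypothetical column
    "Total Amount", SUM ( 'Query 1'[Amount] )     -- hypothetical column
)

-- "Latest record" view (e.g., replacing Query 3)
Latest Records =
VAR MaxDate = MAX ( 'Query 1'[Date] )             -- hypothetical column
RETURN
    FILTER ( 'Query 1', 'Query 1'[Date] = MaxDate )
```

Both tables are recomputed from the in-memory copy of Query 1 after each refresh, so they add no folder IO.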
4. Remap the data model and reports
With the new DAX tables in place:
- Update relationships to point to the new calculated tables
- Adjust measures and calculated columns if needed
- Update report visuals to reference the new tables
- Remove Query 2 and Query 3 from the model after migration is complete
From the user’s perspective, all visuals and logic behave exactly the same.
From the engine’s perspective, refresh is now far more efficient.
5. Validate the refresh and reporting behavior
After implementation:
- Refresh the dataset
- Confirm that refresh time is significantly reduced
- Verify that the folder is scanned only once
- Ensure all reports continue to function correctly
- Finalize model cleanup
This completes the migration from a multi-scan design to a single-scan, optimized architecture.
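One quick sanity check is a DAX query (run from DAX query view or DAX Studio) comparing row counts between the base table and a replacement table. 'Summary View' below is a placeholder name for whichever calculated table replaced the old query output:

```
EVALUATE
ROW (
    "Base rows", COUNTROWS ( 'Query 1' ),
    "Replacement rows", COUNTROWS ( 'Summary View' )   -- hypothetical table name
)
```

If the counts match expectations from the old design, the migration preserved the outputs.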
Conclusion
A major misunderstanding in Power BI development is assuming that referencing a Power Query object automatically reuses previously loaded data. When dealing with large folder-based sources, this misunderstanding can lead to multiple unnecessary full-folder scans and severe performance degradation.
The correct architectural approach is:
- Load the folder once
- Disable downstream Power Query loads
- Rebuild the additional views as DAX calculated tables
- Remap the model and reports accordingly
This ensures a scalable, efficient, and maintainable dataset that refreshes quickly and avoids redundant IO.