Existing Problems
The original implementation had several critical issues:
Report Stability Issues
When many users accessed the report simultaneously, the system would crash or time out, leading to significant frustration.
Poor Performance
Reports took between 1.5 to 3 minutes to load, regardless of the data volume. To address this, users had to rely on "Buttons -> ‘Apply all slicers’," which added complexity and did not address the root cause.
Overly Complex Architecture
The architecture utilized mixed modes, including DirectQuery with complex parameter passing, dataflows for pre-populating aggregate-level data, and materialized views to improve SQL performance. This led to a convoluted system that was difficult to manage and optimize.
Existing Architecture
The architecture relied on three main components:
DirectQuery Mode
This mode was used extensively, with complex parameter passing from the interface, but it caused system instability and performance bottlenecks.
Dataflows
These were used to pre-populate aggregate-level data, adding another layer of complexity and maintenance.
Materialized Views
SQL materialized views were implemented to speed up query performance but introduced a significant maintenance overhead, including a one-year refresh schedule.
Simplifying the Architecture
To address these issues, I implemented a three-step plan to simplify the architecture:
Step 1: Eliminate DirectQuery
The first step was to remove DirectQuery by deeply understanding the business requirements and compressing the data as much as possible. This step involved:
Requirement Analysis:
Understanding what data and level of granularity were truly necessary.
Data Compression:
Reducing data volume through effective summarization and removal of unnecessary fields.
Optimized DAX Queries:
Refactoring DAX formulas to improve efficiency.
Outcome:
Resolved system availability issues, allowing multiple users to access reports simultaneously without crashes.
Decoupled the reports from the database, enabling data retrieval from the cloud and improving stability.
Step 2: Remove Dataflows
The next step was to eliminate dataflows and bring all necessary data directly into Power Query. This was achieved by:
SQL Performance Tuning:
Enhancing SQL query performance to fetch data efficiently without relying on intermediate dataflows.
Direct Integration:
Using Power Query to load and transform data directly, simplifying the overall data pipeline.
Outcome:
Reduced maintenance time and complexity.
Improved data loading speeds significantly.
Step 3: Replace Materialized Views
Finally, we eliminated the dependency on materialized views by querying the database directly from Power Query. This involved:
SQL Query Optimization:
Leveraging similar performance tuning techniques as in Step 2 to improve query execution times.
Real-Time Data Access:
Enabling Power Query to retrieve data as needed, removing the need for pre-populated materialized views.
Outcome:
Eliminated the one-year materialized view refresh schedule.
Simplified the system by removing an additional layer of abstraction.
Final Results
From the User Perspective:
System Availability: Reports are now available 24/7, even with multiple users accessing them simultaneously.
Improved Performance: The slowest report load time is reduced to about 10 seconds, even with maximum data volume.
From the Technical Perspective:
Eliminated Maintenance Overheads:
Removed the one-year refresh time for materialized views.
Eliminated the one-hour dataflow refresh time.
Simplified Architecture:
Reduced complexity by eliminating DirectQuery, dataflows, and materialized views.
Flexible Refreshing:
The system can now be refreshed anytime as needed without additional overhead.
Conclusion
By addressing the root causes of instability and poor performance, and by simplifying the Power BI architecture, the system became more reliable, faster, and easier to manage. This consulting experience highlights the importance of deeply understanding requirements, optimizing SQL performance, and leveraging Power Query to build a streamlined, high-performing Power BI solution.
No comments:
Post a Comment