Showing posts with label 3. Dimensional Modeling. Show all posts
Showing posts with label 3. Dimensional Modeling. Show all posts

Monday, March 1, 2021

How to extract Power BI model to data warehouse bus matrix

 Check list of all posts

To quickly overview your power bi data model, we can turn the model into a data ware warehouse matrix. This document is intended to provide an easy way with the steps listed below.

Step 1: Open DaxStudio and run DMV statements:

select FromTableID, FromColumnID, ToTableID, ToColumnID from $System.TMSCHEMA_RELATIONSHIPS;
select [ID],[Name] from $System.TMSCHEMA_TABLES;
select [ID],TableID,ExplicitName from $System.TMSCHEMA_COLUMNS where LEN(ExplicitName) < 40;

Step 2: Put each query results into excel spreadsheet to generate SQL queries:

drop table if exists #RELATIONSHIPS;

create table #RELATIONSHIPS (FromTableID int, FromColumnID int, ToTableID int, ToColumnID int);

insert into #RELATIONSHIPS values 

 (10468,10480,1046,1049),

 (10468,10476,1119,1122),

.......

 (17766,17773,7789,7797);



 drop table if exists #TABLES;

create table #TABLES (ID int, Name varchar(50));

insert into #TABLES values 

 (1046,'Tabname1'),

 (1119,'Tabname2'),

......

 (17766,'Fact5');


drop table if exists #COLUMNS;

create table #COLUMNS (ID int, TableID int,ExplicitName varchar(50));

insert into #COLUMNS values 

 (1049,1046,'Column1'),

 (1050,1046,'Column2'),

......

 (17867,17766,'ColumnN');

Step 3: Run queries below, you will get matrix

drop table if exists #RELATIONSHIPSEX;
 select 
  R.FromTableID
 ,TF.Name as FromTable
 ,R.FromColumnID 
 ,CF.ExplicitName as FromColumn
 ,R.ToTableID
 ,TT.Name as ToTable
 ,R.ToColumnID 
 ,CT.ExplicitName as ToColumn
 ,1 R 
 into #RELATIONSHIPSEX
 from #RELATIONSHIPS R
 left join #TABLES TF on (R.FromTableID=TF.ID)
 left join #TABLES TT on (R.ToTableID=TT.ID)
 left join #COLUMNS CF on (R.FromColumnID=CF.ID)
 left join #COLUMNS CT on (R.ToColumnID=CT.ID);

drop table if exists #MatrixHighlevel;
drop table if exists #MatrixDetail;

SELECT ToTable as Dimension,[Fact1], [Fact2], [Fact3] , [Fact4] , [Fact5]  
into #MatrixHighlevel
FROM   
(SELECT R,FromTable,ToTable
FROM #RELATIONSHIPSEX) p  
PIVOT  
(  
sum(R) 
FOR FromTable IN  
( Fact1, Fact2, Fact3, Fact4, Fact5 )  
) AS pvt  
ORDER BY ToTable;  


SELECT ToTable as Dimension,[Fact1], [Fact2], [Fact3] , [Fact4] , [Fact5]
into #MatrixDetail
FROM   
(SELECT R,FromTable,ToTable,FromColumn,ToColumn,ToTable+'.'+ToColumn+'-->'+FromTable+'.'+FromColumn as R1
FROM #RELATIONSHIPSEX) p  
PIVOT  
(  
max(R1) 
FOR FromTable IN  
( Fact1, Fact2, Fact3, Fact4, Fact5 )  
) AS pvt  
ORDER BY ToTable;  


Wednesday, December 30, 2020

How to model unbalanced/rigged hierarchical data - entirely flattened

The sample code below could explain the concept in detail. Please note that this code is SQL Server based.
/*
Input:
--------------------------------------------------------------------------------------------------
0_CEO                                                                   $20
                1_Director                                           $10
                                11_Manager                      $5
                                                111_Worker       $3
                                12_Manager                      $5
                                                121_Worker       $3
                                                122_Worker       $3
                                13_Manager                      $5
Orphan                                                                 $1

Output:
--------------------------------------------------------------------------------------------------
L01         L02                         L03                         L04                                         Name
0_CEO                                                                                                             0_CEO              
0_CEO   1_Director                                                                                       1_Director         
0_CEO   1_Director           11_Manager                                                     11_Manager         
0_CEO   1_Director           12_Manager                                                     12_Manager         
0_CEO   1_Director           13_Manager                                                     13_Manager         
0_CEO   1_Director           12_Manager      121_Worker                         121_Worker         
0_CEO   1_Director           12_Manager      122_Worker                         122_Worker         
0_CEO   1_Director           11_Manager      111_Worker                         111_Worker         
Orphan NULL                     NULL                     NULL                                     Orphan

Comment:
--------------------------------------------------------------------------------------------------
This solution is very flexible when modelled in any BI tool, you can use the hierarchy to navigate data as requested.

Solutions below in SQL
*/


DROP TABLE IF EXISTS Employee;
DROP TABLE IF EXISTS EmployeeHier;
DROP TABLE IF EXISTS Salary;

create table Employee (Name varchar(20))
insert Employee values
('0_CEO')
,('1_Director')
,('11_Manager')
,('111_Worker')
,('12_Manager')
,('121_Worker')
,('122_Worker')
,('13_Manager')
,('Orphan');

create table EmployeeHier (ChildName varchar(20),ParentName varchar(20))
insert EmployeeHier values
('1_Director','0_CEO')
,('11_Manager','1_Director')
,('111_Worker','11_Manager')
,('12_Manager','1_Director')
,('121_Worker','12_Manager')
,('122_Worker','12_Manager')
,('13_Manager','1_Director');

create table Salary (Name varchar(20),Income int)
insert into Salary values
('0_CEO',20)
,('1_Director',10)
,('11_Manager',5)
,('111_Worker',3)
,('12_Manager',5)
,('121_Worker',3)
,('122_Worker',3)
,('13_Manager',5)
,('Orphan',1);

with CTE
     as (select ParentName as ChildName,
                ParentName,
                Cast(isnull(left(ParentName + Replicate(' ', 20), 20), '') as varchar(1000)) FullPath
         from   EmployeeHier
         where  ParentName not in (select distinct ChildName from EmployeeHier)
         union all
         select child.ChildName,
                child.ParentName,
                                                                Cast(CTE.FullPath + isnull(left(child.ChildName+Replicate(' ', 20), 20), '') as varchar(1000)) FullPath
         from   CTE inner join EmployeeHier child on child.ParentName = CTE.ChildName)

select
  rtrim(substring(FullPath, 1,20))  L01
, rtrim(substring(FullPath,21,20))  L02
, rtrim(substring(FullPath,41,20))  L03
, rtrim(substring(FullPath,61,20))  L04
, Name
from (
                                select
                                FullPath
                                ,right(FullPath,20) Name
                                from CTE
)  T
union all
--# Stand alone nodes
select Name -- L01
, null -- L02
, null -- L03
, null -- L04
, name
from Employee
where name not in (       select distinct ChildName from EmployeeHier)
  and name not in ( select distinct ParentName from EmployeeHier );

Wednesday, July 1, 2020

bitemporal vs. scd2 vs. accumulating snapshot

This is a very complicated topic to explore. The document itself is intended to explain the bitemporal model, and compare it with scd2 and accumulating snapshot.
Bitemporal model is clearly explained from the blog https://blog.cloudera.com/blog/2017/05/bi-temporal-data-modeling-with-envelope/
The concept is illustrated as below.


Customer ID1 has credit score changed to 500, 600,700 with different date ranges, respectively. The date range is defined with two columns AS_OF_START_DATE and
,AS_OF_END_DATE. We can get score based on any given date

SELECT * FROM FACT WHERE {Any given date} BETWEEN AS_OF_START_DATE AND AS_OF_END_DATE;

While this data model now captures the best known truth for credit scores over time, we have lost the information about the correction. This might not be important, and in that case the data model could stop here. But it is easy to imagine a scenario where this is important:
1. On Jun 15th credit score is reported as 600.
2. On Jun 18th a manager with access to the report declines a loan application because the credit score wasn’t 650 or above.
3. On Jun 19th the score for November 15th onwards is corrected to 650
4. In December an auditor at a financial regulator sees that the manager denied the loan despite meeting the criteria…

To track how the credit scores changed in the table, not just in the real world, we can also implement the same two field pattern to represent the range of system time that the record was considered true. This allows us to roll back the clock and see the full state of the table at any point in the past, and with only a straightforward filter query. This tracking of both the event and system time ranges is what is known as bi-temporal data modeling.

SELECT * FROM FACT WHERE {Any given date} BETWEEN AS_OF_START_DATE AND AS_OF_END_DATE AND {Any given date} BETWEEN SYSTEM_START_DATE and SYSTEM_END_DATE;

In data warehousing terminology this is a Type 2 slowly changing dimension, where new records are added not just for changes in the real world, but also for changes to the table itself. however, these columns are added to fact side, not in dimension side. and, there is no inferred member concept in bitemporal model.

With bi-temporal data modeling our table would now look like the below representation. We can find the customer credit scores across event time and across system time. For the regulatory scenario the auditor would be able to filter the system start and end dates by the date of the loan denial to see the report as it would have looked to the manager when they made the decision, and so avoiding unnecessary consequences for the bank.

However, it is very challenging to implement Bitemporal model in data warehousing, as it is not easy to aggregate. In my opinion, it is closer to operational side than data ware house side.

In real word, Accumulating snapshots could be a very good choice to resolve similar issues.  Accumulating snapshots work best where the milestones are generally linear and predicable.  If they are not, the design and maintenance will be significantly more complex.
    

Tuesday, April 14, 2020

How to model unbalanced/rigged hierarchical data - vertically extended

The sample code below could explain the concept in detail. Please note that this code is SQL Server based.

/*

Input:
--------------------------------------------------------------------------------------------------
0_CEO                                                                   $20
                1_Director                                           $10
                                11_Manager                      $5
                                                111_Worker       $3
                                12_Manager                      $5
                                                121_Worker       $3
                                                122_Worker       $3
                                13_Manager                      $5
Orphan                                                                 $1

Output:
--------------------------------------------------------------------------------------------------
0_CEO                   0_CEO                   1
0_CEO                   1_Director           2
0_CEO                   11_Manager      3
0_CEO                   111_Worker       4
0_CEO                   12_Manager      3
0_CEO                   121_Worker       4
0_CEO                   122_Worker       4
0_CEO                   13_Manager      3
1_Director           1_Director           1
1_Director           11_Manager      2
1_Director           111_Worker       3
1_Director           12_Manager      2
1_Director           121_Worker       3
1_Director           122_Worker       3
1_Director           13_Manager      2
11_Manager      11_Manager      1
11_Manager      111_Worker       2
111_Worker       111_Worker       1
12_Manager      12_Manager      1
12_Manager      121_Worker       2
12_Manager      122_Worker       2
121_Worker       121_Worker       1
122_Worker       122_Worker       1
13_Manager      13_Manager      1
Orphan                 Orphan                 1


Solutions

Comment:
--------------------------------------------------------------------------------------------------
This solution is very flexible when modelled in any BI tool, you can get  you can get total salary for each individual person and its decedents. For example, select 0_CEO, you will the total salary for whole company accept orphan. Select 11_Manager, you will get total salary of 11_Manager and 111_Worker. 
select H.ChildName, H.ParentName, 2 as LevelFromTop from EmployeeHier H
union ALL
select  CTE.ChildName, H.ParentName, LevelFromTop + 1 as LevelFromTop
from EmployeeHier H
inner join CTE on H.ChildName = CTE.ParentName
where H.ChildName is not NULL )

select 
 T.ParentName
,T.ChildName
,T.LevelFromTop
from
(
                select distinct M.Name as ChildName, M.Name as ParentName, 1 as LevelFromTop from Employee M
                union ALL
                select distinct ChildName, ParentName, LevelFromTop from CTE
) T
inner join Employee M ON (T.ChildName = M.Name)
UNION ALL
select Name,Name,0
from Employee
where Name not in (select ChildName from EmployeeHier)
and  Name not in (select ParentName from EmployeeHier)

order by ParentName, ChildName, LevelFromTop