Yin Solutions: 3. Dimensional Modeling

Showing posts with label 3. Dimensional Modeling. Show all posts

Monday, March 1, 2021

How to extract Power BI model to data warehouse bus matrix

To quickly overview your power bi data model, we can turn the model into a data ware warehouse matrix. This document is intended to provide an easy way with the steps listed below.

Step 1: Open DaxStudio and run DMV statements:

select FromTableID, FromColumnID, ToTableID, ToColumnID from $System.TMSCHEMA_RELATIONSHIPS;
select [ID],[Name] from $System.TMSCHEMA_TABLES;
select [ID],TableID,ExplicitName from $System.TMSCHEMA_COLUMNS where LEN(ExplicitName) < 40;

Step 2: Put each query results into excel spreadsheet to generate SQL queries:

drop table if exists #RELATIONSHIPS;

create table #RELATIONSHIPS (FromTableID int, FromColumnID int, ToTableID int, ToColumnID int);

insert into #RELATIONSHIPS values

(10468,10480,1046,1049),

(10468,10476,1119,1122),

.......

(17766,17773,7789,7797);

drop table if exists #TABLES;

create table #TABLES (ID int, Name varchar(50));

insert into #TABLES values

(1046,'Tabname1'),

(1119,'Tabname2'),

......

(17766,'Fact5');

drop table if exists #COLUMNS;

create table #COLUMNS (ID int, TableID int,ExplicitName varchar(50));

insert into #COLUMNS values

(1049,1046,'Column1'),

(1050,1046,'Column2'),

......

(17867,17766,'ColumnN');

Step 3: Run queries below, you will get matrix

drop table if exists #RELATIONSHIPSEX;

select

R.FromTableID

,TF.Name as FromTable

,R.FromColumnID

,CF.ExplicitName as FromColumn

,R.ToTableID

,TT.Name as ToTable

,R.ToColumnID

,CT.ExplicitName as ToColumn

,1 R

into #RELATIONSHIPSEX

from #RELATIONSHIPS R

left join #TABLES TF on (R.FromTableID=TF.ID)

left join #TABLES TT on (R.ToTableID=TT.ID)

left join #COLUMNS CF on (R.FromColumnID=CF.ID)

left join #COLUMNS CT on (R.ToColumnID=CT.ID);

drop table if exists #MatrixHighlevel;

drop table if exists #MatrixDetail;

SELECT ToTable as Dimension,[Fact1], [Fact2], [Fact3] , [Fact4] , [Fact5]

into #MatrixHighlevel

FROM

(SELECT R,FromTable,ToTable

FROM #RELATIONSHIPSEX) p

PIVOT

(

sum(R)

FOR FromTable IN

( Fact1, Fact2, Fact3, Fact4, Fact5 )

) AS pvt

ORDER BY ToTable;

SELECT ToTable as Dimension,[Fact1], [Fact2], [Fact3] , [Fact4] , [Fact5]

into #MatrixDetail

FROM

(SELECT R,FromTable,ToTable,FromColumn,ToColumn,ToTable+'.'+ToColumn+'-->'+FromTable+'.'+FromColumn as R1

FROM #RELATIONSHIPSEX) p

PIVOT

(

max(R1)

FOR FromTable IN

( Fact1, Fact2, Fact3, Fact4, Fact5 )

) AS pvt

ORDER BY ToTable;

Wednesday, December 30, 2020

How to model unbalanced/rigged hierarchical data - entirely flattened

Check list of all posts

The sample code below could explain the concept in detail. Please note that this code is SQL Server based.

Input:

--------------------------------------------------------------------------------------------------

0_CEO $20

1_Director $10

11_Manager $5

111_Worker $3

12_Manager $5

121_Worker $3

122_Worker $3

13_Manager $5

Orphan $1

Output:

--------------------------------------------------------------------------------------------------

L01 L02 L03 L04 Name

0_CEO 0_CEO

0_CEO 1_Director 1_Director

0_CEO 1_Director 11_Manager 11_Manager

0_CEO 1_Director 12_Manager 12_Manager

0_CEO 1_Director 13_Manager 13_Manager

0_CEO 1_Director 12_Manager 121_Worker 121_Worker

0_CEO 1_Director 12_Manager 122_Worker 122_Worker

0_CEO 1_Director 11_Manager 111_Worker 111_Worker

Orphan NULL NULL NULL Orphan

Comment:

--------------------------------------------------------------------------------------------------

This solution is very flexible when modelled in any BI tool, you can use the hierarchy to navigate data as requested.

Solutions below in SQL

DROP TABLE IF EXISTS Employee;

DROP TABLE IF EXISTS EmployeeHier;

DROP TABLE IF EXISTS Salary;

create table Employee (Name varchar(20))

insert Employee values

('0_CEO')

,('1_Director')

,('11_Manager')

,('111_Worker')

,('12_Manager')

,('121_Worker')

,('122_Worker')

,('13_Manager')

,('Orphan');

create table EmployeeHier (ChildName varchar(20),ParentName varchar(20))

insert EmployeeHier values

('1_Director','0_CEO')

,('11_Manager','1_Director')

,('111_Worker','11_Manager')

,('12_Manager','1_Director')

,('121_Worker','12_Manager')

,('122_Worker','12_Manager')

,('13_Manager','1_Director');

create table Salary (Name varchar(20),Income int)

insert into Salary values

('0_CEO',20)

,('1_Director',10)

,('11_Manager',5)

,('111_Worker',3)

,('12_Manager',5)

,('121_Worker',3)

,('122_Worker',3)

,('13_Manager',5)

,('Orphan',1);

with CTE

as (select ParentName as ChildName,

ParentName,

Cast(isnull(left(ParentName + Replicate(' ', 20), 20), '') as varchar(1000)) FullPath

from EmployeeHier

where ParentName not in (select distinct ChildName from EmployeeHier)

union all

select child.ChildName,

child.ParentName,

Cast(CTE.FullPath + isnull(left(child.ChildName+Replicate(' ', 20), 20), '') as varchar(1000)) FullPath

from CTE inner join EmployeeHier child on child.ParentName = CTE.ChildName)

select

rtrim(substring(FullPath, 1,20)) L01

, rtrim(substring(FullPath,21,20)) L02

, rtrim(substring(FullPath,41,20)) L03

, rtrim(substring(FullPath,61,20)) L04

, Name

from (

select

FullPath

,right(FullPath,20) Name

from CTE

) T

union all

--# Stand alone nodes

select Name -- L01

, null -- L02

, null -- L03

, null -- L04

, name

from Employee

where name not in ( select distinct ChildName from EmployeeHier)

and name not in ( select distinct ParentName from EmployeeHier );

Wednesday, July 1, 2020

bitemporal vs. scd2 vs. accumulating snapshot

Check list of all posts

This is a very complicated topic to explore. The document itself is intended to explain the bitemporal model, and compare it with scd2 and accumulating snapshot.
Bitemporal model is clearly explained from the blog https://blog.cloudera.com/blog/2017/05/bi-temporal-data-modeling-with-envelope/
The concept is illustrated as below.

Customer ID1 has credit score changed to 500, 600,700 with different date ranges, respectively. The date range is defined with two columns AS_OF_START_DATE and
,AS_OF_END_DATE. We can get score based on any given date

SELECT * FROM FACT WHERE {Any given date} BETWEEN AS_OF_START_DATE AND AS_OF_END_DATE;

While this data model now captures the best known truth for credit scores over time, we have lost the information about the correction. This might not be important, and in that case the data model could stop here. But it is easy to imagine a scenario where this is important:
1. On Jun 15th credit score is reported as 600.
2. On Jun 18th a manager with access to the report declines a loan application because the credit score wasn’t 650 or above.
3. On Jun 19th the score for November 15th onwards is corrected to 650
4. In December an auditor at a financial regulator sees that the manager denied the loan despite meeting the criteria…

To track how the credit scores changed in the table, not just in the real world, we can also implement the same two field pattern to represent the range of system time that the record was considered true. This allows us to roll back the clock and see the full state of the table at any point in the past, and with only a straightforward filter query. This tracking of both the event and system time ranges is what is known as bi-temporal data modeling.

SELECT * FROM FACT WHERE {Any given date} BETWEEN AS_OF_START_DATE AND AS_OF_END_DATE AND {Any given date} BETWEEN SYSTEM_START_DATE and SYSTEM_END_DATE;

In data warehousing terminology this is a Type 2 slowly changing dimension, where new records are added not just for changes in the real world, but also for changes to the table itself. however, these columns are added to fact side, not in dimension side. and, there is no inferred member concept in bitemporal model.

With bi-temporal data modeling our table would now look like the below representation. We can find the customer credit scores across event time and across system time. For the regulatory scenario the auditor would be able to filter the system start and end dates by the date of the loan denial to see the report as it would have looked to the manager when they made the decision, and so avoiding unnecessary consequences for the bank.

However, it is very challenging to implement Bitemporal model in data warehousing, as it is not easy to aggregate. In my opinion, it is closer to operational side than data ware house side.

In real word, Accumulating snapshots could be a very good choice to resolve similar issues. Accumulating snapshots work best where the milestones are generally linear and predicable. If they are not, the design and maintenance will be significantly more complex.

Tuesday, April 14, 2020

How to model unbalanced/rigged hierarchical data - vertically extended

Check list of all posts

The sample code below could explain the concept in detail. Please note that this code is SQL Server based.

/*

Input:

--------------------------------------------------------------------------------------------------

0_CEO $20

1_Director $10

11_Manager $5

111_Worker $3

12_Manager $5

121_Worker $3

122_Worker $3

13_Manager $5

Orphan $1

Output:

--------------------------------------------------------------------------------------------------

0_CEO 0_CEO 1

0_CEO 1_Director 2

0_CEO 11_Manager 3

0_CEO 111_Worker 4

0_CEO 12_Manager 3

0_CEO 121_Worker 4

0_CEO 122_Worker 4

0_CEO 13_Manager 3

1_Director 1_Director 1

1_Director 11_Manager 2

1_Director 111_Worker 3

1_Director 12_Manager 2

1_Director 121_Worker 3

1_Director 122_Worker 3

1_Director 13_Manager 2

11_Manager 11_Manager 1

11_Manager 111_Worker 2

111_Worker 111_Worker 1

12_Manager 12_Manager 1

12_Manager 121_Worker 2

12_Manager 122_Worker 2

121_Worker 121_Worker 1

122_Worker 122_Worker 1

13_Manager 13_Manager 1

Orphan Orphan 1

Solutions

Comment:

--------------------------------------------------------------------------------------------------

This solution is very flexible when modelled in any BI tool, you can get you can get total salary for each individual person and its decedents. For example, select 0_CEO, you will the total salary for whole company accept orphan. Select 11_Manager, you will get total salary of 11_Manager and 111_Worker.

select H.ChildName, H.ParentName, 2 as LevelFromTop from EmployeeHier H

union ALL

select CTE.ChildName, H.ParentName, LevelFromTop + 1 as LevelFromTop

from EmployeeHier H

inner join CTE on H.ChildName = CTE.ParentName

where H.ChildName is not NULL )

select

T.ParentName

,T.ChildName

,T.LevelFromTop

from

(

select distinct M.Name as ChildName, M.Name as ParentName, 1 as LevelFromTop from Employee M

union ALL

select distinct ChildName, ParentName, LevelFromTop from CTE

) T

inner join Employee M ON (T.ChildName = M.Name)

UNION ALL

select Name,Name,0

from Employee

where Name not in (select ChildName from EmployeeHier)

and Name not in (select ParentName from EmployeeHier)

order by ParentName, ChildName, LevelFromTop