Tuesday, January 1, 2019

How to model Cognos FM to join two fact tables at different granularity with not shared dimension(s)

This document is intended to resolve one generic percentage problem. The numerator is from one fact table, while denominator is from another fact table.  The numerator fact table has more dimensions then the denominator fact table. In other word, some dimensions in the numerator fact table do not exist in the denominator fact table.  These dimensions can be considered as not confirmed. This document will demonstrate a solution to provide an effective Cognos FM model, which will allow report authors to generate report by simply drag and drop, using relational model or DMR model, including list, crosstab and chart.

Problem
In order to explain this issue in detail, we create a sample below.
C3 is only used for FACT1, or the numerator fact table, it doesn't exist in FACT2, or the denominator fact table. Both fact tables share C1 and C2 only.
The requirement is shown from sample result data: When dragging dimension C3, M1 and M2, we should get the total of FACT2 as 300 as denominator. the overall total is 300 as well.
By breaking down further with C1 and C2, along with C3, M1 from FACT1 is clearly summed as numerator based on C1,C2 and C3, while M2 from FACT2 is only based on C1 and C2.
It is obviously that we should mot build report simply based on each facts with many joins on report side
 
Analysis
1. With respect to data warehousing design, both fact tables should be defined as separated tables, as both fact tables are at different granularity. in addition, joining both fact tables with different granularity  is a big no-no in term of modelling.
 
2. When we think about different granularity, we naturally bring the determinant concept into consideration. However, determinant is NOT needed. The determinant is introduced to avoid double counting. When sales is at daily based, plan is at monthly basis, we generated reports based on month. We could get about 30 times more plan data into report when joining with time dimension, if the determinant is not defined on time dimension. The determinant on time dimension forces the join at month level, and only single record exists for each month with SQL group by. However, we never report with plan data on daily basis. in other word, we use determinant to enable us to generate report at high level, not at low level.  The current case is that the dimension (or C3) does not even exist in denominator fact table. We will talk about this issue further later.
 
3. we cannot use UNION to bring both datasets together   

Data looks like below

This is a problem, as data cannot be aligned in report. Behind the scene, C3 doesn't exist in FACT2, therefore it is naturally to show empty cell. The only way we can bring data together from two different tables is either join or stitch ( full outer join)
 
4. we cannot use JOIN to bring both dataset together


Report

Even we apply semi aggregate function, we still cannot generate the current report.

5. We can use STICTCH QUARY to bring both dataset together. This sample is not a typical sample, as C3 is not shared dimension. To resolve this problem, we can “FAKE” an extra column for FACT2 with value “total”. We can also “FAKE” an extra column for C3 with value “total”. Doing this way will bring all three dimensions as confirmed and shared dimensions, where the relationship between C3 and FACT2 is connect at “Total” level. This relationship can be further adjusted by determinant to avoid the double count.

Solution
Below is the FM design screenshot, where we get 3 dimensions are designed and turn the design as multiple fact tables with different granularities.



Create a faked column as C3_Total   in FACT2
[Data Source Layer].[ FACT2]: select C1, C2, M2, 'Total' as C3_Total  from [DATASOURCE1].FACT2

Create a view as V_C3  with a fake column Total, the reason we create this view is because only table or view can be supported for Cognos determinant.  
[Data Source Layer].[ C3]: select *  from [DATASOURCE1].V_C3
CREATE OR REPLACE VIEW SCHEMEX.V_C3 AS SELECT 'Total' AS TOTAL, SCHEMAX.C3.ID FROM C3



Result analysis
Case 1:
Report:

 SQL:
 select
 D2.ID ID ,
 D2.M1 M1 ,
 D3.M2 M2
from
(
 select
  C3.ID ID ,
  sum(FACT1.M1) M1
 from SCHEMAX.C3 C3, SCHEMAX.FACT1 FACT1
 where FACT1.C3 = C3.ID group by C3.ID
) D2,
(
 select
  sum(M2) M2
 from SCHEMAX.FACT2
  having count(*) > 0
) D3


 Data:


 Case 2:Report:

 SQL:
select
 D2.ID ID ,
 D2.M1 M1 ,
 D3.M2 M2 ,
 (D2.rc * 1.0e0) / nullif(D3.rc, 0) C____M1__M2_ ,
 min(D2.Summary_M1_) over () Summary_M1_ ,
 min(D3.Summary_M2_) over () Summary_M2_ ,
 (min(D2.rc5) over () * 1.0e0) / nullif(min(D3.rc) over (), 0) Summary____M1__M2__
from (
  select
   T0.C0 ID ,
   T0.C1 M1 ,
   T0.C1 rc
   , sum(T0.C1) over () Summary_M1_ ,
   sum(T0.C1) over () rc5
   from (
    select C3.ID C0 , sum(FACT1.M1) C1
    from SCHEMAX.C3 C3, SCHEMAX.FACT1 FACT1
    where FACT1.C3 = C3.ID group by C3.ID
   ) T0
 ) D2,
 (
  select
   distinct T0.C0 M2 ,
   T0.C0 rc ,
   T0.C0 Summary_M2_
   from
  (
   select sum(M2) C0 from SCHEMAX.FACT2
  ) T0
 ) D3


Data:

 Case 3:Report:
 SQL:
select
 (coalesce(D2.ID1, D3.ID1)) ID1 ,
 (coalesce(D2.ID2, D3.ID2)) ID2 ,
 D2.ID ID ,
 D2.M1 M1 ,
 D3.M2 M2
from
 (
  select
   C1.ID ID1 ,
   C2.ID ID2 ,
   C3.ID ID ,
   sum(FACT1.M1) M1
  from SCHEMAX
.C1 C1,
   SCHEMAX.C2 C2,
   SCHEMAX.V_C3 C3,
   SCHEMAX.FACT1 FACT1
  where FACT1.C1 = C1.ID
   and FACT1.C2 = C2.ID
   and FACT1.C3 = C3.ID
  group by C3.ID, C1.ID, C2.ID
 ) D2
 FULL OUTER JOIN
 (
  select C1.ID ID1 ,
   C2.ID ID2 ,
   sum(FACT2.M2) M2
  from SCHEMAX.C1 C1,
   SCHEMAX.C2 C2,
   SCHEMAX.FACT2 FACT2,
   (
    select C3.TOTAL TOTAL
    from SCHEMAX.V_C3 C3
    group by C3.TOTAL
   ) C3
  where C1.ID = FACT2.C1
   and C2.ID = FACT2.C2
   and C3.TOTAL = 'Total'
  group by C1.ID, C2.ID
 ) D3
 on D2.ID1 = D3.ID1 and D2.ID2 = D3.ID2;


Data:

 
  Case 4:Report:












SQL:

select
  (coalesce(D2.ID1, D3.ID1)) ID1 ,
  (coalesce(D2.ID2, D3.ID2)) ID2 ,
  D2.ID ID ,
  D2.M1 M1 ,
  D3.M2 M2
from
 (
   select
    C1.ID ID1 ,
    C2.ID ID2 ,
    C3.ID ID ,
    sum(FACT1.M1) M1
   from SCHEMAX.C1 C1,
    SCHEMAX.C2 C2,
    SCHEMAX.V_C3 C3,
    SCHEMAX.FACT1 FACT1
   where FACT1.C1 = C1.ID
    and FACT1.C2 = C2.ID
    and FACT1.C3 = C3.ID
   group by C3.ID, C1.ID, C2.ID
 ) D2
 FULL OUTER JOIN
 (
   select C1.ID ID1 ,
    C2.ID ID2 ,
    sum(FACT2.M2) M2
   from SCHEMAX.C1 C1,
    SCHEMAX.C2 C2,
    SCHEMAX.FACT2 FACT2,
    (
     select C3.TOTAL TOTAL
     from SCHEMAX.V_C3 C3
     group by C3.TOTAL
    ) C3
   where C1.ID = FACT2.C1
    and C2.ID = FACT2.C2
    and C3.TOTAL = 'Total'
   group by C1.ID, C2.ID
 ) D3
 on D2.ID1 = D3.ID1 and D2.ID2 = D3.ID2;

Data:

 
NOTE (To be determined)
1. The solution above does not work against Crosstab, as the stich query can not be generated.
2. We can not leave C3 not join with Fact2. In case when C3 is needed in reports, the stitched query can only be generated only if C3 in conformed dimension.
3. However, determinant can not make crosstab work.
4. Actually, determinant is not needed in this case, as we need to report at lower level, not like the most of cases where higher level is needed.
5. This document resolves actually the problem like that sales is at daily level, plan is at monthly level, report need to at daily level, but consider whole month plan data as denominator.

No comments:

Post a Comment