Wednesday, April 24, 2013

Cognos report performance tuning - Slice a single big fact table into multiple small measure oriented tables – “horizontal partition”




Breaking a single fact table into multiple small fact tables with their corresponded measures could dramatically improve performance, all these small fact tables shares same conformed dimensions. This approach is applicable for both relational and DMR with compatible mode, and it does not request to change report specification.


Best suitable case: the fact table is very big table with many different measures; and each measure is associated with part of records in the table.


To demonstrate this idea, assume there is a table F with 100 million records with measure M1, M2, and M3, associated with dimensions D1, D2 and D3. The solution is to break F into three different fact tables F1, F2 and F3, each fact table associated with their own measures M1, M2 and M3, respectively. For the sake of simplicity, assume each fact table has only 3 million records for F1, 3 million records for F2, and 4 Million records for F3


F – 100 millions records with M1, M2 and M3

F1 – 30 millions records with M1

F1 – 30 millions records with M2
F1 – 40 millions records with M3


Current Model

Table structure
CREATE TABLE [dbo].[F](
[D1] [int] NOT NULL,
[D2] [int] NOT NULL,
[D3] [int] NOT NULL,
[M1] [int] NULL,
[M2] [int] NULL,
[M3] [int] NULL
)
GO
CREATE TABLE [dbo].[D3](
[Key] [int] NULL
)
GO
CREATE TABLE [dbo].[D2](
[Key] [int] NULL
)
GO
CREATE TABLE [dbo].[D1](
[Key] [int] NULL
)

Framework manager
The framework manager at database tier and business tier is designed as follows
Queries
Case 1 query data for M1, query is generated as follows
SELECT
   F.D1 AS D1,
   F.D2 AS D2,
   F.D3 AS D3,
   SUM(F.M1) AS M1
FROM
   dbo.F F
GROUP BY
   F.D1,
   F.D2,
   F.D3

Case 2 query data for M1 and M2, query is generated as follows
SELECT
   F.D1 AS D1,
   F.D2 AS D2,
   F.D3 AS D3,
   SUM(F.M1) AS M1,
   SUM(F.M2) AS M2
FROM
   dbo.F F
GROUP BY
   F.D1,
   F.D2,
   F.D3

Case 3 query data for M1, M2 and M3, query is generated as follows
SELECT
   F.D1 AS D1,
   F.D2 AS D2,
   F.D3 AS D3,
   SUM(F.M1) AS M1,
   SUM(F.M2) AS M2,
   SUM(F.M3) AS M3
FROM
   dbo.F F
GROUP BY
   F.D1,
   F.D2,
   F.D3

Regardless of what measure(s) selected in report, database needs to scan whole 100 million records.


New Model

Table structure
CREATE TABLE [dbo].[F1](
[D1] [int] NOT NULL,
[D2] [int] NOT NULL,
[D3] [int] NOT NULL,
[M1] [int] NULL
)
Go
CREATE TABLE [dbo].[F2](
[D1] [int] NOT NULL,
[D2] [int] NOT NULL,
[D3] [int] NOT NULL,
[M2] [int] NULL
)
Go
CREATE TABLE [dbo].[F3](
[D1] [int] NOT NULL,
[D2] [int] NOT NULL,
[D3] [int] NOT NULL,
[M3] [int] NULL
)
GO
CREATE TABLE [dbo].[D3](
[Key] [int] NULL
)
GO
CREATE TABLE [dbo].[D2](
[Key] [int] NULL
)
GO
CREATE TABLE [dbo].[D1](
[Key] [int] NULL
)

Framework manager
The framework manager at database tier and business tier is designed as follows
Please note that only F will be exposed, while F1, F2 and F3 is hidden.


Queries

Case 1  query data for M1, query is generated as follows
SELECT
   D1.Key AS D1,
   D2.Key AS D2,
   D3.Key AS D3,
   SUM(F1.M1) AS M1
FROM
   dbo.D1 D1
       INNER JOIN dbo.F1 F1
       ON D1.Key = F1.D1
           INNER JOIN dbo.D2 D2
           ON D2.Key = F1.D2
               INNER JOIN dbo.D3 D3
               ON D3.Key = F1.D3
GROUP BY
   D1.Key,
   D2.Key,
   D3.Key

Only F1 is used, meaning that only 30 million records will be scanned

Case 2  query data for M1 and M2, query is generated as follows
SELECT
   COALESCE( FS1.D1, FS2.D1) AS D1,
   COALESCE( FS1.D2, FS2.D2) AS D2,
   COALESCE( FS1.D3, FS2.D3) AS D3,
   FS1.M1 AS M1,
   FS2.M2 AS M2
FROM
   (
   SELECT
       D1.Key AS D1,
       D2.Key AS D2,
       D3.Key AS D3,
       SUM(F1.M1) AS M1
   FROM
       dbo.D1 D1
           INNER JOIN dbo.F1 F1 ON D1.Key = F1.D1
               INNER JOIN dbo.D2 D2 ON D2.Key = F1.D2
                   INNER JOIN dbo.D3 D3  ON D3.Key = F1.D3
   GROUP BY
       D1.Key,
       D2.Key,
       D3.Key
   ) FS1
       FULL OUTER JOIN
       (
       SELECT
           D1.Key AS D1,
           D2.Key AS D2,
           D3.Key AS D3,
           SUM(F2.M2) AS M2
       FROM
           dbo.D1 D1
               INNER JOIN dbo.F2 F2 ON D1.Key = F2.D1
                   INNER JOIN dbo.D2 D2 ON D2.Key = F2.D2
                       INNER JOIN dbo.D3 D3  ON D3.Key = F2.D3
       GROUP BY
           D1.Key,
           D2.Key,
           D3.Key
       ) FS2
       ON
           FS1.D1 = FS2.D1 AND
           FS1.D2 = FS2.D2 AND
           FS1.D3 = FS2.D3

Only F1 and F2 are used, meaning that only 60 million records will be separately scanned

Case 3  query data for M1, M2 and M3, query is generated as follows
SELECT
   COALESCE( FS1.D1, FS2.D1, FS3.D1) AS D1,
   COALESCE( FS1.D2, FS2.D2, FS3.D2) AS D2,
   COALESCE( FS1.D3, FS2.D3, FS3.D3) AS D3,
   FS1.M1 AS M1,
   FS2.M2 AS M2,
   FS3.M3 AS M3
FROM
   (
   SELECT
       D1.Key AS D1,
       D2.Key AS D2,
       D3.Key AS D3,
       SUM(F1.M1) AS M1
   FROM
       dbo.D1 D1
           INNER JOIN dbo.F1 F1 ON D1.Key = F1.D1
               INNER JOIN dbo.D2 D2 ON D2.Key = F1.D2
                   INNER JOIN dbo.D3 D3 ON D3.Key = F1.D3
   GROUP BY
       D1.Key,
       D2.Key,
       D3.Key
   ) FS1
       FULL OUTER JOIN
       (
       SELECT
           D1.Key AS D1,
           D2.Key AS D2,
           D3.Key AS D3,
           SUM(F2.M2) AS M2
       FROM
           dbo.D1 D1
               INNER JOIN dbo.F2 F2  ON D1.Key = F2.D1
                   INNER JOIN dbo.D2 D2 ON D2.Key = F2.D2
                       INNER JOIN dbo.D3 D3 ON D3.Key = F2.D3
       GROUP BY
           D1.Key,
           D2.Key,
           D3.Key
       ) FS2
       ON
           FS1.D1 = FS2.D1 AND
           FS1.D2 = FS2.D2 AND
           FS1.D3 = FS2.D3
           FULL OUTER JOIN
           (
           SELECT
               D1.Key AS D1,
               D2.Key AS D2,
               D3.Key AS D3,
               SUM(F3.M3) AS M3
           FROM
               dbo.D1 D1
                   INNER JOIN dbo.F3 F3 ON D1.Key = F3.D1
                       INNER JOIN dbo.D2 D2  ON D2.Key = F3.D2
                           INNER JOIN dbo.D3 D3  ON D3.Key = F3.D3
           GROUP BY
               D1.Key,
               D2.Key,
               D3.Key
           ) FS3
           ON
               COALESCE( FS2.D1, FS1.D1) = FS3.D1 AND
COALESCE( FS2.D2,  FS1.D2) = FS3.D2 AND
               COALESCE(  FS2.D3, FS1.D3) = FS3.D3

Even three tables F1, F2 and F3 are scanned, the performance should be much better than scan the same table with 100 million records.

You can apply this approach for DMR with compatible mode as well, see Framework manager below, only F and three dimensions D1, D2 and D3 are exposed.


The major advantage of approach above is that there is no need to report specification. The changes occur only in database layer and business layer. This approach can be even more effective when fact is based on query with prompt.

2 comments:

  1. Why would you have 3 fact tables if all your 3 measures share the same conformed dimensions?!
    Every row in your big fact will have non-zero values for ALL 3 measures (so you'd have to scan the same rowset for any calculation)
    This approach would have any merits only if the measures are completely independent and have non-conformed dimensions...

    ReplyDelete
  2. HI,
    I have 2 facts in my model with few dimension like year and etc.. i would take example of 1 dimension Year.
    Year Dimension has Key, same keys are mapped to fact1 and fact2. I have created 1:1 to 1:n joins between year-fact1 and year to fact2.
    Now i have created the Star Schema Grouping making the Q1 as Query Subject which has Shortcuts of Year and Fact1 table. And Q2 query Subject with shortcuts of Year and Fact2.
    Now out of Q1 query Subject i have created an hierarchy called Years. With Levels Year-to-Quarter-to-Month
    Then i have created 2 Measure Dimension.
    M1 from Q1 which has Costs from fact1 table and
    M2 from Q2 which has Cost from fact2 table.
    i created a cross tab with Years at raw and Cost from M1 at column. i am Getting Costs for all years fine.
    But when i create a Drill Through on Cost, passing Year as data element in drill through parameters. Also in the target report i am slicing the Year level from same hierarchy. but the only difference is i am using Cost measure from M2 measure dimension now in target report.
    This gives me the errors. like to enable local processing at FM level whcih i dont want to do.
    So can i use the same hierarchy for slicing both measure dimension. or i have to create 2 separate hierarchy for each measure dimensions. If i have to create 2 separate hierarchy, what i will passing in drill through parameters and what can be my slicer expression in target report.
    and if the same hierarchy i can use for both M1 and M2, then what should i pass and what can be my slicer expression in target report so i dont receive the errors.
    Thank so much in advance.
    You can reach out to me on dmk.3678@gmail.com

    ReplyDelete