Wednesday, May 23, 2012

How to simplify and centralize date logic in ETL and Report


Context
Solution options
Dynamically figure out current load date in DMR model
Dynamically figure out current load date in relational model
Dynamically setup current load date using hidden prompt
Add current load date to fact table
Add current load date to time dimension
Use parameters map to simplify and centralize date logic in ETL and Report for relational model
Use parameters map to simplify and centralize date logic in ETL and Report for DMR model


Context

Date function is used everywhere in report development. There are many ways to the last available load date, month or quarter. This article is to explore all different solution options, and then to compare them to figure out the best option, which is to simplify and centralize date logic in ETL and Report.  The approach can dramatically make code easy to read and improve the performance overall

Solution options

Dynamically figure out current load date in DMR model


Thanks to IBM Cognos Proven Practices: IBM Cognos BI – Using Dimensional Functions to Determine Current Period, the solution is to use following statement:

item(tail(filter(members([sales_and_marketing].[Time].[Time].[Month]),
tuple([Revenue], currentMember([sales_and_marketing].[Time].[Time]))
is not null), 1), 0)

Once getting this reference date, then MDX dimensional functions of Date and Family can be used to achieve all different periods, such as last month, or 13 months.  This solution has a very big performance issue in case when the fact table is big.

Dynamically figure out current load date in relational model


It is the similar concept to DMR, but the syntax looks much easier.

maximum ([Sales (query)].[Time (close date)].[Date (close date)]) with filter [Sales (query)].[Sales].[Revenue] > 0

Business Date/Time Functions can be used to get all requested periods in report. Again,  It has the same problem with performance.

Dynamically setup current load date using hidden prompt


To avoid the filter data from fact table, it is possible to use hidden prompt to get current load date from dimension table, then pass it to the main report query. Using DMR as sample, all steps are listed as follows:
  1. Define Currency day in main query
[FACT TABLE NAMESPACE].[Report Date].[Fiscal Calendar].[Day]->?Last report date?

  1. define a value Prompt using parameter Last report date
  2. Specify the name as PromptLastReportDate and select UI as List Box
  3. Make prompt query to use global filter to have only last report date selected
  4. Drag HTML Item besides PromptLastReportDate
  5. Add Java script as below to automatically select last report date
<Script>
var form = getFormWarpRequest();
var listB  = form._oLstChoicesPromptLastReportDate;       
listB.options[0].selected=true;
</Script>

  • Make the Prompt  PromptLastReportDate hidden
  • Run report, parameter last report date will be automatically given behind the scene, the performance is dramatically improved.

As JavaScript is a client app, that means this JavaScript will only work when user run ad hoc report. This solution won’t work against scheduled report.

Add current load date to fact table


Instead of dynamically determine the current load date, a new current load date column can be added to fact table. Below is a sample to use date key:

Filter:
[Namespace].[FactTable].[CurrentDateKey] = [DLD OBU].[FactTable].[DateKey]
SQL:
select  datekey, currentDatekey,* FROM FactTable where datekey = currentDatekey

It is easy to get all different filters.  However, this solution has resolved dynamic issue, but still has problem with performance and space for both report and ETL. When doing a simple incremental load, you still need to update the current date for the whole table, which will result in a very poor performance.


Add current load date to time dimension


To overcome the drawback of adding current load to fact table, we can add current load date to time dimension. . Below is a sample to use date key:

Filter:
[Time dimension].[date] between DATEADD( DAY, -30, [Time dimension].[Current load date] ) and [Time dimension].[Current load date]

SQL:
select * from [Time dimension] where [date] between DATEADD( DAY, -30, [Current load date]) and [Current load date]


This solution has following issues:
  • As Time dimension is shared in whole data warehousing, time dimension is supposed to be stable. However, this solution requires adding new current load date column for all fact tables, or a group of fact tables. This change will result in framework manager change as well.
  • In addition, it is difficult to use in query studio. When drag fact table into work space, you need to join time dimension and fact table to get merged fact, in order to make data model user friendly.


Use parameters map to simplify and centralize date logic in ETL and Report for relational model


Two tables are involved to current load date:
  • Incremental load control table for ETL
  • Report control table to save current load date

During the ETL load process, Report control table will be updated in last step from Incremental load control table.


1. Create a generic table to host current load date

Report control table is CURRENT_LOAD_STATUS, defined as follows

CREATE TABLE [dbo].[CURRENT_LOAD_STATUS](
[ParameterName] [varchar](50) NOT NULL,
[ParameterDescription] [varchar](100) NULL,
[IntValue] [int] NULL,
[StringValue] [varchar](100) NULL,
[DateValue] [datetime] NULL,
[FloatValue] [float] NULL,
[CreateDate] [datetime] NULL,
[ModifiedDate] [datetime] NULL,
CONSTRAINT [PK_CTRL_Status] PRIMARY KEY CLUSTERED
(
[ParameterName] ASC
))

Assume data value is July 20, 2007

2. Create query subject

SELECT
ParameterName ,
ParameterDescription ,
IntValue ,
StringValue ,
DateValue ,
FloatValue ,
CONVERT(varchar(10),DateValue,120) as DateString,
T.DAY_KEY as DateKey
FROM
dbo.CURRENT_LOAD_STATUS S inner join gosales.TIME_DIMENSION T on (S.DateValue = T.DAY_DATE)

The reason of using get DateString is that there is a issue to get date value. Certainly, you can use dateKey as integer for filter in case when you can use integer for fact and dimension tables.

3. Define parameter map based on query subject



4. Define Calculations and filters

Current Sales Load Date:
cast(#$CurrentLoadStatus_Parameter_Map{'SALES'}#,date)

13 month Sales
[Sales (query)].[Sales].[Order date]   between  _add_months ([Filters and calculations].[Current Sales Load Date],-13  ) and [Filters and calculations].[Current Sales Load Date]

5. Publish the calculation, along with Query subject (Hide column)
It is very important to publish query object, otherwise the current load date is the default value.

6. Use it in report studio and query studio
The key is to have Current Sales Load Date.  From this value you can derive all other dates with business date/time function. Then you can get all kinds of filters you want by applying such calculations.


We can simply use calculation to get single value for report and prompt control, this approach makes report development much easier and the solution is extendable.


Use parameters map to simplify and centralize date logic in ETL and Report for DMR model


Another sample is to use DMR data model. The following step uses Break Aggregate table to demonstrate the solution.
 
1. Create query subject


2. Define parameter map based on query subject

3. Define Calculation in dimension view

4. Move it to presentation view

5. Publish the calculation, along with Query subject (Hide column)

6. Use Current report date in daily break report

1 comment:

  1. Thanks for your detailed explanation Yin.

    Is it possible to create a parameter map or any way in FM with all current month data in it.

    eg: i have data in one of my fact table till 201208--current month. our user wants to enter different years with same month .(i.e. month should be constant when he filter for 2011 he should see 201108 etc.
    i.e
    current mnth,prior year same mnth,same mnth any year he filters.

    Your help would really be appriciated.

    ReplyDelete