Tuesday, May 22, 2012

How to achieve a comprehensive BI solution using Cognos with SQL server


Context
Solution overview
Implementation
Define a flexible data model
Define SQL Alert to trigger ETL Job
Write user input into database for unmapped dimensions
Turn Cognos BI reports into “Web forms” without using JavaScript
Activate ETL job via Cognos report
Control detail ETL load steps
Lesson learned…
Work with separated ETL server and database server
Use dynamic schedule to resolve SQL alert weakness

Context


As known, Cognos BI can be used for report, while SQL server integration services or SSIS can be used for ETL. However, this implementation may not provide a comprehensive BI solution, as there is another component to be requested, which is data load management to make sure that data is correctly loaded. This requirement can be considered as follows:
  • Data should be automatically loaded based on schedule and alert. For example, when source data from external or internal is ready, then ETL should automatically start to load.
  • Data should be fully loaded to reflect real status, where any exceptions, such as for “late arriving dimensions”, are requested to be handled by end users
  • Data load process should be handled as a workflow process, and end users can be guided to go through the whole process.
  • Data load should be monitored, in order to give end users an overview of data availability.

There are many ways to resolve these issue based on the company‘s IT infrastructure. The popular solution is to use scheduler software, workflow software, master data services software and with some web development. However, there are many challenges with these tools in this context:
  • A simple schedule tool can’t satisfy the requirements above, as system should allow user to “act”, mainly handle rejected transactions
  • A sophisticated workflow can do the job but it is too expensive, and not practical at all
  • Master data management can resolve rejected records issue, but it is very complicated and can’t resolve workflow problem
  • Contributor can be used to write data back to database, but can’t resolve workflow problem

This article is intended to explore an affordable and innovative solution using Cognos and SQL server only, where scheduler, workflow, master data services and web development are not needed.

Solution overview


The solution is to leverage Cognos BI report and SSIS as follows:

  • Use SQL alert to automate load process when source data is ready; the SQL jobs can be used to schedule job.
  • Make Cognos highly interactive to provide user an interface like “web form” , where user can resolve map issue for all dimensions.
  • Make Cognos report to show the load status, and then give user an ability to control load process from report.
  • Make ETL log all statuses, where these status can be displayed on report.

The screenshot below illustrates the main idea. Please  download report specification for reference in detail



Please note that this interface is only an idea to demonstrate a general concept. In real implementation, this idea can be customized for any data load management. As illustrated, this interface can perform following tasks:

  1. Track all jobs status whether it is from manual load or automatic load.
  2. Enforce the workflow logic to make user to follow all steps. The goal is to finish all steps. Users can use actions to achieve their goal by applying actions.
  3. User can act all situations, like operation dashboard; for example, if you need to audit data and then promote, then user is guided to validate data and promote data; If there are data exceptions for dimensions, user will be guided to handle exceptions.


The diagram below is the architect diagram



Implementation


All implementation challenges are listed as follows:

Define a flexible data model

It is preferred to define a framework (or metadata) that can be used to handle all similar problems. Data flow is highly customized but not complicated, or all tasks within workflow are sequential. Main tables are Workflow, Job, status, activity and incremental load control table.

Define SQL Alert to trigger ETL Job

To enable automatic load, three steps are defined:
  1. A file or files are found in source data directory by SQL trigger
EXEC msdb.dbo.sp_add_alert @name=N'FileArrival', ...

  1. Trigger will start a SQL job (job_SP) with stored procedure, then this stored procedure calls another SQL job for real ETL (job_etl)
  1. This SQL job calls ETL using SSIS package

Write user input into database for unmapped dimensions

In case when there are rejected transactions, user is requested to handle it. The idea is to find a way to allow business user not only to query rejected records but also to act upon it. From Cognos point of view, we need to dramatically leverage stored procedure to update map table and dimension table.  

Turn Cognos BI reports into “Web forms” without using JavaScript


To provide user interface for exception handling and data workflow concept, we need to figure out a way to turn report into “Web form”. The idea is to leverage Cognos key concepts and their relationships, including parameters, variables, conditional blocks, drill through and portal pages. The concept to turn Cognos BI reports into “Web forms” is illustrated below. (this topic will be discussed in detail in other post)
  1. Define a hidden value prompt as pBlock using static selection as Block 1, Block 2, … Block i,… Block N
  2. Define a conditional variable as vBlock = ParamValue(‘pBlock’), and therefore variable value is   Block 1, Block 2, … Block i,… Block N
  3. Define report as conditional Block 1, Block 2, Block 3, … with conditional variable vBlock assigned.
  4. Define your action as drill through for all three cases including singleton, list and crosstab
  5. Parameter pass data item pBlock, and therefore control what block to display.
  6. If you want to go other page, then use portlet with global drill though channel.

There will be four cases, including drill through from 1) singleton, 2) list,  3) crosstab rand 4) workspace





Activate ETL job via Cognos report

There are three steps:
  1. When drilling through from report, a stored procedure (usp_RunETLjob) is called
  2. This store procedure calls SQL job (called Job_etl)
  3. Then this job calls ETL using SSIS package

Control detail ETL load steps

Based on different activities, it will control what major job steps are requested to run.  Below is an option to be shared.
  1. Define a SSIS variable Control Steps in master package, which is initialized based on different activity  when master package starts, such as: POWER(2, 1)  + POWER(2, 5)  +  POWER(2, 7)   for main step 1,5 and 7
  1. Within individual package, define three variables:
    1. CtrlSteps, which is passed from master package.
    2. WhatStep, variable in child package as calculated such as (DT_I4) (POWER(2, 5)  )
    3. Runstep, to decide whether this steps is requested to run
  2. Add condition in the control flow based on predefined conditions



Lesson learned…


There have been two issues during the implementation: 1) how to handle the situation when ETL server and database server are separated, and 2) how to handle the situation when alert can’t work during ETL job is in process

Work with separated ETL server and database server


Solution 1:  SQL Job in ETL server can’t directly call stored procedure in database server.
The solution is to build database stored procedure into SSIS ETL, and then use SQL alert triggers a SQL job, where SQL job calls this ETL. The sequence is
SQL Alert (ETL server) --->SQL job 1 (ETL server)--->stored procedure (Database server)


Solution 2: Stored procedure in database server can’t directly call ETL package in ETL server
The solution is to create a separated stored procedure in MSDB in ETL server, then make a link server to Database server, and then the stored procedure in database server calls remotely the MSDB stored procedure ETL server, where MSDB stored procedure call a real ETL job in ETL server
The sequence is:
Stored procedure (Database server) ---> MSDB stored procedure (ETL server)  --->SQL job 2 (ETL server)

The whole process to trigger a ETL task is listed as follows
SQL Alert (ETL server) --->SQL job 1 (ETL server) --->stored procedure (Database server) ---> MSDB stored procedure (ETL server)  --->SQL job 2 (ETL server)


Use dynamic schedule to resolve SQL alert weakness


Problem:  SQL alert is used to trigger out ETL job in automatic data load. When sales files are moved into predefined data source folder, SQL WMI alert triggers ETL job to load these files.  However, files are sometimes in data source folder and cannot be processed. This problem presents a major challenge to the current architecture of automatic data load. It is request to start SQL job automatically when there are files dropped in source data folder during ETL process.



The solution is add last step in ETL job to start SQL job automatically when there are files dropped in source data folder during ETL process.

SQL Alert (ETL server) --->SQL job 1 (ETL server) --->stored procedure (Database server) ---> MSDB stored procedure (ETL server)  --->SQL job 2 (ETL server) --->  start job 1 (ETL server) when there are files dropped in source data folder during ETL process.

The implementation has following steps:
  1. Create one time schedule for SQL job 1
  2. Create separate ETL sub package and append it to ending of ETL process
Check whether there is any files in source data folder, if yes, run script to schedule the job.

1 comment:

  1. Hi Charles,
    This post gives a fair idea of how the stuff should work, I would like to go ahead and implement this.
    Can you share the report specification file mentioned in this blog.


    Thanks,
    Sujit

    ReplyDelete