Showing posts with label 5.1. Report Performance Tuning.

Monday, April 1, 2019

How to model Cognos Framework Manager to generate most efficient SQL query

This document focuses on the detailed design of Cognos Framework Manager rather than the overall design. The ultimate goal is to generate the most efficient SQL query and therefore achieve the best performance. Performance is not a big problem when the physical table size is relatively small. However, the problem becomes unmanageable when tables grow large, around one billion records, even with a data appliance like Netezza. Cognos Framework Manager design plays an imperative role in performance. By optimizing the Framework Manager model, we improved report performance from 7 hours to 2 minutes for one major financial company.

All of these solutions are based on the two assumptions below:
  1. Single physical database source. As long as all of the involved tables can be joined together with a single SQL query, we can consider it a single database source. This obviously depends on the kind of database, such as SQL Server, Oracle, DB2 or Netezza.
  2. No local processing. Our focus is to generate a single query for each data container. If more than one query is generated, then local processing is involved.

The criteria for the most efficient SQL are defined as below:
  1. Minimized SQL: only tables and columns that are used in the report appear in the SQL query. For example, given a model with 10 dimensions and 1 fact table of 100 columns, if the report selects only one column from one dimension and one column from the fact, then the generated SQL should contain only those two columns. The other 9 dimensions should not be joined or used at all.
  2. No subquery or WITH statement: the date filter is the most frequently used filter, and we want it to be applied at the lowest level. In other words, the database should filter the data first and then join. Putting the filter at a different level of the SQL statement can result in a 100-fold performance difference on a big database.
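The filter-placement point can be illustrated with a pair of hypothetical queries over a large fact table; the table and column names (sales_fact, region_dim, date_key) are illustrative only:

```sql
-- Inefficient: the subquery materializes the whole fact table first,
-- and the date filter is applied only to the joined result.
SELECT d.region, SUM(s.revenue)
FROM (SELECT date_key, region_key, revenue FROM sales_fact) s
JOIN region_dim d ON d.region_key = s.region_key
WHERE s.date_key BETWEEN 20190101 AND 20190131
GROUP BY d.region;

-- Efficient: the same filter sits directly on the fact table,
-- so the database joins only the filtered rows.
SELECT d.region, SUM(s.revenue)
FROM sales_fact s
JOIN region_dim d ON d.region_key = s.region_key
WHERE s.date_key BETWEEN 20190101 AND 20190131
GROUP BY d.region;
```

Note that many optimizers can push a filter through a subquery this simple; the difference shows up when the generated subqueries are complex enough (unions, nested derived tables) that pushdown is blocked, which is exactly the situation in the sample below.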
Sample
Below is a real example. Both FACT_ONE and FACT_TWO have about 600 million records, while all dimensions have about 1 million records. The generated query was changed from an inefficient query to an efficient one by remodelling Cognos Framework Manager.
Before: Inefficient Query - with problems
      Query Result ->
        UNION ->
          SubQuery_ONE ->
            JOIN
              SubQuery_TWO  (Filter at this level)  
                FACT_ONE (All columns)
              SubQuery_THREE  (Filter at this level)
                SubQuery_FOUR
                  SubQuery_FIVE
                    SubQuery_SIX (Joined with 10 dimensions)
                      FACT_TWO (All columns)
          SubQuery_ONE ->
            JOIN ...
Problem 1 - too many subqueries, which prevent the filters from being applied at the bottom level. The database needs to scan 600 million records before joining;
Problem 2 - violates the minimized-query concept: even though the 10 dimensions are not used, all of them are joined with the fact table.
Problem 3 - violates the minimized-query concept: even though only a few columns are used, all columns, more than 200 in this case, are selected.
            
After: Efficient Query
      Query Result ->
        UNION ->
          SubQuery_ONE ->
            JOIN
              FACT_ONE  (needed columns, Filter at this level)
              FACT_TWO  (needed columns, Filter at this level)
          SubQuery_ONE ->
            JOIN ...

Some solutions from our consulting experience are listed below; they will be explained in detail in future blogs. Stay tuned.
  1. Problems with multiple data sources pointing to the same database: even though we have a single physical database, local processing is somehow involved, because behind the scenes Cognos considers it two different databases.
  2. At which layer should relationships be created in Cognos Framework Manager? The database layer.
  3. At which layer should filter(s) be added in Cognos Framework Manager? The business layer, while relationships are defined at the database layer.
  4. At which layer should a query column be derived in Cognos Framework Manager? The business layer, while relationships are defined at the database layer.
  5. What is the column-name length limit in Cognos Framework Manager beyond which a subquery is generated? 31 characters.

Saturday, September 6, 2014

How to improve crosstab performance by removing default measure

  1. Introduction

Inserting measures on a nested crosstab edge lets business users see more than one measure at the same time, and it is often necessary to meet business requirements. However, this definition can cause the Cognos engine to bring all records to the Cognos server, making performance too poor to accept. The problem is that the default measure on the crosstab corner is not removed.
  2. Test result

A sample was created with the sample database and package from Cognos. The test results are listed as follows:


Cases and waiting time for the first page to appear:
  1. Normal with single default measure: 4 seconds
  2. Measures on nested crosstab edge on row (with default measure on crosstab corner): 65 seconds (14 times slower)
  3. Measures on nested crosstab edge on column (with default measure on crosstab corner): 33 seconds (7 times slower)
  4. Measures on nested crosstab edge on row (without default measure on crosstab corner): 4 seconds
  5. Measures on nested crosstab edge on column (without default measure on crosstab corner): 4 seconds


The detailed sample is described below for you to review.


  3. Sample

    1. Database

SQL Server sample database GOSALES from Cognos (an Oracle or DB2 database should reproduce the same problem)


    2. FM

Go sales (query)


    3. Reports

Case 1 Normal with single default measure


Case 2 Measures on nested crosstab edge on row, (with default measure on crosstab corner)


Case 3 Measures on nested crosstab edge on column (with default measure on crosstab corner)




Case 4 Measures on nested crosstab edge on row, (without default measure on crosstab corner)


Case 5 Measures on nested crosstab edge on column (without default measure on crosstab corner)



Saturday, May 10, 2014

How to apply minimized SQL concept

The minimized SQL concept is one of the major design considerations in Cognos Framework Manager. When a model is complicated, it has a very big impact on performance, for both DMR models and relational models. This chapter provides a sample to demonstrate that relationships should not all be defined at the database layer, and that a report should retrieve a table only when absolutely necessary.

      1. Sample

Given the data model below, we can consider G1 and G2 as multi-valued dimensions, and ID from the fact itself as a fact dimension. Please note that the G1 and G2 columns are not real primary keys. G1 and G2 are bridge tables, meaning that each group has multiple D1 or D2 records.
      2. Model without minimized SQL

In order to make a clean star schema at the business layer, we can define the tables Fact, Fact_Detail, G1 and G2 together to behave as one fact table. However, this approach results in poor performance, because the joins among these tables are always there.
        1. Framework manager layers

          1. Database layer
Make clear relationships among G1, G2, Fact and Fact_Detail.
          2. Business layer
Create a model query subject Fact from the 4 tables G1, G2, Fact and Fact_Detail, where columns D1 and D2 come from dimensions D1 and D2 respectively. Then build a clean star schema as below.
          3. Dimension layer
The dimension view is very close to the business layer star schema, with the exception of a fact dimension FD.
        2. Test result

          1. Using fact dimension
As a matter of fact, we need only Fact and Fact_Detail, but we can see that the generated SQL uses G1, G2, Fact and Fact_Detail together.
          2. Using D3 dimension only
When using the D3 dimension, only the Fact_Detail table should be involved, as only Fact_Detail contains the D3 dimension. Yet the generated SQL uses all of the tables G1, G2, Fact and Fact_Detail.
          3. Using D1, D2 and D3 dimensions
All tables should be involved in this case, but the subquery could impair the performance.


      3. Model with minimized SQL

The minimized SQL concept is to bring in only the involved table(s), and therefore achieve better performance.


        1. Framework manager layers

          1. Database layer
No relationships are created at the database layer.
          2. Business layer
All detailed relationships are created at the business layer. Certainly, we need to follow Cognos cardinality rules to make sure that the join path is predictable. There are many best practices in this regard. Please note that the relationships may NOT follow the exact entity-relationship diagram.
          3. Dimension layer
It is the same as the old dimension layer. This is good news, as we don't need to change the reports; we can change Framework Manager alone to improve performance.
        2. Test result

          1. Using fact dimension


          2. Using D3 dimension only
          3. Using D1, D2 and D3 dimensions

Saturday, April 12, 2014

How to avoid local processing

The Cognos engine is very powerful, as it can handle all available functions. Some of these functions and structures cannot be handled in the database; for example, Cognos can bring together two different sets of data from two different databases, one from SQL Server and the other from Oracle. Such queries must be processed locally on the Cognos server, which is called local processing.



Local processing is normally bad for query performance for two main reasons:
    1. Performing operations at the Cognos server means that you no longer have database indexes or other features like materialization to help speed up query processing.
    2. To perform most operations with local processing, you have to retrieve a larger, more detailed result set from the data source. Fetching large result sets slows things down, and processing them at the Cognos server impairs performance.


There are two ways to identify local processing:
  1. Compare Cognos SQL and native SQL
Retrieve both the Cognos SQL and the native SQL, either from Report Studio or from Framework Manager.
If the logic between the Cognos SQL and the native SQL differs, then local processing is involved. More importantly, we can figure out what data is retrieved from the database.
  2. Change the query processing option from Limited Local to Database Only
This can be changed from Report Studio or from Framework Manager.
When running the report in Report Studio, or testing the query in Framework Manager, if we get the error message below, then local processing is involved.
<UDA-SQL-0219 the function <function name> is used for local processing but is not available as a built-in function, or at least one of its parameters is not supported>
      • Using unsupported functions or operations

There are two major Cognos functions, AT and PREFILTER, that are not supported by any database function. We should avoid them unless we have to use them.
      • Joining queries from separate databases

There is a best-practice article, Working with Multiple Relational Data Sources, with a detailed explanation.


Two major cases are discussed below

Working with a single instance of a database vendor

In this case, we need to define a Content Manager data source that points to the instance, and then define a data source in Framework Manager that points to that same Content Manager data source. The data source name in Framework Manager can be different. The relationship chain is:
  1. Data source in Framework Manager; note that the schema and catalog can be changed with the same connection to point to different sets of data.
  2. Data source in Cognos Connection; there are three levels of configuration here: a logical database connection under Cognos can have multiple physical connections, while each physical connection can have multiple signons.

Working with multiple database instances or vendors

The main solution is to consider using ETL or VVM (Virtual View Manager) to bring the data together.
Another thought is to bring the data together at the report level to mimic the stitched-query concept, in case the fact data comes from two different areas. In other words, we can bring summarized data from each data source and then join it at the Cognos server using local processing. The performance should be reasonably good, as the number of summarized records is very small.
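The summarize-then-join idea can be sketched in SQL. This is a hypothetical example; the table and column names (sales_fact, returns_fact, month_key, product_key) are illustrative, not from the sample database:

```sql
-- Query sent to source A (e.g., the sales database):
-- reduce the big fact table to a small monthly summary.
SELECT month_key, product_key, SUM(revenue) AS revenue
FROM   sales_fact
GROUP  BY month_key, product_key;

-- Query sent to source B (e.g., the returns database):
-- the same grain, so the two result sets line up.
SELECT month_key, product_key, SUM(return_qty) AS return_qty
FROM   returns_fact
GROUP  BY month_key, product_key;

-- The Cognos server then joins the two summaries locally on
-- month_key and product_key; because each summary is small,
-- this local join is cheap.
```

The key design choice is that each database does its own heavy aggregation, so only the small summaries cross the wire to the Cognos server.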

Joining queries from the same database with different logical data sources

For whatever reason, there can be more than one database connection in Cognos; in other words, more than one Content Manager data source pointing to the same database. This can have a huge impact on performance, as the BI server considers each logical Content Manager data source a separate source. Even if two query items or queries point to the same database, it will use local processing to pull the data separately into the Cognos server and then join it together.

Saturday, March 29, 2014

How to build summary tables to improve performance at database level

When designing a data warehouse, we need to create the fact table at the lowest grain. Sometimes these tables can be very big, which impairs performance. In order to get good performance, we can build summary tables by applying the four ideas below.
  1. Exclude one or more dimensions

The first approach is to exclude one or more dimensions when summarizing a fact table. This is the easiest type of aggregate to create, because the dimensional data doesn't need to be accessed at all. The summary table can provide significant performance advantages over the detail fact table, as the number of records can be dramatically reduced.
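As a sketch, assuming a hypothetical sales_fact table from which we want to drop the customer dimension, the aggregate could be built like this:

```sql
-- Hypothetical schema:
--   sales_fact(date_key, product_key, customer_key, quantity, revenue)
-- The summary drops customer_key entirely, collapsing many detail
-- rows into one row per date and product.
CREATE TABLE sales_by_product_day AS
SELECT date_key,
       product_key,
       SUM(quantity) AS quantity,
       SUM(revenue)  AS revenue
FROM   sales_fact
GROUP  BY date_key, product_key;
```

The summary still joins to the remaining dimensions on date_key and product_key exactly as the detail table does; only questions involving the customer dimension must fall back to the detail fact.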
  2. Use a higher level of dimensions

The second approach is to replace one or more dimensions with a higher grain. For instance, we can use this approach to create a monthly summary of an original daily (or even lower-grain) fact table. We aggregate all of the daily records into monthly records and reduce the size of the fact table, but we can still do dimensional analysis by time at the month, quarter, and year levels. In this case, we don't need to change the dimension itself. There are two ways to connect this summary table with the time dimension table. One way is to define a determinant, then create a relationship between the month in the summary table and the month in the dimension table. The other way is to make the date key in the summary table the end (or beginning) of the month, then still connect the time dimension via the date key.
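A monthly rollup along these lines might look like the following sketch; the table and column names (sales_fact, time_dim, month_key) are hypothetical:

```sql
-- Roll a daily fact table up to monthly grain via the time dimension.
CREATE TABLE sales_by_month AS
SELECT t.month_key,
       f.product_key,
       SUM(f.revenue) AS revenue
FROM   sales_fact f
JOIN   time_dim t ON t.date_key = f.date_key
GROUP  BY t.month_key, f.product_key;

-- Variant for the second connection method described above: instead of
-- month_key, keep a date_key set to the last day of each month, so the
-- summary still joins to time_dim on date_key with no determinant needed.
```

With the month_key variant, a determinant on the time dimension tells Framework Manager that month-level attributes are safe to join at this grain.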
  3. Use high-level dimension attributes

The third approach simply replaces dimensional keys with high-level dimensional attributes. In other words, the data is essentially the same as the report data, without any joins. This approach provides the best performance, but without any flexibility at all.
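A sketch of such a report-ready table, with hypothetical table and column names, would be:

```sql
-- Dimension keys are replaced by the attributes the report actually
-- displays, so no joins are needed at report run time.
CREATE TABLE sales_report_ready AS
SELECT t.year_name,
       t.month_name,
       p.product_line,
       SUM(f.revenue) AS revenue
FROM   sales_fact f
JOIN   time_dim    t ON t.date_key       = f.date_key
JOIN   product_dim p ON p.product_key    = f.product_key
GROUP  BY t.year_name, t.month_name, p.product_line;
```

The trade-off is baked in: any attribute not copied into the table (a new product hierarchy level, say) requires rebuilding the summary.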
  4. Save cumulative measures as columns

The fourth approach targets cumulative measures, which are used heavily in report development. Usually, many different methods are applied to create relative-time measures on the fly. However, if the data volume is big and the calculation logic is complicated, then we can save all cumulative measures into a summary table. In this way, the Framework Manager model and the reports are easy to build, and because no calculation is involved at run time, the performance will be very good.
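As a sketch, assuming a hypothetical monthly summary table sales_by_month that also carries a year_key column, a year-to-date measure could be pre-computed with a window function (where the database supports them):

```sql
-- Store the running year-to-date total as a plain column, so reports
-- never have to compute relative-time logic on the fly.
CREATE TABLE sales_cumulative AS
SELECT year_key,
       month_key,
       product_key,
       revenue,
       SUM(revenue) OVER (PARTITION BY product_key, year_key
                          ORDER BY month_key) AS revenue_ytd
FROM   sales_by_month;
```

Because revenue_ytd is just a column, the Framework Manager model treats it like any other measure, with no AT clauses or running aggregates that could trigger local processing.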

Saturday, March 8, 2014

How to improve report performance by integrating prompt macros in a Framework Manager query

    1. Problem

Sometimes we can't directly use a fact table from the database; instead, we use a query as the fact table. The final SQL query for the report is then generated with a subquery, as follows:
Select ...
From
(Fact sub query),
Dimension 1,
Dimension 2, ...
Dimension N
As we can see, the query gets data from the subquery and then filters it against all of the dimensions. If the subquery returns a small dataset, the performance is acceptable. However, when millions of records are retrieved from the subquery, such a query results in very poor performance. The problem is that we cannot break up the subquery, as the business logic is implemented inside it.


    2. Solution

The idea is to embed all parameters directly into the query with prompt macros. This gives the database an opportunity to optimize the query and therefore deliver better performance. The sample below uses the Cognos sample database, where a simple query subject is created.


--(1) The first part is the query with the business logic. RETURNED_ITEM, ORDER_DETAILS and ORDER_HEADER are joined together. The difference is that all dimensions, TIME_DIMENSION, PRODUCT and ORDER_METHOD, are built into the query as well.
select
TIME_DIMENSION.DAY_KEY AS DAY_KEY,
PRODUCT.PRODUCT_NUMBER AS PRODUCT_NUMBER,
ORDER_METHOD.ORDER_METHOD_CODE AS Order_method_code,
RETURNED_ITEM.RETURN_QUANTITY AS Return_quantity


from GOSALES.RETURNED_ITEM RETURNED_ITEM,
GOSALES.ORDER_DETAILS ORDER_DETAILS,
GOSALES.ORDER_HEADER ORDER_HEADER,
GOSALES.TIME_DIMENSION TIME_DIMENSION,
GOSALES.PRODUCT PRODUCT,
GOSALES.ORDER_METHOD ORDER_METHOD


where ORDER_HEADER.ORDER_NUMBER = ORDER_DETAILS.ORDER_NUMBER
and ORDER_DETAILS.PRODUCT_NUMBER = PRODUCT.PRODUCT_NUMBER
and ORDER_HEADER.ORDER_METHOD_CODE = ORDER_METHOD.ORDER_METHOD_CODE
and RETURNED_ITEM.ORDER_DETAIL_CODE = ORDER_DETAILS.ORDER_DETAIL_CODE
and (datepart(year,RETURNED_ITEM.RETURN_DATE) * 10000 + datepart(month,RETURNED_ITEM.RETURN_DATE) * 100) + datepart(day,RETURNED_ITEM.RETURN_DATE) = TIME_DIMENSION.DAY_KEY


--(2) The second part is the mandatory filter; in this case, 'p_BEGIN_DATE' and 'p_END_DATE' are applied


and TIME_DIMENSION.DAY_KEY #'between '+prompt('p_BEGIN_DATE','string')+' and '+ prompt('p_END_DATE','string')#


--(3) The third part is the optional filters; in this case, 'p_PRODUCT_NUMBER' and 'p_ORDER_METHOD_CODE' are applied
AND # 'PRODUCT.PRODUCT_NUMBER=' +prompt('p_PRODUCT_NUMBER','string', 'PRODUCT.PRODUCT_NUMBER')#
AND # 'ORDER_METHOD.ORDER_METHOD_CODE=' +prompt('p_ORDER_METHOD_CODE','string', 'ORDER_METHOD.ORDER_METHOD_CODE')#


Assume
'p_BEGIN_DATE'  = 20100101
'p_END_DATE' = 20101231
'p_PRODUCT_NUMBER' =142110
'p_ORDER_METHOD_CODE' = 2
Then the query will look like this:
....................
and TIME_DIMENSION.DAY_KEY between 20100101 and 20101231
AND PRODUCT.PRODUCT_NUMBER=142110
AND ORDER_METHOD.ORDER_METHOD_CODE=2


Assume that only the mandatory filters are given
'p_BEGIN_DATE'  = 20100101
'p_END_DATE' = 20101231
Then the query will look like this:
....................
and TIME_DIMENSION.DAY_KEY between 20100101 and 20101231
AND PRODUCT.PRODUCT_NUMBER= PRODUCT.PRODUCT_NUMBER
AND ORDER_METHOD.ORDER_METHOD_CODE= ORDER_METHOD.ORDER_METHOD_CODE