Saturday, March 29, 2014

How to build summary tables to improve performance at database level

When designing data warehouse, we need to create the fact table with lowest grain. Sometimes, these tables can be very big, which will impair the performance.  In order to get good performance, we can build summary tables by applying four ideas below
  1. Exclude one or more dimensions

The first approach is to exclude one or more dimensions when summarizing a fact table. This is the easiest type of aggregate to create because the dimensional data doesn’t need to be accessed by whatever reasons. The summary table can provide significant performance advantages over the detail fact tables, as number of records can be dramatically reduced.
  1. Use higher level of dimensions

The second approach is to have one or more dimensions replaced by higher grain. For instance, we can use this approach to create a monthly level summary table of the original daily or even lower fact table. We’re aggregating all of the daily records into monthly records and reducing the size of the fact table, but we’re still able to do dimensional analysis by time at the month, quarter, and year levels. In this case, we don’t need to change dimension itself.  There are two ways to connect this summary table with time dimension table. The one way is to define determinant, then create relationship between month from summary table and month in dimension table.  The other way is make the date key in summary table as end (or beginning) of month, then still connect time dimension via date key.
  1. Use high-level dimension attributes

The third approach simply replaces dimensional keys with high-level dimensional attributes. In other word, the data is very much same as report data without any joins. This approach provide the best performance but without any flexibility at all.
  1. Save cumulative measures as columns

The fourth approach is to target cumulative measures, which are used very much in report development. Usually, many different ways are applied to create relative time on the fly. However, if the data amount is big, and the calculation logic is complicated, then we can save all cumulative measures into a summary table. In this way, framework manager and report will be easily to build. Because there is no calculation involved, the performance must be very good.

No comments:

Post a Comment