Wednesday, May 1, 2019

How to Model Accumulating Snapshot - a practical sample

This document is intended to provide a practical solution to handle accumulative data with respect to data warehousing and business intelligence tools including Cognos and Tableau. The main idea is to have status change dates during the workflow, instead of providing status itself. This approach will enable us to report historical status at giving point of time.
Sample
For the sake of explanation, the following records is extremely simplified and used to demonstrate a new idea.  There are three month of data, including May, June and July. Two new applications occur each month. One application of each month  is approved and booked at the same month. Another application is approved next month and booked month after. For example, application ID1 is approved and booked in same month May. Application ID2 is approved in June, and booked in July.

 
Application ID
Application Month
Amount
Approved Month
Booked Month
ID1
May
10
May
May
ID2
May
20
June
July
ID3
June
11
June
June
ID4
June
21
July
 
ID5
July
11
July
July
ID6
July
21
 
 

 Current problem
Currently, the table is designed as below for data as of July

Application ID
Application Month
Amount
Approved
Booked
ID1
May
10
Yes
Yes
ID2
May
20
Yes
Yes
ID3
June
11
Yes
Yes
ID4
June
21
Yes
 
ID5
July
11
Yes
Yes
ID6
July
21
 
 

 
This data structure is well suitable for report for month of July. We can easily find the number of application approved and booked  as below

as of July
TTD
Approved
Booked
May
2
2
2
June
2
2
1
July
2
1
1

 
However, it is impossible to generate for month of June and May. The report will be generated as below if this table is used.

as of June
TTD
Approved
Booked
May
2
2
2
June
2
2
1

 
as of May
TTD
Approved
Booked
May
2
2
2

 

Proposed solution

The solution is to introduce month end snapshot to save data for each month, or apply the periodic snapshot concept.

 
Snapshot
Application ID
Application Month
Amount
Approved
Booked
May
ID1
May
10
Yes
Yes
May
ID2
May
20
 
 
June
ID1
May
10
Yes
Yes
June
ID2
May
20
Yes
 
June
ID3
June
11
Yes
Yes
June
ID4
June
21
 
 
July
ID1
May
10
Yes
Yes
July
ID2
May
20
Yes
Yes
July
ID3
June
11
Yes
Yes
July
ID4
June
21
Yes
 
July
ID5
July
11
Yes
Yes
July
ID6
July
21
 
 

 
This design will provide the functionality to report any month as needed.

as of July
TTD
Approved
Booked
May
2
2
2
June
2
2
1
July
2
1
1
as of June
TTD
Approved
Booked
May
2
2
1
June
2
1
1
as of May
TTD
Approved
Booked
May
2
1
1
 

This solution has some disadvantages:

1.     Need to change all reports to make sure snapshot is specified.

2.     Need to change model (Cognos and Tableau) to accommodate snapshot.

3.     Need to change ETL to save all snapshots

4.     The data volume will be dramatically  increased. 60 times of records are needed for 5 years

 

Better solution

To overcome all drawbacks from the proposed solution, we can add both approved date and book date into table as accumulating snapshot. However, approved and booked flags can be derived based on report month.

Application ID
Application Month
Amount
Approved Month
Booked Month
Approved
(Approved Month
<= Report Month)
Booked
(Booked Month
<= Report Month
ID1
May
10
May
May
 
 
ID2
May
20
June
July
 
 
ID3
June
11
June
June
 
 
ID4
June
21
July
 
 
 
ID5
July
11
July
July
 
 
ID6
July
21
 
 
 
 

 
If approved month exists, and Approved Month is equal to or earlier than  Report Month, then the approved flag is Yes. Similarly, If booked month exists, and Booked Month is equal to or earlier than  Report Month, then the booked flag is Yes.
 

as of July
Application ID
Application Date
Amount
Approved Month
Booked Month
Approved
(Approved Month
<= July)
Booked
(Booked Month
<= July)
ID1
May
10
May
May
Yes
Yes
ID2
May
20
June
July
Yes
Yes
ID3
June
11
June
June
Yes
Yes
ID4
June
21
July
 
Yes
 
ID5
July
11
July
July
Yes
Yes
ID6
July
21
 
 
 
 
as of July
TTD
Approved
Booked
May
2
2
2
June
2
2
1
July
2
1
1

 
as of June
Application ID
Application Date
Amount
Approved Month
Booked Month
Approved
(Approved Month
<= June)
Booked
(Booked Month
<= June)
ID1
May
10
May
May
Yes
Yes
ID2
May
20
June
July
Yes
 
ID3
June
11
June
June
Yes
Yes
ID4
June
21
July
 
 
 
ID5
July
11
July
July
 
 
ID6
July
21
 
 
 
 
as of Jun-18
TTD
Approved
Booked
May-18
2
2
1
Jun-18
2
1
1

 
as of May
Application ID
Application Date
Amount
Approved Month
Booked Month
Approved
(Approved Month
<= May)
Booked
(Booked Month
<= May)
ID1
May
10
May
May
Yes
Yes
ID2
May
20
June
July
 
 
ID3
June
11
June
June
 
 
ID4
June
21
July
 
 
 
ID5
July
11
July
July
 
 
ID6
July
21
 
 
 
 
as of May
TTD
Approved
Booked
May-18
2
1
1

 
 

This solution has some advantages against the proposed solution:

·         Do not need to change any report.

·         The data volume is same as current implemented.

 
The only effort needed are

·         Need to change ETL to add these date columns

·         Need to change the status flag to dynamic flag based on logic described above.

 
However, there are some limitations below after identified all current dynamic columns.

·         Confirm that all these dynamic columns can be derived from report month.

·         Confirm that the logic to derive these columns is not too complicated.

·         Confirm that the change of status back and forth can be ignored. There are two cases:

o    Change booked and approved back and forth, not sure whether it makes business sense.

o    Change approved and not approved back and forth.

 

 

No comments:

Post a Comment