Thursday, June 28, 2012

How to handle slowly changing dimension type 2 (SCD 2) – a practical approach for big data

As known, the built in SCD dimension tool in SSIS can’t be used for big data. This document introduces a practical approach to resolve this issue. The main advantage is easy to debug. The solution is to introduce two temp tables to save records for SCD1 and SCD2, then use insert and update to the real dimension table.

Step 1:
truncate table DimContact_SCD1
truncate table DimContact_SCD2
btw. DimContact_SCD1 and DimContact_SCD2 have the same structure as DimContact

Step 2:



Step 3:
--SCD1
Update  T SET
Address1=S.Address1
,Address2=S.Address2
,City=S.City
,Province=S.Province
,Country=S.Country
,PostalCode=S.PostalCode
FROM DimContact T inner join DimContact_SCD1 S on (T.ContactAlternateKey = S.ContactAlternateKey)
WHERE T.ToDate is NULL

--SCD2
Update  T SET
ToDate =  convert ( date,?)
FROM DimContact T inner join DimContact_SCD2 S on (T.ContactAlternateKey = S.ContactAlternateKey)
WHERE T.ToDate is NULL

Insert into DimContact (
ContactAlternateKey
,MemberNumber
,IsMember
,FirstName
,LastName
,Address1
,Address2
,City
,Province
,Country
,PostalCode
,Gender
,DriversLicenseType
...
)
SELECT
ContactAlternateKey
,MemberNumber
,IsMember
,FirstName
,LastName
,Address1
,Address2
,City
,Province
,Country
,PostalCode
,Gender
...
FROM DimContact_SCD2

No comments:

Post a Comment