Tuesday, June 12, 2012

How to use create dynamic schedule for a ETL job

Problem:  SQL alert is used to trigger out ETL job in automatic data load. When sales files are moved into predefined data source folder, SQL WMI alert triggers ETL job to load these files.  However, files are sometimes in data source folder and cannot be processed. This problem presents a major challenge to the current architecture of automatic data load.

Current architecture:  

  1. Files are  dropped into predefined data source folder
  2. SQL altert (SALESFileArrival)  checks data source folder and triggers out SQL job (SALES_Job)
  3. SQL job (SALES_Job)  runs a stored procedure (usp_RunSALESjob), where the stored procedure to run real ETL job (SALES_etl)
  4. ETL job (SALES_etl)  processes all these files in data source folder

Files ==> SQL altert(SALESFileArrival) ==> SQL job (SALES_Job)  ==> stored procedure(usp_RunSALESjob) ==> ETL job(SALES_etl)

Analysis:  The problems occur when sales files dropped in data source folder what time ETL job(SALES_etl) is not finished

9:00AM9:01AM9:02AM9:10AM9:11AM9:12AM……
First set of files arrived at 9:00AM, SQL job started and will be finished at 9:10AM
Second set of files arrived at 9:02AM, SQL job can’t be started, as it is not finished.
When Alert checks files after 9:11AM and find NO EVENT, or NO FILES are moved into this folder within this minute, therefore, SQL job won’t start.  See further note below


If any new files dropped after 9:10AM, SQL job can start as well, because Alert can be triggered.  However, if this is last wholesaler, these files could stay here for another week. Therefore we need to handle this case.

{Clarify two terms:

  1. 60 is interval, meaning that WMI check dropped file every 60 second for the past 60 seconds
SELECT * FROM __InstanceCreationEvent WITHIN 60 WHERE TargetInstance ISA "CIM_DirectoryContainsFile" and TargetInstance.GroupComponent= "Win32_Directory.Name=\"D:\\\\Newcom~1\\\\Data\\\\Sell_Thru\\\\SourceFileFolder\""

  1. Delay between responses is interval to response, in this case to run SQL job. We can either put 0 or 60 second. The best way is to set the same value as event interval
}

Solution:

The solution is to start SQL job automatically when there are files dropped in source data folder during ETL process. But how to start this process is to leverage the schedule of this SQL job. The new architecture is

Files SQL altert(SALESFileArrival) SQL job (SALES_Job)  stored procedure(usp_RunSALESjob) ETL job(SALES_etl)   SQL job (SALES_Job)  

  1. Create one time schedule for SQL job



  1. Create sql statement below to dynamically setup schedule
declare @start_date int, @start_time int
set @start_date= convert(int,CONVERT(varchar,getdate(),112))
set @start_time= convert(int,replace(left( convert(varchar, dateadd(minute,+2,getdate()), 114), 8),':',''))
EXEC msdb.dbo.sp_update_schedule @name ='Run_Alert_Missed_Sales_Files',
   @enabled=1,
   @active_start_date=@start_date,
@active_start_time=@start_time

  1. Create separate ETL sub package and append it to ending of ETL process
Check whether there is any files in source data folder, if yes, run script above

No comments:

Post a Comment