Current architecture:
- Files are dropped into predefined data source folder
- SQL altert (SALESFileArrival) checks data source folder and triggers out SQL job (SALES_Job)
- SQL job (SALES_Job) runs a stored procedure (usp_RunSALESjob), where the stored procedure to run real ETL job (SALES_etl)
- ETL job (SALES_etl) processes all these files in data source folder
Files ==> SQL altert(SALESFileArrival) ==> SQL job (SALES_Job) ==> stored procedure(usp_RunSALESjob) ==> ETL job(SALES_etl)
Analysis: The problems occur when sales files dropped in data source folder what time ETL job(SALES_etl) is not finished
9:00AM | 9:01AM | 9:02AM | 9:10AM | 9:11AM | 9:12AM | …… | |
First set of files arrived at 9:00AM, SQL job started and will be finished at 9:10AM | |||||||
Second set of files arrived at 9:02AM, SQL job can’t be started, as it is not finished. | |||||||
When Alert checks files after 9:11AM and find NO EVENT, or NO FILES are moved into this folder within this minute, therefore, SQL job won’t start. See further note below |
If any new files dropped after 9:10AM, SQL job can start as well, because Alert can be triggered. However, if this is last wholesaler, these files could stay here for another week. Therefore we need to handle this case.
{Clarify two terms:
- 60 is interval, meaning that WMI check dropped file every 60 second for the past 60 seconds
SELECT * FROM __InstanceCreationEvent WITHIN 60 WHERE TargetInstance ISA "CIM_DirectoryContainsFile" and TargetInstance.GroupComponent= "Win32_Directory.Name=\"D:\\\\Newcom~1\\\\Data\\\\Sell_Thru\\\\SourceFileFolder\""
- Delay between responses is interval to response, in this case to run SQL job. We can either put 0 or 60 second. The best way is to set the same value as event interval
Solution:
The solution is to start SQL job automatically when there are files dropped in source data folder during ETL process. But how to start this process is to leverage the schedule of this SQL job. The new architecture is
Files → SQL altert(SALESFileArrival)→ SQL job (SALES_Job) →stored procedure(usp_RunSALESjob)→ ETL job(SALES_etl) → SQL job (SALES_Job)
- Create one time schedule for SQL job
- Create sql statement below to dynamically setup schedule
declare @start_date int, @start_time int
set @start_date= convert(int,CONVERT(varchar,getdate(),112))
set @start_time= convert(int,replace(left( convert(varchar, dateadd(minute,+2,getdate()), 114), 8),':',''))
EXEC msdb.dbo.sp_update_schedule @name ='Run_Alert_Missed_Sales_Files',
@enabled=1,
@active_start_date=@start_date,
@active_start_time=@start_time
- Create separate ETL sub package and append it to ending of ETL process
Check whether there is any files in source data folder, if yes, run script above
No comments:
Post a Comment