All Datastage processing may be considered as being a form of Extract, Transform & Load (ETL), although it is possible to develop other processes with the tool.
Because Datastage jobs tend to be made up from a large number of sub-jobs, carefull planning beforehand is essential if the process is not to 'get away from' the implementors, with a confusion of roles and a profusion of competing and even conflictiong components.
It is normal Datastage practice to design a single master job, which manages any number of child jobs and, more importantly, provides a single interface for parameters and constants. This is the preferred mechanism for achieving portability by ensuring that differences between hosts are contained in a single location. Without a Job Sequence, every Datastage job in a batch would need to have its parameters updated when moved to a different host.
Unconnected jobs may then be run in parallel by the job sequence. Connected jobs must be overtly run serially and the job sequence must contain logic to confirm successful completion of a predecessor before starting its successor.
It's standard practice to seperate jobs into four notional types:
See Datastage Job Sequences for more detail.
This is the master job referred to above. It typically contains four sections:
There can, of course, be more than one sequence job in an application. In such a case, judgements will have to be made as to whether subsequent sequence jobs should be called from a single master or kicked off individually by the host operating system's scheduler.
Datastage processing has a tendency to create large numbers of job files. It is therefor a basic survival strategy to ensure that a logical naming convention is applied. One possible style is illustrated below.
DS45000MASTER Overall controller for European load
DS45010FRANCE French data reader
DS45020GERMANY German data reader
DS45030ITALY Italian data reader
DS45040SPAIN Spanish data reader
DS45110LOOKUPS Lookup file creator
DS45210ACCEPTED Split out accepted applications
DS45220DENIED Split out denied applications
DS45230WAITING Split out incomplate applications
DS45310ACCEPTED Load accepted applications
DS45320DENIED Load denied applications
DS45330WAITING Load incomplate applications
As with the jobs, so the components within a job should be named to follow a logical pattern that allows easy understanding of the relationships between them.
Let's consider a simple example: suppose that we want to pick up credit card applications, link them to product names, apply a score to each application and then drop the updated records into our warehouse staging area. Our job might look something like this…
--------------
| Product List |
--------------
|
|
|
v
--------------
| Product Hash |
--------------
|
|
|
v
-------------- ----------------- ----------------- ---------
| Applications |------>| Lookup Products |------>| Calculate Score |----->| Load DW |
-------------- ----------------- ----------------- ---------
We could use the names assigned in the above sketch, but this might lead to some confusion, if we need to build several similar loaders. Much better to assign names that relate the components to the job, whilst providing them with unique identifications:
-----------
| DS45310F2 |
-----------
|
|
|
v
-----------
| DS45310H1 |
-----------
|
|
|
v
----------- ----------- ----------- -----------
| DS45310F1 |------>| DS45310L1 |------>| DS45310T1 |----->| DS45310W1 |
----------- ----------- ----------- -----------
Then, in the documentation component, we can clarify the functionality in detail…
Job: DS45310ACCEPTED (Load accepted applications)
Component Description
----------- ---------------------------------------------------------------
DS45310F1 Amalgamated Applications file
DS45310F2 Product List reference file (from marketing database)
DS45310H1 Generated product list reference hash
DS45310L1 Lookup products and apply to DS_45310_F1
DS45310T1 Calculate Score transform. (Uses rule from spec. section 4.5.2)
DS45310W1 Load to OracleGlobal.DW.LiveApplications
----------- ---------------------------------------------------------------