Monday, December 10, 2012

Re-use code in SSIS by using RAW file source and destination


Problem

An SSIS package needs to import different versions of text files into a single database table. The versions of the text files differ by having some additional fields at the end of each line. The text files are comma separated and have a header line that contains the versions number.


Simple solution

In the beginning I had only one version. Later a 2nd version of the text file needed to be imported. To keep things simple and develop something the worked, I defined a new flat file connection and created a second dataflow that converted the columns from the text file into the proper format for the database table. So I had 2 data flows that were almost the same except for some additional fields in newer version of the file. Then came a 3rd version of the text file with yet another field added and I decided not to copy the data conversions and derived columns any longer, but to look at a solution that handles the dataconversions and derivations for all corresponding columns of the 3 versions into one dataflow. I had never used the Raw File source/destination before but this looked like a good example of where to make use of it.


Raw File Solution

Connections

There are 3 different flat file connection managers.


Control Flow

The flat files are treated in a For Each Loop container.
The first component in the For Each Loop container is a Script Task that reads the first line of the file and evaluates the version number. The version number is stored in the first row of the file.
The 3 different files are read in 3 different data flows and written to a Raw File Destination that contains all the columns of the latest version.  Then the RAW file is read processed uniformly in a 4th Data Flow and output is sent to the database.

Data Flows

The Data Flow for version 3 of the text files has to read all columns as they are and write to the Raw File Destination.

For version add a derived column component for the missing columns, with NULL or default values, then write to the Raw File Destination.

For version 2 add a derived column component to add the missing columns of version 3. Then write to the same Raw File Destination.
In the Control Flow the 3 separate Data Flows are brought together in one 4th Data Flow. This one reads the Raw File Source, does all necessary conversions, derives columns, lookups and more. The result is written  into the destination table.


To create the raw file

To use the raw file in the 4th Dataflow as Source, you will need to create it, by test running the package Data flow for version 3 with an existing version 3 file at hand. This step will create a raw file that you then use as Raw File Source in the design of the 4th data flow.


Conclusion

The logic for the 3 different file versions is defined only once in the 4th Data Flow.

No comments: