Problem
An SSIS package needs to import different versions of text
files into a single database table. The versions of the text files differ by
having some additional fields at the end of each line. The text files are comma
separated and have a header line that contains the versions number.
Simple solution
In the beginning I had only one version. Later a 2nd
version of the text file needed to be imported. To keep things simple and
develop something the worked, I defined a new flat file connection and created
a second dataflow that converted the columns from the text file into the proper
format for the database table. So I had 2 data flows that were almost the same
except for some additional fields in newer version of the file. Then came a 3rd
version of the text file with yet another field added and I decided not to copy
the data conversions and derived columns any longer, but to look at a solution
that handles the dataconversions and derivations for all corresponding columns
of the 3 versions into one dataflow. I had never used the Raw File source/destination
before but this looked like a good example of where to make use of it.
Raw File Solution
Connections
There are 3 different flat file connection managers.
Control Flow
The flat files are treated in a For Each Loop container.
The first component in the For Each Loop container is a Script
Task that reads the first line of the file and evaluates the version number.
The version number is stored in the first row of the file.
The 3 different files are read in 3 different data flows and
written to a Raw File Destination that contains all the columns of the latest
version. Then the RAW file is read processed
uniformly in a 4th Data Flow and output is sent to the database.
Data Flows
The Data Flow for version 3 of the text files has to read
all columns as they are and write to the Raw File Destination.
For version add a derived column component for the missing
columns, with NULL or default values, then write to the Raw File Destination.
For version 2 add a derived column component to add the
missing columns of version 3. Then write to the same Raw File Destination.
In the Control Flow the 3 separate Data Flows are brought
together in one 4th Data Flow. This one reads the Raw File Source, does
all necessary conversions, derives columns, lookups and more. The result is written
into the destination table.
To create the raw file
To use the raw file in the 4th Dataflow as Source, you will need to create it, by test running the package Data flow for version 3 with an existing version 3 file at hand. This step will create a raw file that you then use as Raw File Source in the design of the 4th data flow.Conclusion
The logic for the 3 different file versions is defined only once
in the 4th Data Flow.
No comments:
Post a Comment