Table A is a transaction table and table B is a history table. Table B only needs to be populated with new rows from table A, and it contains a copy of the business key from table A. This will be done by an SSIS package.
Since no updates are necessary, a Slowly Changing Dimension transformation will not be used.
As a first solution I created an SSIS package with a Data Flow containing 3 steps: an OLE DB Source to read the inventory, a data transformation, and an OLE DB Destination. The OLE DB SELECT statement reads table A and joins it to table B, selecting only the rows from A that are not yet written to B.
This is the SELECT statement on table A:
SELECT a.InventId, a.col1, a.col2, ...
FROM dbo.Inventory AS a
LEFT OUTER JOIN dbo.HistInvent AS b
    ON a.InventId = b.A_InventId
WHERE b.HistGUID IS NULL
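The same anti-join can also be written with NOT EXISTS, which expresses the intent ("rows of A with no match in B") more directly than an outer join filtered on IS NULL. A sketch, using the same table and column names as above:

```sql
-- Sketch: anti-join via NOT EXISTS instead of LEFT OUTER JOIN ... IS NULL.
-- Assumes the dbo.Inventory / dbo.HistInvent tables shown above.
SELECT a.InventId, a.col1, a.col2
FROM dbo.Inventory AS a
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.HistInvent AS b
    WHERE b.A_InventId = a.InventId
);
```

Either form returns the same rows; it does not by itself fix the blocking problem described below, since table B is still read while step 3 is writing to it.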
The 3 steps work together fine as long as there are not too many rows; a few thousand rows gave no problems when debugging the package.
But with more than 100,000 rows the package hangs. My guess is that the data is read and written in chunks, and the writing in step 3 blocks table B from being read for the next chunk in step 1.
The solution: only read from table A in step 1, then insert a Lookup Transformation as a new step between the 1st and 2nd step. The Lookup Transformation reads the business key in table B and compares it to the primary key of table A.
The lookup redirects only the rows that are not found in B.
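The Lookup Transformation's reference query can be kept narrow so that SSIS caches only the business key column rather than the whole history table. A sketch, assuming the table and column names used above:

```sql
-- Reference query for the Lookup Transformation:
-- cache only the business key column from the history table.
SELECT A_InventId
FROM dbo.HistInvent;
```

With the lookup configured to send unmatched rows to its "no match" output, only rows that are new to B flow on to the destination.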
The Data Flow in 4 steps:
This time the package runs fine for more than 500,000 rows.
Although it seemed efficient to let SQL Server limit the data flow in step 1, it is advisable to use the SSIS tools, even though they create more overhead.
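For completeness: when both tables live in the same database, the incremental load can also be done as a single set-based statement, which sidesteps the chunked read/write conflict entirely because one statement does both the read and the insert. A sketch under the same table names (it assumes HistGUID is filled by a default such as NEWID(); the real column list would need to match dbo.HistInvent):

```sql
-- Sketch: set-based alternative to the Data Flow, same anti-join as above.
-- Hypothetical column list; adjust to the actual dbo.HistInvent schema.
INSERT INTO dbo.HistInvent (A_InventId, col1, col2)
SELECT a.InventId, a.col1, a.col2
FROM dbo.Inventory AS a
LEFT OUTER JOIN dbo.HistInvent AS b
    ON a.InventId = b.A_InventId
WHERE b.HistGUID IS NULL;
```

In an SSIS package this could run from an Execute SQL Task instead of a Data Flow, at the cost of losing the row-by-row transformation step.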