RSS 2.0
 Tuesday, May 05, 2009

The Slowly Changing Dimension (SCD) transformations are pretty handy in SSIS packages. They let you compare a new set of data to an existing table and insert or update as needed. Alternatively, you can have them insert new rows and mark old rows as "Expired" instead of updating rows when a change occurs. This is great if you want to maintain history. Now these are meant for dimensions in a data warehouse, but you can use them against any table in your database. The reason for this post is to look at an issue that can occur with SCDs when you have duplicate data in your incoming data set.

When an SCD analyzes a set of data it looks at each row and compares it to the destination. It decides whether the row will be inserted or updated (or inserted with the old row being marked inactive). Notice I said this comparison is with the incoming data and the destination. The SCD does check for duplicates that may exist in the incoming data. So if you have duplicates, that also happen to be new rows when compared to the destination, all the duplicate rows get "flagged" for insertion. You can probably guess what happens next. The first row gets inserted but the second and subsequent duplicates cause a Primary Key violation when they attempt to insert the now duplicate record.

Now this might seem like a glitch to some, but this functionality is "by design ". The button line is that the data being inserted should be de-duplicated prior to being inserted into the destination. In a perfect world, the data would come from a source where it lived in a normalized, and therefore de-duplicated, state. If you don't have the luxury of living in a perfect world, you can build a manual data de-dup process or use a Sort transformation. One of the options of the Sort transformation is to "Remove Rows with Duplicate Sort Values". Just drop one of these in, check the box, and you have magically de-dup your values.


Digg It
Tuesday, May 05, 2009 2:14:54 PM (Mountain Standard Time, UTC-07:00)  Eric Johnson  #    Comments [0] - Trackback
Integration Services | SQL Server
Comments are closed.
Search
Blog Directories
Computer Blogs - BlogCatalog Blog Directory
Blog Flux Directory
Categories
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Archive
<September 2010>
SunMonTueWedThuFriSat
2930311234
567891011
12131415161718
19202122232425
262728293012
3456789
Statistics
Total Posts: 157
This Year: 0
This Month: 0
This Week: 0
Comments: 38
All Content © 2010, Consortio Services, LLC