Pragmatic Works

Enabling your business intelligence enterprise.
Welcome to Pragmatic Works Sign in | Join | Help
in Search

Brian Knight

Top 3 SSIS Dataflow Mistakes

There's an old saying that when you have a hammer, everything looks like a nail. If the SSIS data flow is your hammer, too many people thing treat the components in the data flow like nails and don't follow best traditional ETL practices. This quick posts is a preview of an upcoming article coming out soon. Here are the top 3 gotchas I find customers do in the data flow that can drastically hurt performance.

1. Don't be afraid to use T-SQL

If your source is an OLE DB Source, you will see much better performance by performing common functions in the T-SQL statement. For example, rather than use a Data Conversion transform, use a CONVERT or CAST statement in the T-SQL statement. Instead of a Conditional Split, sometimes a simple WHERE clause can fulfill your requirement. The moral of this story is just because you have a transform to do the job, doesn't mean you should ditch T-SQL.

2. Avoid unnecessary asynchronous transforms

Asynchronous transformations like the Sort and Aggregate transforms are two of the most expensive transforms you can use in SSIS. Often times people perform a sort operation where there's no business requirement to have a sort. A common workaround if the data must be sorted is to perform an ORDER BY statement in the SQL Statement for the OLE DB Source. Often times, people sort because they must use a Merge Join or Merge transform. If that's the case and you have an OLE DB Source, sort in the SQL Statement and then use the Advanced Editor to state that the data is already sorted by switched the IsSorted flag to True and set the SortKey on the column. More on this in a future blog post. If you have flat files and a Merge Transform, there's no other way around using a Sort transform.

3. Don't be afraid to stage data

If you have to do a huge amount of updates, avoid the OLE DB Command transform. Using this transform if you have 10,000 rows going through it would cause 10,000 update statements and a drag on your server. Instead of doing this, land the data into a staging table and in the control flow, use an Execute SQL task to perform a set based update or delete, inner joining into the staging table.

 I'll have more on all three of these in a future article on SQLServerCentral.com but I wanted to give you a preview now!

Comments

 

Vinny Carpenter’s blog » Daily del.icio.us for January 7th said:

Pingback from  Vinny Carpenter’s blog » Daily del.icio.us for January 7th

January 7, 2008 5:26 PM
 

Vinny Carpenter’s blog » Daily del.icio.us for January 7th said:

Pingback from  Vinny Carpenter’s blog » Daily del.icio.us for January 7th

January 7, 2008 5:26 PM
 

mcallism375 said:

Hello

I have 2 ForEach Loop containers. The first checks files for zero length, the second looks at an archive folder to check that none of the files has already been archived.

The fist loop runs and the second falls over with the error: "The process cannot access the file because it is being used by another process"

Any ideas? Many thanks...

January 28, 2008 9:27 AM
 

Brian Knight said:

Could you send me some more information about your problem via email. I have some lengthy potential fixes for you.

February 5, 2008 9:28 PM
Powered by Community Server (Non-Commercial Edition), by Telligent Systems