Pragmatic Works

Enabling your business intelligence enterprise.
Welcome to Pragmatic Works Sign in | Join | Help
in Search

Brian Knight

December 2007 - Posts

  • Top 3 SSIS Dataflow Mistakes

    There's an old saying that when you have a hammer, everything looks like a nail. If the SSIS data flow is your hammer, too many people thing treat the components in the data flow like nails and don't follow best traditional ETL practices. This quick posts is a preview of an upcoming article coming out soon. Here are the top 3 gotchas I find customers do in the data flow that can drastically hurt performance.

    1. Don't be afraid to use T-SQL

    If your source is an OLE DB Source, you will see much better performance by performing common functions in the T-SQL statement. For example, rather than use a Data Conversion transform, use a CONVERT or CAST statement in the T-SQL statement. Instead of a Conditional Split, sometimes a simple WHERE clause can fulfill your requirement. The moral of this story is just because you have a transform to do the job, doesn't mean you should ditch T-SQL.

    2. Avoid unnecessary asynchronous transforms

    Asynchronous transformations like the Sort and Aggregate transforms are two of the most expensive transforms you can use in SSIS. Often times people perform a sort operation where there's no business requirement to have a sort. A common workaround if the data must be sorted is to perform an ORDER BY statement in the SQL Statement for the OLE DB Source. Often times, people sort because they must use a Merge Join or Merge transform. If that's the case and you have an OLE DB Source, sort in the SQL Statement and then use the Advanced Editor to state that the data is already sorted by switched the IsSorted flag to True and set the SortKey on the column. More on this in a future blog post. If you have flat files and a Merge Transform, there's no other way around using a Sort transform.

    3. Don't be afraid to stage data

    If you have to do a huge amount of updates, avoid the OLE DB Command transform. Using this transform if you have 10,000 rows going through it would cause 10,000 update statements and a drag on your server. Instead of doing this, land the data into a staging table and in the control flow, use an Execute SQL task to perform a set based update or delete, inner joining into the staging table.

     I'll have more on all three of these in a future article on SQLServerCentral.com but I wanted to give you a preview now!

More Posts
Powered by Community Server (Non-Commercial Edition), by Telligent Systems