Pragmatic Works

Enabling your business intelligence enterprise.
Welcome to Pragmatic Works Sign in | Join | Help
in Search

Brian Knight

Using RegEx in SSIS to Remove Given Characters

In SSIS, we recently were engaged by a customer who had a fairly simple request. They wished to remove non-numeric items from a column in the data flow. They were doing this in a stored procedure by using a T-SQL replace function row-by-row and we wanted to make this a data flow instead due to speed. While we could have done this in a series of Derived Column transforms, we wanted to disallow any non-numeric field and hard coding a list of disallowed characters would have been quite taxing.

To do this in RegEx in the data flow, you can download the RegEx transform here: http://www.microsoft.com/downloads/details.aspx?FamilyID=c16f11ad-150a-4091-b3a2-83d21d3e0973&displaylang=en or use a Script Transform (Script Component). First drag the Script Component over and select Transform as the role you wish it to play. Next, select the columns that you wish to pass into the script and select ReadWrite. In the Script Tab, click Design Script.

In the Imports area, add the following line of code:

Imports System.Text.RegularExpressions

Next, replace the Input0_ProcessInputRow subroutine with the following code:

 

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    If Row.ProductNumber_IsNull = False Or Row.ProductNumber = "" Then

        Dim pattern As String = String.Empty

        Dim r As Regex = Nothing

        pattern = "[^0-9]"
        r = New Regex(pattern, RegexOptions.Compiled)

        Row.ProductNumber = Regex.Replace(Row.ProductNumber, pattern, "")
    End If
End Sub

 

You'll see in the above that the pattern is currently set to [^0-9]. The ^ symbol means that these are the only acceptable values going in. If you only want alphanumeric characters, you could set this to [^a-zA-Z0-9]. If you remove the ^, it means these are the only characters that are unacceptable. Row.ProductNumber shows you the column that you're setting in the Script Transform.

As you can see in this simple example, with roughly a dozen lines of code, you can solve a fairly common business scenario. Here's the final code that is fully integrated.

image

 

-- Brian Knight

Comments

No Comments
Powered by Community Server (Non-Commercial Edition), by Telligent Systems