In SSIS, we recently were engaged by a customer who had a fairly simple request. They wished to remove non-numeric items from a column in the data flow. They were doing this in a stored procedure by using a T-SQL replace function row-by-row and we wanted to make this a data flow instead due to speed. While we could have done this in a series of Derived Column transforms, we wanted to disallow any non-numeric field and hard coding a list of disallowed characters would have been quite taxing.
To do this in RegEx in the data flow, you can download the RegEx transform here: http://www.microsoft.com/downloads/details.aspx?FamilyID=c16f11ad-150a-4091-b3a2-83d21d3e0973&displaylang=en or use a Script Transform (Script Component). First drag the Script Component over and select Transform as the role you wish it to play. Next, select the columns that you wish to pass into the script and select ReadWrite. In the Script Tab, click Design Script.
In the Imports area, add the following line of code:
Imports System.Text.RegularExpressions
Next, replace the Input0_ProcessInputRow subroutine with the following code:
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
If Row.ProductNumber_IsNull = False Or Row.ProductNumber = "" Then
Dim pattern As String = String.Empty
Dim r As Regex = Nothing
pattern = "[^0-9]"
r = New Regex(pattern, RegexOptions.Compiled)
Row.ProductNumber = Regex.Replace(Row.ProductNumber, pattern, "")
End If
End Sub
You'll see in the above that the pattern is currently set to [^0-9]. The ^ symbol means that these are the only acceptable values going in. If you only want alphanumeric characters, you could set this to [^a-zA-Z0-9]. If you remove the ^, it means these are the only characters that are unacceptable. Row.ProductNumber shows you the column that you're setting in the Script Transform.
As you can see in this simple example, with roughly a dozen lines of code, you can solve a fairly common business scenario. Here's the final code that is fully integrated.
-- Brian Knight