In a recent webinar, I outlined a solution using SSIS, T-SQL, and Power BI to locate and analyze the PII information stored in a SQL Server 2016 environment. After the presentation, many attendees had some questions and you’ll find the answers in this blog.
Answer: The target SQL Server version referenced in the solution is SQL Server 2016. However, you could use the Connection Managers and connection strings to target prior versions of SQL Server to potentially capture exposure data for review and analysis.
Question: How did you enter comments in the SSIS package?
Answer: Using the Annotations feature of SSIS. See Using Annotations in Packages.
Question: Would date of birth be considered PII if you have no related name, social security number or other identifiable information?
Answer: In and of itself, date of birth will likely not directly lead to uniquely identifying an individual.
Question: What’s the overhead on running the process, let’s say, in an environment of 500 databases? Can it be run during a regular business day?
Answer: In the case of 500 databases, there could be overhead running during normal business hours. It is largely dependent upon how many PII columns are detected during the execution of the Exposure Data Audit SSIS package.
Question: Is Pragmatic Works planning to incorporate the ability to find PII information into the Workbench?
Answer: Currently, I am unaware of any PII-related features being released for Pragmatic Works Workbench.
Question: Is the PII information identified based on the column names? What can we identify from the data itself?
Answer: In the current configuration of the Exposure Data Audit solution, it identifies potential PII information based on the column names. Yes, you could potentially identify PII based on the data values themselves using T-SQL (Execute SQL Task) and C# (Script Task).
Question: Is there a central agency that governs the definition of what PII is or does the definition vary by industry?
Answer: As referenced in the presentation, the definition provided was from the U.S. Department of Commerce. With PII being a legal concept, regulatory laws such as Health Insurance Portability and Accountability Act (HIPAA) used to protect Protected Health Information (PHI) is simlar to PII. In the research performed, I did not come across a central agency or body that governs the definition of PII. Therefore, I would suggest that pertinent regulatory laws for specific industries should be reviewed to understand the definition and identification of PII-related information.
Question: Was the Risk Rating custom to your application or did you start with something generic? If so where did it come from?
Answer: The following Risk Rating Scale was used.
Question: Is encryption for Azure available in SQL Data Warehouse?
Answer: Yes. See Dynamic Data Masking, Row Level Security, Always Encrypted, Transparent Data Encryption
Question: How does row level security work with a select query comprising multiple joins?
Answer: RLS allows you to implement filtering logic of arbitrary complexity. To filter so that users can only see rows in their assigned specific criteria, a FILTER PREDICATE would need to be applied according the data restriction desired.
Question: For masking sensitive data, do we need to do this while creating the table/column?
Answer: Dynamic Data Masking (DDM) can be applied to an existing table. There are two ways which you can apply the DDM functions. You can apply this at the time you create the table or you can apply the function in the existing table that contains data using an ALTER statement.
Question: I am interested in masking the data on the fly via SQL and also want to keep track of what is the original value and the masked value(for traceability purposes).
Answer: Dynamic Data Masking does not modify or change the actual data stored in a table; it applies the masking functions on the table’s column at the time of returning a data as the result of a query.
Question: For PII data, is Dynamic Data Masking applicable and better than Always Encrypted?
Answer: Dynamic data masking is complementary to other SQL Server security features (auditing, encryption, row level security…) and it is highly recommended to use this feature in conjunction with them in addition in order to better protect the sensitive data in the database.
My webinar was one of Pragmatic Works' Free "Training on the T's" webinars, which happens every Tuesday at 11:00 EST. Visit our website to see upcoming topics and to register. You can also sign up for a free trial of our On-Demand Training platform with over 30 courses that cover a variety of topics including Business Intelligence, Business Analytics, Big Data, SQL Server Optimization and more.