As a BI Architect/Developer, it may become necessary to conduct an Exposure Data Audit to uncover Personally Identifiable Information (PII)-related data within your SQL Server 2016 environment. The objective is to uncover any potential data security exposure risks in a continued effort to consider and implement enterprise grade security policies.
Microsoft SQL Server 2016 introduces new features, improved performance and a continued effort on providing options to implement enterprise grade security.
The purpose of this post is to outline a solution using SSIS, T-SQL, and Power BI to locate and analyze the PII information stored in a SQL Server 2016 environment.
The U.S Department of Commerce defines Personally Identifiable Information (PII) as any information about an individual that can be used to distinguish or trace an individual’s identity, such as name, social security number, date and place of birth, mother’s maiden name, or bio-metric records; and any other information that is linked or link-able to an individual, such as medical, educational, financial, and employment information. Annually, data leaks expose information on millions of people, potentially raising the risk of identity theft for those whose information is exposed.
Using combinations of Database, T-SQL, SSIS, and Power BI were used to locate and analyze the PII information stored in a SQL Server 2016 environment.
The Exposure Data Analysis Solution contains a database project. The database project was included in the solution so that the scripts used to create the database objects, such as tables, views, etc. were self-contained in the solution. Since most of us are familiar with using Visual Studio/SSDT for creating BI projects for SSIS (Integration Services), SSAS (Analysis Services), and SSRS (Reporting Services) this seemed to be a good idea.
The contents in the ExposureDataAnalsisDB project are the ‘Create Table’ statements, ‘Create Schema’ statements, ‘Create Index’ and ‘Insert Data’ statements preparing the database used for the solution. This is based upon “declarative database development” which focuses on the final state desired for an object.
For example, here is the script used to create the [Audit].[ExposureDataDiscovery] table.
There are many benefits to including a database project in SQL Server Data Tools (SSDT):
T-SQL and SSIS were used to crawl/inspect the SYSTEM CATALOG to retrieve metadata about the objects within each database located in the SQL Server 2016 environment; specifically, tables, views, and columns in the case of the Exposure Data Analysis solution. The Exposure Data Analysis ETL project analyzes column naming standards and metadata.
A connection manager can be created at the package level or at the project level. The connection manager created at the project level is available to all the packages in the project. Whereas, a connection manager created at the package level is available to that specific package.
Integration Services (SSIS) parameters allow you to assign values to properties within packages at the time of package execution. You can create project parameters at the project level and package parameters at the package level. Project parameters are used to supply any external input the project receives to one or more packages in the project. Package parameters allow you to modify package execution without having to edit and redeploy the package.
Project Parameters
In the SSIS Package – Exposure Data Discovery, two project level connections are used. Each connection manager utilizes properties to parameterize the ConnectionString property. This allows an expression referencing the parameter to be created for the ConnectionString property. Two project level parameters were created in anticipation of a project deployment scenario in which this solution is deployed across mutliple environments. Potentially taking advantage of SSIS Environment Variables resolving the connections through the specified environment reference and applying it to the parameter.
The first step in the Exposure Data Discovery SSIS package is to use a EXECUTE SQL TASK to truncate the table used to store the exposure data preparing it for data collection.
Execute SQL Task – Truncate Exposure Data Audit table.
The Execute SQL Task Editor is configured using a simple TRUNCATE TABLE statement for the table ([Audit].[ExposuredataDiscovery]) preparing it for data collection.
The connection manager (AuditStage OLEDB) is parameterized using a project level parameter.
The next step in the SSIS package is to obtain the Exposure Data Discovery Databases. This step utilizes an Execute SQL Task to acquire all of the databases on a given instance of SQL Server with the exception of the master, tempdb, model, and msdb databases. It utilizes a query against the sys.databases to capture “one row per database in the instance of SQL Server”.
The result set is stored to package variable of type object (objExposureDBName).
The DFT – Load Exposure Data Objects uses a source query using a SQL command from variable data
There are three subsequent Execute SQL Tasks that perform:
Last in the process updates to the risk rating values occur. Risk is the probability that a hazard will result in
The primary objective of the Exposure Data Audit review was to differentiate between the high-risk and low-risk data elements in relation to Personally Identifiable Information (PII).The Exposure Data Audit Power BI solution is used to analyze, visualize, and inspect the data.
There are 4 pages (reports) included in the Power BI Desktop solution:
Based on the latest Exposure Data Audit run, you can review the data using the visualizations and the sample data values capture and potentially prioritize the risks – or – possibly adjust the logic to refine the categorization of the risks, therefore, interactively permitting how to respond to the risks.
With the release of SQL Server 2016 introduces new features, improved performance and a continued effort on enterprise grade security.
This Reference Architecture is designed to provide data protection using Transparent Data Encryption, Always Encrypted, Row Level Security, and Dynamic Data Masking.
With the introduction of these features in 2016 we can:
Feel free to access the full recording of my webinar on this topic and download the solution.