What Dynamic Data Masking Is
Dynamic data masking limits (DDM) sensitive data exposure by masking it to non-privileged users. It can be used to greatly simplify the design and coding of security in your application.The purpose of dynamic data masking is to limit exposure of sensitive data, preventing users who should not have access to the data from viewing it. Dynamic data masking does not aim to prevent database users from connecting directly to the database and running exhaustive queries that expose pieces of the sensitive data.
In the example below, assume there are only two roles of users. The first role is Sally in charge of fraud investigation. The second role is John who is in customer service. Sally requires access to all data to complete her job while John only needs a few elements and maybe the part of the customer’s email address to verify identity. Both users query SQL Server for the customer information, but Sally is able to see the entire email address while John can only see certain parts of the email address.
Benefits of Data Masking
What are the benefits of data masking? What are some specific reasons it should be used?
Data Masking Does |
Data Masking Does NOT |
- Limit sensitive data exposure by masking it to non-privileged users
- Minimally impact the application layer
- Allow applications to mask sensitive data without modifying existing queries
- Complement other SQL Server security features (auditing, encryption, row level security, etc.)
|
- Prevent database users from connecting directly to the database and running exhaustive queries that expose pieces of the sensitive data
|
How Data Masking Works
- On-the-fly obfuscation of data in query results
- Policy-driven on the table and column
- Multiple masking functions available for various sensitive data categories
- Flexibility to define a set of privileged logins for un-masked data access
- By default, database owner is unmasked
Data Masking Platform Options
Supported |
Platform |
Yes |
SQL Server 2016 |
Yes |
Azure SQL Database (SQL DB) |
No, but it will be supported in the future |
Azure SQL Data Warehouse (SQL DW) |
No, but it will be supported in the future |
Analytics Platform System (APS) |
SQL Server 2016 and Azure SQL DB are the only platforms currently supporting data masking. Azure SQL DW will support this feature in the near future while APS will be supported with AU6.
Data Masking Formats
The key concern I had looking at data masking was whether it would support all formats that I may need. The short answer is that I don’t feel that it does. I think they try to solve the flexibility issue with the partial function, but it isn’t very flexible. I have hopes this will be expanded with future updates. Below are the data masking formats currently available.
Enabling Dynamic Data Masking
Azure SQL Database
To enable data masking in Azure SQL DB, use the Azure Portal.
- Open the database blade
- Select the ‘Dynamic data masking’ option
- Enter a list of users that will see the data masking in the ‘SQL users excluded from masking’ text box
- Select the desired masking column and click ‘ADD MASK’
- Select the masking field format
- Click Update
- Click Save
SQL Server 2016
To interact with data masking for SQL Server 2016, use the following syntaxes:
— Enable data masking on dbo.DimCustomer.EmailAddress
ALTER TABLE dbo.DimCustomer ALTER COLUMN EmailAddress
ADD MASKED WITH (FUNCTION = ‘partial(3,”XXXXXX”,4)’);
— Grant DataMaskingDemo permission to see the data
GRANT UNMASK TO DataMaskingDemo;
— Revoke DataMaskingDemo permission to see the data
REVOKE UNMASK TO DataMaskingDemo;
— Drop the data masking
ALTER TABLE dbo.DimCustomer
ALTER COLUMN EmailAddress DROP MASKED;
Dynamic Data Masking Limitations and Considerations
So this is all fine and dandy and seems like it may work alright, but let’s get to the meat and see what limitations there are.
- Data Masking format disappears during a cast (or convert)
CAST([EmailAddress] AS CHAR(10))
Results in ‘XXXX’
In the example above, when converting a masked column to another string, all business value disappears and the column is completely masked.
- Predicate logic can infer values
WHERE [EmailAddress] = ‘jon24@adventure-works.com’
Returns the correct row with the masked value
In the example above, although the column has applied data masking, remember that the data still resides unmasked in the database. This means that predicate logic can be applied to infer masked values.
Another example would be John wanting to know Frank’s salary information. The salary column is masked, but this value can be derived using <, >, etc. Eventually John can pinpoint what Frank’s salary is.
The bottom line is that data masking is not the most secure option, so if data needs to be secure, there are other options such as Transparent Data Encryption and Row Level Security and Always Encrypted.
Speaking of which….
Always Encrypted
Always Encrypted is a data encryption technology that helps protect sensitive data at rest on the server, during movement between client and server, and while the data is in use, ensuring that sensitive data never appears as plaintext inside the database system.
After you encrypt data, only client applications or app servers that have access to the keys can access plaintext data.
The following illustration depicts the Always Encryption process.
- A user tries to execute a SQL statement from an application
- The SSN is encrypted in the database, so it is re-routed to the Enhanced ADO.NET Library
- The SSN the user specified is then encrypted and sent onto SQL Server
- SQL Server creates a SQL statement from the request, plus the SSN replaced with the cipher value
- The database then filters all rows that DO NOT match the cipher value
- The data is re-routed through the Enhanced ADO.NET Library so the cipher value can be decrypted
- The result set is returned to the user with the decrypted value
There are 2 encryption types that can be selected, deterministic and random, each of which has different attributes and the business needs will determine which one.
Deterministic Encryption
- Always generates the same encrypted value for any given plain text value
- Allows grouping, filtering by equality, and joining tables based on encrypted values
- Allows unauthorized users to guess information about encrypted values by examining patterns in the encrypted column
- Must use a column collation with a binary2 sort order for character columns
Random
- Encrypts data in a less predictable manner
- Randomized encryption is more secure, but prevents equality searches, grouping, indexing, and joining on encrypted columns
Encryption Keys
Once the encryption type is chosen, keys will need to be created. These will generally be created by a security and/or compliance officer. These keys are what will be used to encrypt and decrypt the data. The only encryption standard currently available is AEAD_AES_256_CBC_HMAC_SHA_256. Here are some details on this standard.
Column Master Keys
- Used to encrypt column encryption keys
- Must be stored in a trusted key store
- Information about column master keys, including their location, is stored in the database in system catalog views
Column Encryption Keys
- Used to encrypt sensitive data stored in database columns
- Can be encrypted using a single column encryption key
- Encrypted values of column encryption keys are stored in the database in system catalog views
- Store column encryption keys in a secure/trusted location for backup
Enabling Always Encrypted
Now that concepts are a little better understood, it’s time to actually implement our chosen security model.
1. Use SQL Server Management Studio 2016 (for both SQL Server 2016 and Azure SQL DB) |
|
2. Run the Always Encrypted wizard |
|
3. Choose the table column to encrypt |
|
4. Choose the Encryption Type
|
|
5. Create/Select Keys
- Column Master Key (CMK)
- Create a Column Encryption Key (CEK)
|
|
Finally, the easy part
6. Create an application that inserts, selects, and displays data from the encrypted columns
- After Always Encrypted is initialized, utilize an application that performs inserts and selects on the encrypted columns
- To successfully run the sample application, run it on the samecomputer where the Always Encrypted wizard is run
- To run the application on another computer, you must deploy Always Encrypted certificates to the computer running the client app
Important Notes:
-
The application must use SqlParameter objects when passing plaintext data to the server with Always Encrypted columns
-
Passing literal values without using SqlParameter objects will result in an exception
|
|
Always Encrypted Considerations
Create the appropriate certificates
http://www.databasejournal.com/features/mssql/exploration-of-sql-server-2016-always-encrypted-part-1-2.html
Limitations and Technologies NOT Supported
- Encrypted columns do not allow range-like operations such as ‘>, <‘ , ‘LIKE’, etc.
- Passing encrypted values to functions, user-defined or otherwise, is not allowed
(the database doesn’t have access to the unencrypted values)
- Equality comparisons can only be performed on columns that use deterministic encryption
- Indexes can only be applied to deterministic encryption columns
- Need same column encryption key for columns that are joined
- Constant expressions that refer to encrypted columns not allowed
ex. WHERE SSN = ‘111-11-1111’, but WHERE SSN = @SSN is because the driver works with the SqlParameter class
- Unsupported data types: xml, rowversion, image, ntext, text, sql_variant, hierarchyid, geography, geometry, and UDFs
- Currently the only driver that supports this feature is .NET 4.6
- This is NOT TDE
- Encrypted columns take significantly more space
- String-based columns that are encrypted with deterministic encryption must use a _BIN2 collation (e.g. Latin1_General_BIN2)
- The following data types are NOT supported as encrypted columns, per the documentation:
text/ntext/image
XML/hierarchyid/geography/geometry
alias types/user-defined data types
SQL_VARIANT
rowversion (timestamp)
- Sparse columnset (sparse columns are okay, provided the table doesn’t contain a columnset)
- Built-in alias types, e.g. SYSNAME
- Identity columns
- Computed columns
Temporal tables
- Triggers are partially supported
- Full-text search
- Replication (need more research)
- In-Memory OLTP
- Stretch Database
Whew, that’s a mouthful. It’s clear that this is a new technology and a first iteration. Polybase has been around for years, Query Store is a newer version of Extended Events, etc. So while some SQL Server 2016 technologies are relatively complete and stable, Always Encrypted still has some work to do. The limitations will weigh heavily in your architectural approach.
Selecting the Right Technology……or Both?
Is Always Encrypted superior to data masking? Should I should always choose this?
The short answer is that one provides a higher level of security over the other, BUT each business use case will dictate the technology chosen, which sometimes results in them being COMPLEMENTARY!
Combining deterministic encryption AND data masking allows for the highest FUNCTIONAL level of security for data warehouses and relational databases.
If both technologies are selected collectively, be sure to encrypt all columns FIRST, then add Dynamic Data Masking, otherwise:
Now that 2 major security features of SQL Server 2016 have been discuss, next on the agenda will be Row Level Security (RLS) and Transparent Data Encryption (TDE).