Pragmatic Works Nerd News

Overview of Azure Data Catalog

Written by Chris Seferlis | May 24, 2018

In today’s post, I’ll give you an overview of Azure Data Catalog and an example of how you might use it in your organization. Azure Data Catalog is used to discover the data sources in your Azure environment, identify what they are, and describe the ones you’ve already found.

It lets you add metadata and annotations to all of your Azure data. So, if you want to describe a column or a data source, or attach documentation or a schema, you can do all of this in Azure Data Catalog. It is a cloud-based service in which data sources can be registered.

The data remains in its existing location, but a copy of its metadata is added to the data catalog, along with a reference to the data source location, so you’ll know where to find it when you need it. The metadata is also indexed, so that each data source is easily discoverable through search and understandable to the users who discover it.
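To make the registration idea concrete, here is a minimal sketch of the pattern described above: only metadata and a pointer to the source are stored, and the metadata is indexed for search. This is not the Azure Data Catalog API. The `CatalogEntry` and `Catalog` classes, their fields, and the example location are all hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about a data source. The data itself is never copied."""
    name: str
    source_type: str                      # e.g. "SQL Server table"
    location: str                         # reference back to where the data lives
    columns: list[str] = field(default_factory=list)
    description: str = ""


class Catalog:
    """Hypothetical in-memory catalog with a simple keyword index."""

    def __init__(self) -> None:
        self.entries: dict[str, CatalogEntry] = {}
        self._index: dict[str, set[str]] = defaultdict(set)

    def register(self, entry: CatalogEntry) -> None:
        # Store the metadata, then index every word found in it.
        self.entries[entry.name] = entry
        text = " ".join([entry.name, entry.source_type,
                         entry.description, *entry.columns])
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            self._index[token].add(entry.name)

    def search(self, query: str) -> list[CatalogEntry]:
        # Return entries whose metadata contains every word in the query.
        tokens = re.findall(r"[a-z0-9]+", query.lower())
        if not tokens:
            return []
        names = set.intersection(*(self._index.get(t, set()) for t in tokens))
        return [self.entries[n] for n in sorted(names)]


catalog = Catalog()
catalog.register(CatalogEntry(
    name="SalesOrders",
    source_type="SQL Server table",
    location="sqlserver://example-host/SalesDb/dbo.SalesOrders",  # made-up location
    columns=["OrderId", "CustomerId", "OrderDate"],
    description="One row per customer order",
))

hits = catalog.search("customer order")
```

Each hit carries the `location` reference, which is what a user would follow to open the source in their tool of choice. A real catalog service would do much more (persistence, ranking, security), so treat this only as a picture of the idea.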

The primary purpose of registering data sources in the data catalog is to make them discoverable and understandable. Enterprise users may need data for business intelligence, application development, data science, or other tasks where the right data is required. They can use the Data Catalog discovery experience to quickly find data that matches their needs.

Users can also examine the metadata to evaluate whether the data serves their purpose. The data is consumed by opening the data source in their tool of choice. At the same time, users can contribute to the catalog by adding metadata or annotations. They can register new data sources as well, which can then be discovered, understood, and consumed by other users who have permission to do so. Access is locked down by permission and can be secured with Active Directory.

Here’s a basic example of using Azure Data Catalog:

Let’s say we’re moving towards self-service BI, where a data team or IT team sets up the data so users can create their own dashboards in Power BI. The IT or data team has already secured the data by making sure users only have access to what they need and should have. Now the information workers and analysts can create their own reports, workbooks, and dashboards without further restrictions from IT.

As new data gets created by workers and analysts, it can be challenging to share information about that data, such as where it lives. Let’s say I save it into a SharePoint repository. I may not remember to tell everyone about it, and even if I did, I’d probably have to remind them six months from now. Obviously, this is ineffective and a big waste of time.

This is where Data Catalog comes in, as it gives data creators the ability to catalog and tag data, making it easier for all users with permissions to find it. The data can be registered in a centralized data catalog while staying where it came from, and users can go in and add whatever annotations, tags, or metadata apply.
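The scenario above can be sketched in a few lines, assuming a simple model where each registered asset carries tags and a set of groups allowed to see it. This is an illustration only, not how Azure Data Catalog is implemented. The `Asset` class, the `add_tags` and `find_by_tag` helpers, the SharePoint URL, and the group names are all hypothetical, and the groups merely stand in for whatever Active Directory permissions your organization uses.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A registered data asset. Only metadata is held; the file stays put."""
    name: str
    location: str                                   # where the data actually lives
    tags: set[str] = field(default_factory=set)
    allowed_groups: set[str] = field(default_factory=set)


def add_tags(asset: Asset, *tags: str) -> None:
    # Tags are stored lowercase so lookups are case-insensitive.
    asset.tags.update(t.lower() for t in tags)


def find_by_tag(assets: list[Asset], tag: str, user_groups: set[str]) -> list[Asset]:
    # A user sees an asset only if it has the tag and they share a group with it.
    return [a for a in assets
            if tag.lower() in a.tags and a.allowed_groups & user_groups]


# An analyst registers a workbook saved in SharePoint and tags it.
workbook = Asset(
    name="Q2 Regional Sales",
    location="https://sharepoint.example.com/sites/finance/Q2-Regional-Sales.xlsx",
    allowed_groups={"finance-analysts"},
)
add_tags(workbook, "Sales", "Quarterly")
assets = [workbook]

# A colleague in the same group finds it by tag; someone outside the group does not.
found = find_by_tag(assets, "sales", {"finance-analysts"})
hidden = find_by_tag(assets, "sales", {"marketing"})
```

The point of the sketch is that nobody has to remember to announce the workbook: once it is registered and tagged, anyone with the right permissions can find it by searching.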

Azure Data Catalog is a great tool that we highly recommend for all your data projects in Azure. If you have questions about Azure Data Catalog in general or how best to use it in your organization, we’re here to help. Whether you want information on this or any other Azure product, simply click the link below. We are your best resource.