Explore Unity Catalog with Mitchell as he dives into Databricks' centralized data governance solution. Learn about access control, auditing, data lineage, and features like tagging and AI-driven descriptions.
In this comprehensive Unity Catalog course, Mitchell provides an in-depth exploration of Databricks' centralized data governance solution, showcasing its powerful features for managing and securing data and AI assets across multiple workspaces. You’ll begin with an overview of Unity Catalog, learning about its hierarchical structure of catalogs, schemas, and tables, and its ability to centralize access control, auditing, and data lineage. Mitchell introduces key features such as lineage analysis, commenting, tagging, and AI-generated descriptions, which enhance data discoverability and governance.
Course Outline (Free Preview)
Unity Catalog for Databricks - What You Need to Get Started
Module 01 - What is Unity Catalog
In this video, Mitchell introduces Unity Catalog within Databricks, emphasizing its role as a centralized data governance solution that manages and secures data and AI assets across multiple workspaces. He explains how Unity Catalog simplifies access control, auditing, and data lineage by allowing these tasks to be performed at a higher level, rather than individually within each workspace. Mitchell also highlights key features such as centralized security management, built-in auditing, and data discovery capabilities, which enhance the overall efficiency and security of data management in Databricks.
Module 02 - Previewing Unity Catalog
In this video, Mitchell provides an overview of Unity Catalog and its key features within the Databricks environment. He explains the different types of catalogs, such as system, Hive, and foreign catalogs, and discusses the management of tables, including permissions, commenting, and tagging for better data discoverability. Additionally, Mitchell highlights the lineage analysis feature, which allows users to trace the origin and transformation of data through a lineage graph.
Module 03 - Unity Catalog Databricks (20 min.)
In this video, Mitchell guides students through the setup and configuration of Unity Catalog in Databricks, emphasizing prerequisites such as an Azure subscription and the necessary permissions. He demonstrates the process of creating a Databricks environment, provisioning a Data Lake storage account, and configuring managed storage locations for data governance and isolation. The video concludes with steps to create a metastore, assign workspaces, and enable Unity Catalog, preparing students for further exploration of catalogs and schemas.
Module 04 - Catalogs and Schemas (29 min.)
In this video, Mitchell explains Unity Catalog in Databricks, focusing on the hierarchical structure of catalogs, schemas, and tables. He highlights the importance of data organization, isolation, and access control, demonstrating how to create and manage these elements within Databricks workspaces. Mitchell also provides practical examples, including setting up a catalog and schema and loading data into a table, emphasizing the flexibility and governance capabilities of Unity Catalog.
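As a rough illustration of the catalog, schema, and table hierarchy described above, the following Python sketch builds the Databricks SQL statements that would create one object at each level. The names `sales_dev`, `bronze`, and `customers` and the column list are hypothetical and are not taken from the course; the sketch only produces SQL text and does not connect to a workspace.

```python
def quote(identifier: str) -> str:
    """Wrap an identifier in backticks (assumes the name contains no backticks)."""
    return f"`{identifier}`"


def setup_statements(catalog: str, schema: str, table: str, columns: dict) -> list:
    """Build CREATE statements for each level of the three-level namespace."""
    cat = quote(catalog)
    sch = f"{cat}.{quote(schema)}"   # catalog.schema
    tbl = f"{sch}.{quote(table)}"    # catalog.schema.table
    cols = ", ".join(f"{quote(name)} {dtype}" for name, dtype in columns.items())
    return [
        f"CREATE CATALOG IF NOT EXISTS {cat}",
        f"CREATE SCHEMA IF NOT EXISTS {sch}",
        f"CREATE TABLE IF NOT EXISTS {tbl} ({cols})",
    ]


# Hypothetical example names, chosen only for illustration.
statements = setup_statements(
    "sales_dev", "bronze", "customers", {"id": "INT", "name": "STRING"}
)
for statement in statements:
    print(statement)
```

Each statement could then be run in a Databricks SQL editor or notebook; the fully qualified `catalog.schema.table` name is what lets one metastore serve several workspaces without name collisions.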
Module 05 - Comments and Tags (15 min.)
In this video, Mitchell explains how to add comments and tags to various securable objects within Unity Catalog in Databricks. He highlights the importance of comments for annotating objects with metadata, which helps in understanding data relationships and content. Additionally, Mitchell demonstrates how to use tags to categorize objects and simplify search and discovery, and how to leverage AI-generated descriptions to work more efficiently.
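For readers who prefer SQL to the Catalog Explorer UI, comments and tags can also be set with `COMMENT ON` and `ALTER TABLE ... SET TAGS`. The Python sketch below only assembles those statements as strings; the table name, comment text, and tag keys are hypothetical, and string values containing quotes or backslashes are rejected rather than escaped.

```python
def sql_literal(text: str) -> str:
    """Return a single-quoted SQL literal; this sketch refuses risky characters."""
    if "'" in text or "\\" in text:
        raise ValueError("quotes and backslashes are not handled in this sketch")
    return f"'{text}'"


def comment_statement(table: str, comment: str) -> str:
    """Build a COMMENT ON TABLE statement for a fully qualified table name."""
    return f"COMMENT ON TABLE {table} IS {sql_literal(comment)}"


def tag_statement(table: str, tags: dict) -> str:
    """Build an ALTER TABLE ... SET TAGS statement from key/value pairs."""
    pairs = ", ".join(
        f"{sql_literal(key)} = {sql_literal(value)}" for key, value in tags.items()
    )
    return f"ALTER TABLE {table} SET TAGS ({pairs})"


# Hypothetical table, comment, and tags.
table_name = "sales_dev.bronze.customers"
print(comment_statement(table_name, "Raw customer records"))
print(tag_statement(table_name, {"domain": "sales", "pii": "true"}))
```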
Module 06 - External Locations (19 min.)
In this video, Mitchell demonstrates how to set up an external storage location in Databricks, specifically focusing on connecting to an Azure Data Lake. He explains the necessary steps, including creating an Azure Databricks access connector, setting up storage credentials, and configuring an external location with appropriate permissions. This process allows users to securely access and manage external data without needing to copy it into the Databricks environment.
Module 07 - Lakehouse Federation (14 min.)
In this video, Mitchell introduces Lakehouse Federation, a feature within Databricks' Unity Catalog that allows users to discover, query, and govern data across multiple platforms without copying the data. He explains the benefits of using this feature, such as unified data access, consistent governance, and enhanced collaboration, particularly highlighting its ability to leverage existing compute resources. Mitchell also provides a step-by-step demonstration of setting up a connection with Snowflake, showcasing how to integrate external data sources seamlessly into the Databricks environment.
Module 08 - Access Control (30 min.)
In this video, Mitchell explains the fundamentals of access control within Unity Catalog on Azure Databricks, emphasizing centralized management of permissions and data governance across workspaces. He highlights the hierarchical nature of permissions, allowing for efficient and consistent access control from catalog to schema to table levels. Mitchell also demonstrates practical steps for managing user access, including the creation of user groups and the application of permissions at various levels to streamline data management and governance.
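To make the catalog-to-schema-to-table permission chain concrete, here is a minimal Python sketch that generates the `GRANT` statements a group would typically need to read a single table: `USE CATALOG` on the catalog, `USE SCHEMA` on the schema, and `SELECT` on the table. The group name `data_analysts` and the object names are hypothetical, and the sketch assumes the group already exists in the account.

```python
def read_grants(catalog: str, schema: str, table: str, principal: str) -> list:
    """Build the grants a principal needs to query one table."""
    who = f"`{principal}`"  # principals are backtick-quoted in GRANT statements
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {who}",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {who}",
    ]


# Hypothetical group and object names.
grants = read_grants("sales_dev", "bronze", "customers", "data_analysts")
for grant in grants:
    print(grant)
```

Because privileges are inherited downward, granting `SELECT` on the schema or catalog instead of the table would extend read access to every table beneath it, which is one way to keep the number of individual grants small.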
Module 09 - Column Masking (19 min.)
In this video, Mitchell explains the concept of column masking within Unity Catalog in Databricks, highlighting its importance in protecting sensitive information by hiding column values based on user roles. He demonstrates how to create and apply masking functions using SQL and the Catalog Explorer UI, ensuring that only authorized users can view sensitive data. This feature is crucial for maintaining data security and compliance, especially in environments where different user groups require varying levels of data access.
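A masking function of the kind described above can be sketched as a SQL function that checks group membership and is then attached to a column. The Python below only assembles the two statements; the function name `mask_email`, the group `pii_readers`, the `'***'` placeholder, and the table and column names are all hypothetical, and the course's own example may differ.

```python
def masking_statements(table: str, column: str, function: str, allowed_group: str) -> list:
    """Build a mask function and the statement that applies it to a column."""
    create_function = (
        f"CREATE OR REPLACE FUNCTION {function}(value STRING) "
        f"RETURN CASE WHEN is_account_group_member('{allowed_group}') "
        f"THEN value ELSE '***' END"
    )
    apply_mask = f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {function}"
    return [create_function, apply_mask]


# Hypothetical names, for illustration only.
mask_sql = masking_statements(
    table="sales_dev.bronze.customers",
    column="email",
    function="sales_dev.bronze.mask_email",
    allowed_group="pii_readers",
)
for statement in mask_sql:
    print(statement)
```

With a mask like this in place, members of the allowed group would see the real column value, while everyone else would see the placeholder when querying the same table.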
Mitchell Pearson has been with Pragmatic Works for 11 years as a Data Platform Consultant, Trainer, and Team Lead. Mitchell has authored books on SQL Server, Power BI, and the Power Platform. His data platform experience includes designing and implementing enterprise-level Business Intelligence solutions with the Microsoft SQL Server stack (T-SQL, SSIS, SSAS, SSRS), the Power Platform, Microsoft Azure, and Fabric.