Master data integration with Azure Databricks in a course that explores connecting to a variety of data sources, giving you the skills to use this powerful platform for data processing and insight sharing.
Dive into the world of data integration with our specialized class on Databricks Sources. This course is designed to demystify the process of connecting to diverse data sources using the robust, cloud-based platform of Azure Databricks. You'll gain practical knowledge on navigating the Azure portal to set up essential components such as resource groups, Databricks instances, key vaults, and data lakes. We'll walk you through the steps to access both public and private data repositories, ensuring you have the skills to leverage Databricks Sources effectively for any data project or challenge you face.
Course Outline (Free Preview)
What You'll Need to Get Started with Databricks Sources
Module 00 - Class Introduction
This video introduces the Azure Databricks Sources class, which teaches you how to connect to various data sources using Azure Databricks. Manuel walks through the prerequisites for the class: access to the Azure portal and the creation of a resource group, a Databricks instance, a key vault, and a data lake.
Module 01 - External Databases (Azure SQL DB)
In this video, Manuel shows you how to connect to an Azure SQL Database from a Databricks notebook using two different connectors: JDBC and SQL Server. He explains the syntax and options for each connector, and how to use variables, queries, and Key Vault to make your connection more secure and flexible. He also demonstrates how to read the schema and display the data from the database table or query in a data frame.
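To make the connector syntax concrete, here is a minimal sketch of the JDBC approach; the server, database, table, scope, and secret names are placeholders rather than the course's values:

```python
# A minimal sketch of reading an Azure SQL Database table over JDBC.
# All names below are placeholders, not the values used in the video.
jdbc_url = (
    "jdbc:sqlserver://myserver.database.windows.net:1433;"
    "database=mydatabase;encrypt=true;"
)

# Pull credentials from a secret scope instead of hard-coding them.
user = dbutils.secrets.get(scope="my-scope", key="sql-user")
password = dbutils.secrets.get(scope="my-scope", key="sql-password")

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "SalesLT.Customer")  # or .option("query", "SELECT ...")
    .option("user", user)
    .option("password", password)
    .load()
)

df.printSchema()   # inspect the inferred schema
display(df)        # render the rows in the notebook
```

Swapping the "dbtable" option for a "query" option pushes the filtering down to the database, which is the variable-and-query pattern the video walks through.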
Module 02 - Storage Accounts (ADLS) (19 min.)
This is a tutorial on how to connect to an Azure Data Lake Storage account from a Databricks notebook using different methods. Manuel explains the advantages and disadvantages of using OAuth 2, SAS token, and access key, and shows the code and configuration needed for each option. He also demonstrates how to read a CSV file from a container and display it as a data frame in the notebook.
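As a concrete illustration, here is a minimal sketch of the access-key method; the storage account, container, file, and secret names are placeholders, and the OAuth 2 and SAS options follow the same spark.conf pattern with different configuration keys:

```python
# A minimal sketch of connecting to ADLS with an access key.
# Account, container, scope, and file names are placeholders.
storage_account = "mystorageacct"
access_key = dbutils.secrets.get(scope="my-scope", key="adls-access-key")

# Register the key for this storage account with the Spark session.
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    access_key,
)

# Read a CSV file from a container and display it as a data frame.
df = (
    spark.read.format("csv")
    .option("header", "true")
    .load(f"abfss://mycontainer@{storage_account}.dfs.core.windows.net/sample.csv")
)
display(df)
```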
Module 03 - Connecting to Snowflake (11 min.)
In this video, Manuel shows you how to connect to a Snowflake database from a Databricks notebook and query data from a table. He explains the syntax and options for using the spark.read.format function and demonstrates how to create an options object to store the connection parameters. He also shows you how to use a query instead of a table name to filter the data.
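For reference, a minimal sketch of that pattern; every connection value below is a placeholder, and collecting them in a dictionary mirrors the options object described in the video:

```python
# A minimal sketch of the Snowflake connector; all values are placeholders.
options = {
    "sfUrl": "myaccount.snowflakecomputing.com",
    "sfUser": dbutils.secrets.get(scope="my-scope", key="snowflake-user"),
    "sfPassword": dbutils.secrets.get(scope="my-scope", key="snowflake-password"),
    "sfDatabase": "MY_DATABASE",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MY_WAREHOUSE",
}

# Read a whole table...
df = (
    spark.read.format("snowflake")
    .options(**options)
    .option("dbtable", "MY_TABLE")
    .load()
)

# ...or push a filtering query down to Snowflake instead of a table name.
df = (
    spark.read.format("snowflake")
    .options(**options)
    .option("query", "SELECT * FROM MY_TABLE WHERE REGION = 'EAST'")
    .load()
)
display(df)
```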
Module 04 - Connecting to Web Pages (24 min.)
In this video, Manuel shows you how to connect to a web page and scrape data from it using Python and a library called Beautiful Soup. He explains the steps of installing the library, requesting the page content, parsing the HTML, finding the relevant tags and elements, and creating a Pandas data frame with the extracted data. He uses a Wikipedia page that lists the largest companies in the United States as an example and demonstrates how to get the table of data from the web page.
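A minimal sketch of that workflow follows; the table index and cell structure are assumptions about the page layout, so the exact selectors may need adjusting:

```python
# %pip install beautifulsoup4   # install step, as covered in the video
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = ("https://en.wikipedia.org/wiki/"
       "List_of_largest_companies_in_the_United_States_by_revenue")
response = requests.get(url)

soup = BeautifulSoup(response.text, "html.parser")
# Assumption: the target table is the first <table> on the page;
# adjust the index if the layout differs.
table = soup.find_all("table")[0]

rows = table.find_all("tr")
# Header cells become column names; each later row becomes a record.
headers = [th.text.strip() for th in rows[0].find_all("th")]
data = [
    [td.text.strip() for td in tr.find_all("td")]
    for tr in rows[1:]
    if tr.find_all("td")
]

df = pd.DataFrame(data, columns=headers)
df.head()
```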
Module 05 - Azure Key Vault and Secrets (27 min.)
In this video, Manuel shows you how to use Azure Key Vault to store and retrieve sensitive information such as passwords, connection strings, and access keys for connecting to different data sources from Databricks notebooks. He explains the steps of creating a secret scope and a secret, and granting access to the Databricks service principal. He also demonstrates how to use the dbutils.secrets commands to get the secret values and use them in the connection options for Azure SQL Database and Azure Data Lake Storage.
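A minimal sketch of retrieving and using a secret from a Key Vault-backed scope; the scope, key, server, and table names are placeholders for whatever you create in the portal:

```python
# List the scopes and secrets visible to this workspace.
dbutils.secrets.listScopes()
dbutils.secrets.list("my-scope")

# Retrieve a secret value; Databricks redacts it if you try to print it.
sql_password = dbutils.secrets.get(scope="my-scope", key="sql-password")

# Plug it into a connection instead of a hard-coded credential.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;"
                   "database=mydatabase;")
    .option("dbtable", "SalesLT.Customer")
    .option("user", "sqladmin")
    .option("password", sql_password)
    .load()
)
```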
Manuel Quintana has been teaching and developing training content for Pragmatic Works for the past 6 years. His areas of expertise include the Microsoft SQL BI stack (T-SQL, SSIS, SSAS, and SSRS), the Power Platform (Power BI, Power Automate, and PowerApps), and various other Azure services. Manuel has also authored a couple of books on Power BI and the Power Platform, and he participates in the community by presenting at various SQL Server and Power BI user groups.