How to Run a Databricks Notebook Using Azure Data Factory

Written by Mike Donnelly | Sep 04, 2020

In today’s installment in our Azure Databricks mini-series, I’ll cover running a Databricks notebook using Azure Data Factory (ADF). Databricks notebooks can be written in several languages (Python, Scala, SQL, or R); in my example, I’ll be using Python.

To show how this works, I’ll do a simple Databricks notebook run: I have a file on Azure Storage, and I’ll read it into Databricks using Spark and then transform the data. If you have a notebook you’re running by hand and you want to run it in an automated fashion, calling it from ADF is a great way to do that. If the end result is something you want to start productionizing, you can pull that notebook into your ETL pipelines and run it as part of your ETL solution.
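As a rough idea of what such a notebook might contain, here is a minimal sketch of the read-then-transform step. The file path, the `amount` column, and the derived `amount_with_tax` column are all hypothetical and only there for illustration; the actual file and transformation in the demo may look quite different.

```python
def load_and_transform(spark, path):
    """Read a CSV file into a Spark DataFrame and apply a simple transform."""
    # header=True uses the first row as column names; inferSchema=True asks
    # Spark to infer column types instead of reading everything as strings.
    df = spark.read.csv(path, header=True, inferSchema=True)
    # Hypothetical transform: keep rows with a positive amount and add a
    # derived column. Column names are assumptions, not from the demo.
    return (
        df.filter("amount > 0")
          .selectExpr("*", "amount * 1.1 AS amount_with_tax")
    )

# In a Databricks notebook, `spark` is already defined, so a call might be:
# result = load_and_transform(spark, "/mnt/demo/sales.csv")
# result.show()
```

Passing `spark` in as an argument keeps the function easy to exercise outside a notebook as well.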

Some setup for my demo:

We need an Azure Storage Account; to connect Databricks to it, we’ll need an access key, which we’ll store in Azure Key Vault.
In our Databricks workspace, we’ll create a Secret Scope backed by the Key Vault (a Preview feature) and use it to mount an Azure Blob Storage container in Databricks through the Databricks File System (DBFS).
We’ll have an Azure Data Factory resource set up with a linked service to the Databricks workspace.
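The mount step in that setup can be sketched roughly as follows. This assumes the access key is already stored in a Key Vault-backed Secret Scope; the storage account, container, scope, key, and mount point names are hypothetical placeholders, and `dbutils` is the utility object that Databricks provides inside a notebook.

```python
def mount_blob_container(dbutils, storage_account, container, scope, key_name, mount_point):
    """Mount an Azure Blob Storage container into DBFS using a secret-scope key."""
    # Read the storage access key from the secret scope so the key itself
    # never appears in the notebook source.
    access_key = dbutils.secrets.get(scope=scope, key=key_name)
    host = f"{storage_account}.blob.core.windows.net"
    # Skip the mount if something is already mounted at this path.
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        return mount_point
    dbutils.fs.mount(
        source=f"wasbs://{container}@{host}",
        mount_point=mount_point,
        extra_configs={f"fs.azure.account.key.{host}": access_key},
    )
    return mount_point

# Inside a notebook (all names below are placeholders):
# mount_blob_container(dbutils, "mystorageacct", "demo-data",
#                      "kv-scope", "storage-access-key", "/mnt/demo")
```

Once the container is mounted, files in it can be read through paths under the mount point, just like any other DBFS path.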

Once that is set up, my demo will show you how to create and run an ADF pipeline with a Databricks Notebook task. To see how easy this is, I’ll walk you through it in my brief demo below.
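If you prefer to see the pipeline as a definition rather than in the ADF designer, the sketch below builds the JSON for a pipeline with a single Databricks Notebook activity. The pipeline name, linked service name, notebook path, and parameter are hypothetical; only the overall shape (a `DatabricksNotebook` activity that references a linked service and a notebook path) reflects what ADF expects.

```python
import json

def notebook_pipeline(pipeline_name, linked_service, notebook_path, parameters=None):
    """Build an ADF pipeline definition with one Databricks Notebook activity."""
    activity = {
        "name": "RunNotebook",  # hypothetical activity name
        "type": "DatabricksNotebook",
        # Points at the linked service for the Databricks workspace.
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,
            # Passed to the notebook, where dbutils.widgets.get can read them.
            "baseParameters": parameters or {},
        },
    }
    return {"name": pipeline_name, "properties": {"activities": [activity]}}

# All names below are placeholders for illustration.
pipeline = notebook_pipeline(
    "RunDatabricksNotebook",
    "AzureDatabricksLinkedService",
    "/Shared/demo-notebook",
    {"input_path": "/mnt/demo/sales.csv"},
)
print(json.dumps(pipeline, indent=2))
```

The printed JSON is roughly what you would see in the pipeline’s code view in ADF after adding the Notebook activity in the designer.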

If you want to discuss how to leverage Azure Databricks in your organization, or have questions about any Azure product or service or Azure in general, reach out to us. Our Azure experts are here to help no matter where you are in your Azure journey; from questions to roadmaps to implementation, we’ve got you covered.
