Introduction to Azure Data Lake

Written by Brad Gall | May 07, 2018

At Pragmatic Works we talk to a lot of customers about their data strategies, specifically their cloud data strategies. One great tool in our toolbox is Azure Data Lake, and in today’s post I’d like to introduce that tool and tell you about some of the benefits you will gain from it.

Azure Data Lake is Microsoft’s Platform as a Service (PaaS) big data solution running on Azure. It gives you the ability to handle large volumes of data, including unstructured data such as CSV, flat or log files, all of which can be processed through the Azure Data Lake service.

Azure Data Lake consists of two different resources within Azure:

Azure Data Lake Store – This is where the data resides. It’s a fully HDFS-compliant file system that you can spin up and run on its own. One benefit is that it’s integrated with Azure Active Directory, so we can use Azure Active Directory to secure our data and the folder hierarchies we set up within the file system.
Azure Data Lake Analytics – This is the compute piece of the big data solution. It follows a common theme in Azure: the separation of storage and compute. This is where we run the jobs that process and transform our data. We create our views here and run scripts that pull data into new files and move our data around.

A benefit of running Azure Data Lake Analytics over some of the other big data platforms is that it uses a language called U-SQL, which is proprietary to Microsoft. This language is based on T-SQL (I call it a mash-up of T-SQL and C#). We use many of the functions and much of the syntax of C#, but in the context of a T-SQL-style statement.
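To give a feel for that mash-up, here is a minimal U-SQL sketch. The file paths, the column names and the "en-gb" filter are all hypothetical, made up purely for illustration, so treat this as a rough outline rather than a script from a real project. The overall shape reads like T-SQL, while the expressions inside it are C#.

```usql
// Illustrative sketch only: paths, columns and filter value are assumptions.

// Read a tab-separated file from the Data Lake Store into a rowset.
@logs =
    EXTRACT UserId   int,
            Region   string,
            Query    string,
            Duration int?
    FROM "/input/searchlog.tsv"
    USING Extractors.Tsv();

// A SQL-style SELECT whose expressions are written in C#.
@result =
    SELECT UserId,
           Region.ToUpper() AS Region,    // C# string method
           (Duration ?? 0) AS Duration    // C# null-coalescing operator
    FROM @logs
    WHERE Region == "en-gb";              // C# equality (==), not T-SQL (=)

// Write the transformed rows back to the store as a CSV file.
OUTPUT @result
    TO "/output/searchlog_en_gb.csv"
    USING Outputters.Csv();
```

Anyone who has written a SELECT statement will recognize the structure, and anyone who has written C# will recognize the method call and the operators, which is the appeal for teams that already have those skills.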

The benefit lies in the fact that we don’t have to learn some of the languages and tools that are common on open source data platforms, such as Pig, Hive, Spark or Python. We can take advantage of big data capabilities using skill sets we already have in-house.

We help many of our more than 7,000 customers by teaching them how to integrate Azure Data Lake into their overall data architectures and by figuring out where big data may fit into their data strategy. If you’d like to learn more about integrating this in your business or if you have questions about anything Azure related, we are the people to talk to. Click the link below or contact us – we’d love to help.
