Pragmatic Works Nerd News

Intro to Azure Data Lake Store Gen 2

Written by Bob Rubocki | Dec 08, 2018

I’m excited to tell you more about the preview of the second generation of Azure Data Lake Store. One reason I’m excited about this preview is that we’re often asked whether to use Data Lake Store or Blob storage for storing files, for instance in a data warehouse load where file storage is part of the pattern.

As with many things, each option has pros and cons, and there are good reasons to choose either Data Lake Store or Blob storage. Here’s the great thing: Azure Data Lake Store Gen 2 combines the key features of both.

Look at the table below, which briefly summarizes some of the highlights. To point out some differences between Blob storage and Data Lake Store: Blob storage offers hot and cool access tiers, which were not available in Data Lake Store. Blob storage also offered a choice of redundancy options, such as geo-redundant storage, as part of the product, while Data Lake Store did not offer that native choice of redundancy.

On the other hand, if you wanted to use Azure Active Directory principals to manage security on your files, you could do that with Data Lake Store but not with Blob storage. Data Lake Store is also HDFS-compatible storage (it exposes a WebHDFS-compatible API), so if you wanted to use it as the backend storage for Hadoop, you could; Blob storage did not offer that same HDFS-compatible file system.
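Data Lake Store secures files with POSIX-style access control lists (ACLs), where named entries refer to Azure AD principals by object ID. The sketch below parses an ACL string in the comma-separated "[default:]type:[id]:permissions" form and answers a simplified read/write/execute question. It is an illustration under stated assumptions, not the service's actual evaluation logic: group membership and superuser rules are ignored, and the object IDs are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AclEntry:
    default: bool   # True for "default:" entries, which new children inherit
    kind: str       # "user", "group", "mask" or "other"
    principal: str  # Azure AD object ID, or "" for owner/owning group/mask/other
    perms: str      # three characters such as "r-x"


def parse_acl(text: str) -> List[AclEntry]:
    """Parse a comma-separated ACL such as "user::rwx,user:<id>:r-x,other::---"."""
    entries = []
    for raw in text.split(","):
        parts = raw.strip().split(":")
        is_default = parts[0] == "default"
        if is_default:
            parts = parts[1:]
        if len(parts) != 3:
            raise ValueError(f"malformed ACL entry: {raw!r}")
        kind, principal, perms = parts
        entries.append(AclEntry(is_default, kind, principal, perms))
    return entries


def has_permission(acl: List[AclEntry], principal: str, owner: str, perm: str) -> bool:
    """Simplified check for "r", "w" or "x"; group membership is ignored (assumption)."""
    if perm not in ("r", "w", "x"):
        raise ValueError("perm must be 'r', 'w' or 'x'")
    access = [e for e in acl if not e.default]  # default entries don't grant access

    def find(kind: str, pid: str = "") -> Optional[AclEntry]:
        return next((e for e in access if e.kind == kind and e.principal == pid), None)

    if principal == owner:
        owner_entry = find("user")
        return owner_entry is not None and perm in owner_entry.perms
    named = find("user", principal)
    if named is not None:
        mask = find("mask")
        # The mask caps what named users (and groups) are actually allowed to do.
        return perm in named.perms and (mask is None or perm in mask.perms)
    other = find("other")
    return other is not None and perm in other.perms


if __name__ == "__main__":
    # Hypothetical object IDs, for illustration only
    acl = parse_acl("user::rwx,user:11111111-aaaa:r-x,group::r-x,mask::r--,other::---")
    print(has_permission(acl, "11111111-aaaa", "owner-id", "r"))  # True
    print(has_permission(acl, "11111111-aaaa", "owner-id", "x"))  # False: blocked by mask
```

The mask check is the detail that most often surprises people: a named principal can hold "x" in its own entry and still be denied execute access because the mask removes it.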

Now look at the last column of the table, which holds the great news: all of these features are available in Azure Data Lake Store Gen 2. The idea behind Gen 2 was to converge features that were previously available in only one product or the other.
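One concrete way to see the convergence is in how the data is addressed. A Gen 2 account is an ordinary storage account with a Blob endpoint plus a "dfs" endpoint for the file-system API, and Hadoop reaches it through the ABFS driver with "abfss://" URIs, while Gen 1 used "adl://" URIs. The sketch below only builds those address strings; the account, file system, and path names are hypothetical, and Blob API access to hierarchical-namespace accounts has changed over the preview, so treat the Blob URL on a Gen 2 account as an assumption to verify.

```python
def _clean(path: str) -> str:
    # Avoid a double slash when callers pass a leading "/"
    return path.lstrip("/")


def blob_url(account: str, container: str, path: str) -> str:
    # Blob endpoint: classic Blob storage, and also present on Gen 2 accounts
    return f"https://{account}.blob.core.windows.net/{container}/{_clean(path)}"


def dfs_url(account: str, filesystem: str, path: str) -> str:
    # "dfs" endpoint: the Data Lake Store Gen 2 file-system REST API
    return f"https://{account}.dfs.core.windows.net/{filesystem}/{_clean(path)}"


def abfss_uri(account: str, filesystem: str, path: str) -> str:
    # Hadoop ABFS driver URI for Gen 2 ("abfss" is the TLS-secured variant)
    return f"abfss://{filesystem}@{account}.dfs.core.windows.net/{_clean(path)}"


def adl_uri(account: str, path: str) -> str:
    # Hadoop URI for Data Lake Store Gen 1
    return f"adl://{account}.azuredatalakestore.net/{_clean(path)}"


if __name__ == "__main__":
    # Hypothetical account "mylake" with a file system (container) named "raw"
    for uri in (
        blob_url("mylake", "raw", "sales/2018/orders.csv"),
        dfs_url("mylake", "raw", "sales/2018/orders.csv"),
        abfss_uri("mylake", "raw", "/sales/2018/orders.csv"),
        adl_uri("mylake", "sales/2018/orders.csv"),
    ):
        print(uri)
```

Note that a Gen 2 "file system" and a Blob "container" are the same object seen through two endpoints, which is why the same name appears in both the Blob and dfs forms.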

A bit about setting up your account for Gen 2: although it’s called Data Lake Store, what you’re actually creating is a storage account. In the next screenshot, you’ll see how I set this up in the Azure portal by clicking Create a resource and then searching for the term ‘storage’, which returns a list of options.

Notice that Data Lake Store Gen 1 is still an option, but there’s no option for Gen 2. Instead, select ‘Storage Account’ as you would have done in the past, which takes you to the page below.

On the Basics tab, I can select an account kind; for Data Lake Store Gen 2, choose StorageV2 (general purpose v2), the account kind that Gen 2 is built on. Remember, all of this creates a storage account, not a Data Lake Store account. It’s a bit confusing, which is why I wanted to show it.
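If you prefer to script the account instead of clicking through the portal, the choices above map to properties of the Microsoft.Storage/storageAccounts resource: the account kind becomes "kind": "StorageV2", and the hierarchical namespace option on the Advanced tab becomes "isHnsEnabled". The sketch below builds an ARM template as a Python dictionary. Assumptions: the apiVersion "2019-06-01" stands in for any version that supports isHnsEnabled, the account name and location are hypothetical, and only the standard redundancy SKUs are allowed for brevity.

```python
import json
import re
from typing import Any, Dict

# Standard redundancy SKUs (assumed subset; premium SKUs omitted for brevity)
REDUNDANCY_SKUS = {"Standard_LRS", "Standard_ZRS", "Standard_GRS", "Standard_RAGRS"}
# Account-level default access tier; Archive is set per blob, not per account
ACCOUNT_TIERS = {"Hot", "Cool"}
# Storage account names: 3-24 characters, lowercase letters and digits only
NAME_PATTERN = re.compile(r"^[a-z0-9]{3,24}$")


def storage_account_resource(name: str, location: str, sku: str = "Standard_LRS",
                             access_tier: str = "Hot",
                             hierarchical_namespace: bool = True) -> Dict[str, Any]:
    """Return one storage account resource definition for an ARM template."""
    if not NAME_PATTERN.match(name):
        raise ValueError(f"invalid storage account name: {name!r}")
    if sku not in REDUNDANCY_SKUS:
        raise ValueError(f"unsupported SKU: {sku!r}")
    if access_tier not in ACCOUNT_TIERS:
        raise ValueError(f"unsupported access tier: {access_tier!r}")
    return {
        "type": "Microsoft.Storage/storageAccounts",
        "apiVersion": "2019-06-01",          # assumption: any version with isHnsEnabled
        "name": name,
        "location": location,
        "kind": "StorageV2",                 # "StorageV2 (general purpose v2)" in the portal
        "sku": {"name": sku},
        "properties": {
            "accessTier": access_tier,
            "isHnsEnabled": hierarchical_namespace,  # the Advanced tab option
        },
    }


def deployment_template(resource: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a resource in a minimal ARM deployment template."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [resource],
    }


if __name__ == "__main__":
    # Hypothetical account name and region
    template = deployment_template(storage_account_resource("mylakeacct", "eastus"))
    print(json.dumps(template, indent=2))
```

The printed JSON could then be deployed like any other ARM template; keeping the portal choices as explicit parameters makes it easy to see which setting turns a plain storage account into a Gen 2 one.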

For this type of account, the Advanced tab (see below) also has an option to enable the hierarchical namespace. Without going too deep, if you’re using Data Lake Store today and want similar functionality in Gen 2, you’ll want to select this option. The hierarchical namespace organizes files into real directories, so operations on a folder, such as renaming or deleting it, act on the directory itself rather than on every file it contains.
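To make the folder behavior concrete, here is a toy in-memory model (not Azure code, and not how the service is implemented) contrasting the two approaches. In a flat namespace, a "folder" is just a shared name prefix, so renaming it means rewriting every object under that prefix; with a hierarchical namespace, the folder is a real directory entry, so the rename is one metadata operation. The class names and file paths are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


class FlatNamespace:
    """Folders exist only as name prefixes on individual objects."""

    def __init__(self, paths: Iterable[str]) -> None:
        self.objects = set(paths)

    def rename_folder(self, old: str, new: str) -> int:
        """Rename every object under `old`; return the number of object operations."""
        prefix = old.rstrip("/") + "/"
        moved = [p for p in self.objects if p.startswith(prefix)]
        for p in moved:
            self.objects.remove(p)
            self.objects.add(new.rstrip("/") + "/" + p[len(prefix):])
        return len(moved)


class HierarchicalNamespace:
    """Directories are real nodes; a file is a leaf whose value is None."""

    def __init__(self, paths: Iterable[str]) -> None:
        self.root: Dict[str, Optional[dict]] = {}
        for p in paths:
            *dirs, leaf = p.split("/")
            node = self.root
            for d in dirs:
                node = node.setdefault(d, {})
            node[leaf] = None

    def _dir(self, parts: List[str]) -> dict:
        node = self.root
        for part in parts:
            node = node[part]  # KeyError if the parent directory doesn't exist
        return node

    def rename_folder(self, old: str, new: str) -> int:
        """Move one directory entry; return the number of operations (always 1)."""
        *old_parents, old_name = old.strip("/").split("/")
        *new_parents, new_name = new.strip("/").split("/")
        src, dst = self._dir(old_parents), self._dir(new_parents)
        if new_name in dst:
            raise FileExistsError(new)
        dst[new_name] = src.pop(old_name)
        return 1

    def paths(self) -> List[str]:
        out: List[str] = []

        def walk(node: dict, prefix: str) -> None:
            for name, child in node.items():
                if child is None:
                    out.append(prefix + name)
                else:
                    walk(child, prefix + name + "/")

        walk(self.root, "")
        return sorted(out)


if __name__ == "__main__":
    files = ["raw/2018/a.csv", "raw/2018/b.csv", "raw/2017/c.csv", "other/d.csv"]
    flat, tree = FlatNamespace(files), HierarchicalNamespace(files)
    print(flat.rename_folder("raw/2018", "archive2018"))  # 2 object operations
    print(tree.rename_folder("raw/2018", "archive2018"))  # 1 directory operation
```

Both models end up with the same set of paths; the difference is the amount of work, which in the flat case grows with the number of files in the folder.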

Azure Data Lake Store Gen 2 is a great announcement from Microsoft. It has been in preview for a few months, and I’m not sure when it will be generally available (GA). But it’s exciting to finally have Blob storage and Data Lake converged into a single product.

If you have questions about Data Lake Storage, Blob storage, data warehousing, or anything else about the Azure cloud platform, we’re the people to talk to. Contact us: we’re all about data and Azure, and we’re here to help you with every step of your cloud journey.