In today’s post I’m bringing you a live sit-down with James Rowland-Jones, Principal Program Manager on the Azure SQL Data Warehouse team, as well as some exciting news. I spent time at Microsoft last week learning what’s coming, and I’m excited to share it. Dropping today is the new Gen2 release for Azure SQL Data Warehouse.
Gen2 is all about giving more value to customers, with lots of features, performance and concurrency. Here’s a rundown of my discussion with James about what Azure SQL Data Warehouse brings to you and your business.
Adam: If I’m a customer thinking of moving my data warehouse to the cloud, why do I want to do it now that there’s Gen2?
James: Gen2 is a major investment for Microsoft and builds upon our separation of compute and storage story that we have for cloud data warehousing. This offers great value, as well as a great entry point for customers wanting to build in this space. Gen2 is broadly deployed within 20 regions in the world – the largest cloud-based deployment of any cloud data warehouse.
This new generation offers you 5X the compute capacity of anything we currently have. This means you could deploy up to 4,000 virtual cores in 5 minutes or less. We hear from customers that they want to be able to use all that compute power in lots of different ways and increase the number of concurrent workloads they have on the system.
As you scale with Gen2, you can increase the concurrent query capacity up to 128 concurrent queries, which is a 4X increase from what we have today. This is great for customers that have mixed workloads.
On top of that, the intelligence we’ve added under the hood increases the aggregate workload the system can handle. We’ve seen customers get a 5X increase in their aggregate workload when using this new technology.
Adam: So, not only is it MUCH faster, but you have a lot more flexibility on the kinds of workloads you can run on there. Some people may not understand the importance of the SQL Data Warehouse’s unique concept of separation of compute and storage. Can you explain why this matters?
James: Our basic model for Azure SQL Data Warehouse is built on the idea that what you pay for compute is separate from what you pay for storage. So, you store the data that you have and then just pay for the performance. You can scale up to the performance that you need. Rather than pre-buying something in an on-premises world, where you must think about what you’re going to need years into the future, you size it to the workload you’re doing.
You can increase compute power during peak times, such as month-end reporting, or turn off compute completely over the weekend, and you’ll save significant money.
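To make the scale-up-and-pause idea concrete, here is a rough back-of-the-envelope sketch in Python. The hourly rates, the peak window and the weekend pause are made-up placeholder numbers for illustration, not Azure pricing, and the function name is hypothetical. Storage is left out because in this toy model it costs the same either way.

```python
# Illustrative only: all rates and hours are hypothetical placeholders,
# not actual Azure SQL Data Warehouse pricing.
HOURS_PER_WEEK = 7 * 24


def weekly_compute_cost(base_rate, peak_rate, peak_hours, paused_hours):
    """Compute cost for one week when the warehouse runs at peak_rate for
    peak_hours, is paused (no compute charge) for paused_hours, and runs
    at base_rate for the remaining hours."""
    base_hours = HOURS_PER_WEEK - peak_hours - paused_hours
    if base_hours < 0:
        raise ValueError("peak_hours + paused_hours exceed one week")
    return base_rate * base_hours + peak_rate * peak_hours


# Fixed sizing: provisioned for the peak all week, never paused.
fixed_cost = weekly_compute_cost(
    base_rate=40.0, peak_rate=40.0, peak_hours=0, paused_hours=0
)

# Elastic sizing: small most of the week, scaled up for an 8-hour
# reporting run, paused for the 48-hour weekend.
elastic_cost = weekly_compute_cost(
    base_rate=10.0, peak_rate=40.0, peak_hours=8, paused_hours=48
)

print(f"fixed:   {fixed_cost:.2f}")
print(f"elastic: {elastic_cost:.2f}")
print(f"saved:   {fixed_cost - elastic_cost:.2f}")
```

The point of the sketch is only the shape of the calculation: when compute is billed separately, the hours you run small or paused drop straight out of the bill.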
Adam: A big complaint we hear from customers that are still in the appliance world (using Teradata or Netezza, for instance) is that they must spend lots of money on expansion or on extra terabytes to get more cores or more storage. The cost of storage in the cloud is comparatively low, and being able to scale compute separately from storage gives them flexibility that matches their business.
James: It has a profound impact for a customer with a high compute but low storage requirement, or vice versa. And when you do need to make a change, it’s a 60-90 second operation, not a size-of-data operation.
Think of it this way: in a legacy model, where compute and storage are tied together, any time you need to change one of those dimensions you have to resize, and that’s a size-of-data operation. When you’ve got terabytes or more in a data warehouse and you have to move that whole data volume, not only is it a significant operation, but it takes the business offline from an analytics capability perspective.
We’re all about dynamic scaling and sizing right at the point where you need it, giving you that compute power and flexibility.
Adam: That’s pretty cool, as we see these challenges in resizing and downtime even in other cloud data warehouse platforms. You’ve talked about performance, concurrency and the ability to separate compute from storage. How do those new big gains work behind the scenes?
James: When you separate compute and storage, the memory and the CPUs sit close together, but every read from storage is basically a remote IO. That’s great in terms of flexibility, but as you scale and really want best-in-class performance, you’re going to hit a scalability wall at some point.
So, for our first-generation customers, we planned to make sure we’re ready with Gen2 when they need it. Gen2 provides the latest in Azure hardware inside those nodes, with an intelligent disk-based cache that automatically tiers the storage and provides a great performance boost. It eliminates the remote IO for those repetitive queries, which means you’re getting great throughput on the compute, the scan rates are fantastic, and customers are reaping the benefits.
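As an aside for readers who want to picture why a local cache removes remote IO for repeated queries, here is a minimal Python sketch. It is not how the Gen2 cache is implemented: the least-recently-used eviction policy, the `LocalCache` class and the `remote_read` callback are all assumptions chosen to keep the illustration short.

```python
from collections import OrderedDict


class LocalCache:
    """Toy node-local cache in front of remote storage.

    Hypothetical illustration: uses simple LRU eviction, which is an
    assumption and not a description of the Gen2 cache internals.
    """

    def __init__(self, capacity, remote_read):
        self.capacity = capacity          # max number of blocks kept locally
        self.remote_read = remote_read    # function that fetches a block remotely
        self.blocks = OrderedDict()       # block_id -> data, oldest first
        self.remote_reads = 0             # how many times we went to remote storage

    def read(self, block_id):
        if block_id in self.blocks:
            # Cache hit: served locally, no remote IO.
            self.blocks.move_to_end(block_id)
            return self.blocks[block_id]
        # Cache miss: one remote IO, then keep the block locally.
        data = self.remote_read(block_id)
        self.remote_reads += 1
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used block
        return data


# A "repetitive query" that scans the same two blocks three times.
cache = LocalCache(capacity=2, remote_read=lambda block_id: f"data-{block_id}")
for _ in range(3):
    for block_id in ("a", "b"):
        cache.read(block_id)

print(cache.remote_reads)  # six reads in total, but only the first pass goes remote
```

Only the first pass over the data pays the remote IO cost; later passes are served from the local copy, which is the effect James describes for repetitive queries.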
Customers already running SQL Data Warehouse Gen1 can go into the portal and simply upgrade to Gen2. It’s a clear, straightforward process that they manage themselves in the portal.
It’s an exciting time to be in the analytics space right now. We’ve seen customers stretching the boundaries, doing some exciting stuff and transforming their business, as well as modernizing their platforms by moving to Azure. They’re now looking at their on-premises data warehouse as a silo, and the cloud is the place where they are driving all that new data – the cloud is where they want to be.
Adam: I agree, it is a super exciting time! We’re seeing patterns like SQL Data Warehouse being the central hub and then adding Databricks and Cosmos DB and other data technologies that all work together. That’s what Azure is about, one group building things designed to work together, so once you get into the cloud, like with SQL Data Warehouse, it sets you up to accelerate the things you want to do downstream.
James: With Azure as a platform, think about all those engines, Databricks, HDI, etc., using our fundamentals of separating the compute and storage. Store your data and pay for performance using the engine that you want that best aligns to the person that wants to interact with that data. If I’m a data science guy, I want to use Python, R or Spark; Databricks is a best in breed engine for this.
Then if a data scientist wants to provide that data over to a data analyst to consume, it’s simple to move that data transparently into the SQL Data Warehouse without retooling. The data warehouse plugs in, gives you great interactive query and secure access, an important aspect with the impending GDPR.
Adam: How can people find out more about Azure SQL Data Warehouse?
James: Go to the Microsoft portal; there are lots of resources to explore the service, with trials and docs with free tutorials and lessons. Also check out our Azure blog content or learn through our partners, like Pragmatic Works, to get a second view on everything MSFT is doing, and they can help you chart a course for moving into this vibrant world of SQL analytics.
Adam: We’re obviously excited! If you’re not excited after reading this or watching the video, I don’t know what it will take! But if this did get you excited to learn more about how Azure SQL Data Warehouse Gen2 can help drive your business or bring your data strategy together, contact us. We’ll do all we can to point you in the right direction and get you in the cloud.