We all know life can get hectic. Here at Pragmatic Works, we're no different. But one of our goals is to learn something new about Azure every day, as things are constantly changing and being updated. Many people are still learning all the amazing things they can do within the Azure cloud and we want to help. Our posts in our Azure Every Day series are a great way to learn more about Azure each week.
There are many options for data storage, how do you know which is right for your data? Today I’d like to discuss storage in relation to the architecture of the modern data warehouse and to shed some light on your options.
The modern data warehouse can come in different forms: a relational warehouse which is stored as a traditional star or snowflake schema, a hub and spoke model or a data vault model. In these different schemas we’re storing a lot of data within and we can then derive data marts from the larger data warehouse for reporting and analytics.
Or it could be a data lake and using cloud-based, more modern storage mechanisms that serve as a trusted data repository. But if we’re not careful, a data lake can be a data junkyard. Look at the image below about the Microsoft modern data warehouse platform:
On the left side we have the simplest form, what we call a low or no code business analytics solution. Here we use the Power BI platform where we store data using data flows and the common data model, which stores data in the Azure Data Lake Storage Gen2.
Moving to the right you see that we can use transformation and orchestration tools like Azure Data Factory and Azure Databricks. Then we can store that data in Azure SQL Database or Azure SQL Data Warehouse. Azure Machine Learning and services available in the cloud like cognitive services can then be used for compute and analytics. This is all in addition to raw data that we would ingest and stage and possibly archive.
So, why used cloud-based storage instead of traditional on prem physical storage?
Here are some cloud storage options:
How cost effective is cloud based storage? Here’s some quick math; if I have a terabyte of data that I need to archive and store offline, I could go to Costco or Amazon and buy an inexpensive consumer grade 1TB external hard disk for about $50. Now compare that $50 disk drive to the cost of the least expensive Azure storage with local redundancy, over two years that will cost me about half of what that external disk is going to cost.
And for redundancy for my external disk that will cost me $200-300 for a RAID device that has 1 TB of storage which is 5-6 times more that what it would cost me to store my data in Azure.
Here’s a breakdown of Azure storage history/concepts:
Data transformation options include, Azure Data Factory and Logic Apps. Azure Data Factory encompasses many different transformation tools, including SSIS. You can now lift and shift SSIS packages into the cloud incorporated with Azure Data Factory.
You also have Azure Databricks with is based on Apache Spark in big data technologies adopted by Microsoft. Recently announced and coming in a few months we will have the ability to use Power Query within Azure Data Factory.
Hopefully this post helped you better understand your cloud storage options. Whether you’re architecting a database solution, an application, data warehouse or data analytics solutions, you have a lot of great options in the Azure ecosystem to store, ingest and manage your data.
If you want more information about storage options in the cloud, Azure Data Factory, Logic Apps or any Azure product or service, you’re in the right place. Our experts and MVPs can help you take your business from good to great, no matter where you are on your cloud journey.