As we build data warehouses for clients, we're sometimes asked whether a star schema is really necessary. With so many great analytical tools for querying data stores and files, and for running machine learning on top of them, people wonder: why do I need to build a star schema, or do I even need one at all?
The short answer is no, you don't need one. But there are still many cases where a star schema is extremely valuable, and I don't see it going away completely. Sure, more and more analysis is being done on data sets that are not star schemas, and that's great. But a well-designed star schema gives reporting analysts ease-of-use benefits that shouldn't be discounted. Let me give you a couple of reasons why you would want to build one.
My first reason is a basic one. The premise of a star schema comes down to two things: query performance and ease of consumption. On the performance side, if your star schema is loaded into a database like SQL Server, a well-designed model will deliver very good query performance, because most queries simply join one large fact table to a handful of much smaller dimension tables on simple keys.
Even if query performance is not a concern, the other key benefit of a star schema is that it's very easy for report builders and query writers to consume. It's a simple, generally very intuitive model: a central fact table surrounded by dimension tables, with few relationships between them. So if you want your data to be accessible to a wider audience, a star schema makes that easier.
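To make the shape concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than SQL Server. The table and column names (`fact_sales`, `dim_date`, `dim_product`) and the sample rows are hypothetical, chosen only for illustration. The point is that every report query follows the same pattern: join the fact table to the dimensions you need, group by dimension attributes, and sum the measures.

```python
import sqlite3

# Hypothetical star schema: one fact table whose foreign keys point at
# two small dimension tables. Names and data are illustrative only.
SCHEMA = """
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database and load a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20240101, 2024, 1), (20240201, 2024, 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "Bikes"), (2, "Helmets")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(20240101, 1, 500.0), (20240101, 2, 40.0),
                      (20240201, 1, 750.0)])
    return conn


def sales_by_month_and_category(conn: sqlite3.Connection) -> list:
    """Typical star-schema query: fact joined to its dimensions, grouped by attributes."""
    return conn.execute("""
        SELECT d.year, d.month, p.category, SUM(f.amount) AS total
        FROM fact_sales AS f
        JOIN dim_date    AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY d.year, d.month, p.category
        ORDER BY d.year, d.month, p.category
    """).fetchall()


if __name__ == "__main__":
    for row in sales_by_month_and_category(build_demo_db()):
        print(row)
```

A report writer only needs to know which dimension holds the attribute they want to slice by; the join path from the fact table is always one step away.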
My second reason comes from something I frequently hear from people using tools like Power BI. They pull all their data into one big data set and do whatever they need inside the tool. If you have one large data set with all the information you need and can do your modeling in Power BI, then you don't really need a star schema.
Where a star schema helps is when you are integrating data from more than one source, or from two fact tables with different granularity. Say you want to take information from one very large fact table and relate it on a report to, or compare it with, information in another fact table: perhaps budget data in one table and actual transactions in another. Budgets are often set at a coarser grain (for example, by month and department) than individual cost transactions. Building a single, combined data set that holds both the budget data and the detailed cost transactions is not impossible, but it is certainly challenging.
With a star schema and a tool like Power BI, however, it's simple to build reports that pull data from two or more fact tables without relating those fact tables to each other. As long as the fact tables share common dimensions, I can easily create a report showing measures from, say, three fact tables WITHOUT relating the three fact tables to each other directly. If I left my data in three separate data sets, as might happen in an OLTP scenario, relating them to each other for reporting would be challenging (and ugly!).
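Here is a minimal sketch of that budget-versus-actuals idea, again using Python's built-in `sqlite3` rather than Power BI. All table names, keys, and figures are hypothetical, and storing each monthly budget against the first day of the month is an assumption made for illustration. Each fact table is summarized separately through the shared `dim_date` and `dim_department` tables, and only the summarized results are combined, which is conceptually similar to what a report does when one visual shows measures from two fact tables sliced by common dimensions.

```python
import sqlite3

# Hypothetical schema: two fact tables at different grains sharing two dimensions.
SCHEMA = """
CREATE TABLE dim_department (dept_key INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- Budget grain: one row per month per department (assumed keyed to the 1st of the month).
CREATE TABLE fact_budget (date_key INTEGER, dept_key INTEGER, budget_amount REAL);
-- Actuals grain: one row per cost transaction.
CREATE TABLE fact_cost (txn_id INTEGER PRIMARY KEY, date_key INTEGER,
                        dept_key INTEGER, cost_amount REAL);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with illustrative sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_department VALUES (?, ?)",
                     [(10, "Sales"), (20, "IT")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20240101, 2024, 1), (20240115, 2024, 1),
                      (20240201, 2024, 2), (20240210, 2024, 2)])
    conn.executemany("INSERT INTO fact_budget VALUES (?, ?, ?)",
                     [(20240101, 10, 1000.0), (20240101, 20, 2000.0),
                      (20240201, 10, 1000.0)])
    conn.executemany("INSERT INTO fact_cost VALUES (?, ?, ?, ?)",
                     [(1, 20240101, 10, 300.0), (2, 20240115, 10, 450.0),
                      (3, 20240115, 20, 2100.0), (4, 20240210, 10, 900.0)])
    return conn


def budget_vs_actual(conn: sqlite3.Connection) -> list:
    """Summarize each fact table via the shared dimensions, then combine the summaries."""
    return conn.execute("""
        WITH budget AS (
            SELECT d.year, d.month, dp.dept_name, SUM(b.budget_amount) AS budget
            FROM fact_budget AS b
            JOIN dim_date AS d ON d.date_key = b.date_key
            JOIN dim_department AS dp ON dp.dept_key = b.dept_key
            GROUP BY d.year, d.month, dp.dept_name
        ),
        actual AS (
            SELECT d.year, d.month, dp.dept_name, SUM(c.cost_amount) AS actual
            FROM fact_cost AS c
            JOIN dim_date AS d ON d.date_key = c.date_key
            JOIN dim_department AS dp ON dp.dept_key = c.dept_key
            GROUP BY d.year, d.month, dp.dept_name
        )
        SELECT b.year, b.month, b.dept_name, b.budget, COALESCE(a.actual, 0) AS actual
        FROM budget AS b
        LEFT JOIN actual AS a
          ON a.year = b.year AND a.month = b.month AND a.dept_name = b.dept_name
        ORDER BY b.year, b.month, b.dept_name
    """).fetchall()


if __name__ == "__main__":
    for row in budget_vs_actual(build_demo_db()):
        print(row)
```

Note that `fact_budget` and `fact_cost` are never joined to each other. Joining them directly on month and department would repeat each budget row once per matching transaction, so Sales in January would show a budget of 2,000 instead of 1,000. The `LEFT JOIN` here keeps only budgeted combinations; a full outer join would be needed to also show unbudgeted spending.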
My point is that in some cases a star schema will benefit you, so it's worth keeping in mind. If you have questions about data warehousing in Azure, star schemas, ETL, or cloud data platforms, we are the people to speak with. Contact us; we're here to help no matter where you are on your cloud journey.