In this tutorial, Zane Goodman from Pragmatic Works explains a new preview feature in Microsoft Fabric: Lakehouse schemas. This feature gives teams a clearer way to organize, manage, and query data within a Lakehouse, which makes data workflows simpler and more collaborative. Let's look at how Lakehouse schemas work and why they deserve a place in your data management strategy.
Lakehouse schemas are a new feature within the Microsoft Fabric ecosystem. They act as logical containers that group the tables in a Lakehouse, much like schemas in a SQL database. Users can organize, manage, and query data by schema name (for example, bronze.holiday) instead of keeping every table in one flat list. This makes it easier for analysts, engineers, and business users to find and query the data they need.
To use Lakehouse schemas, create a new Lakehouse in Microsoft Fabric and select the option to enable schemas during setup. Once schemas are enabled, every table is placed in the default schema (dbo) unless another schema is specified. Users can then add schemas such as bronze, silver, and gold to organize their data.
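The schema setup described above can be sketched in a few lines, assuming a Fabric notebook attached to a schema-enabled Lakehouse. The helper names create_schema_statements and qualify are hypothetical. The code is plain Python: it only builds Spark SQL statements as strings and shows how an unqualified table name falls back to the default dbo schema. Running each statement through spark.sql is mentioned in a comment as the assumed next step inside Fabric, not something this sketch executes.

```python
# Hypothetical helpers; the SQL strings use standard Spark SQL syntax.
DEFAULT_SCHEMA = "dbo"  # default schema for tables in a schema-enabled Lakehouse


def create_schema_statements(schemas):
    """Build Spark SQL DDL that creates each schema only if it is missing."""
    return [f"CREATE SCHEMA IF NOT EXISTS {name}" for name in schemas]


def qualify(table, schema=None):
    """Return a schema-qualified table name, defaulting to dbo."""
    if "." in table:
        return table  # already qualified, e.g. "bronze.holiday"
    return f"{schema or DEFAULT_SCHEMA}.{table}"


if __name__ == "__main__":
    for stmt in create_schema_statements(["bronze", "silver", "gold"]):
        # Assumption: in a Fabric notebook each statement could be run with spark.sql(stmt).
        print(stmt)
    print(qualify("holiday"))            # dbo.holiday
    print(qualify("holiday", "bronze"))  # bronze.holiday
```

Using IF NOT EXISTS keeps the setup safe to rerun, so a notebook cell that creates the medallion schemas can be executed more than once without failing.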
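Zane demonstrates how to use the Medallion Architecture to organize data into Bronze, Silver, and Gold layers, each with its own schema: Bronze holds raw data as it was ingested, Silver holds cleaned and validated data, and Gold holds curated, business-ready data for reporting.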
By loading data into the Bronze schema first and moving it into the Silver schema only after it has been transformed, teams keep each stage of their data separate while maintaining a clear flow toward analytics and reporting.
Once data is loaded into the appropriate schema, users can create dataflows to transform it and move it between schemas. Zane walks through building a dataflow that cleans up the "Holiday" table in the Bronze schema and writes the result to the Silver schema. Keeping raw and cleaned data in separate schemas means reports and analyses can read from the cleaned version while the original data remains available.
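The video performs the Bronze-to-Silver move with a dataflow. As a rough, runnable analogy, the sketch below uses SQLite's ATTACH DATABASE as a local stand-in for Lakehouse schemas so that tables can be addressed as bronze.holiday and silver.holiday. This is not Fabric code. The sample rows are made up, the cleanup rules (trim whitespace, drop rows with no holiday name, remove duplicates) are assumptions rather than the steps shown in the video, and the function names build_demo and promote_holiday are hypothetical.

```python
import sqlite3


def build_demo():
    """Create an in-memory SQLite connection with bronze and silver 'schemas'."""
    conn = sqlite3.connect(":memory:")
    # Stand-in for Lakehouse schemas: each ATTACHed database acts as a namespace.
    for schema in ("bronze", "silver"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    conn.execute(
        "CREATE TABLE bronze.holiday (country TEXT, holiday_name TEXT, holiday_date TEXT)"
    )
    # Made-up raw rows with the kind of problems a cleanup step might fix.
    conn.executemany(
        "INSERT INTO bronze.holiday VALUES (?, ?, ?)",
        [
            (" US ", "New Year's Day", "2024-01-01"),  # stray whitespace
            ("US", "New Year's Day", "2024-01-01"),    # duplicate once trimmed
            ("GB", None, "2024-12-25"),                # missing name, dropped
            ("GB", "Christmas Day", "2024-12-25"),
        ],
    )
    return conn


def promote_holiday(conn):
    """Clean bronze.holiday, overwrite silver.holiday, and return its row count."""
    conn.execute("DROP TABLE IF EXISTS silver.holiday")
    conn.execute(
        """
        CREATE TABLE silver.holiday AS
        SELECT DISTINCT TRIM(country) AS country,
                        TRIM(holiday_name) AS holiday_name,
                        DATE(holiday_date) AS holiday_date
        FROM bronze.holiday
        WHERE holiday_name IS NOT NULL
        """
    )
    return conn.execute("SELECT COUNT(*) FROM silver.holiday").fetchone()[0]


if __name__ == "__main__":
    conn = build_demo()
    print(promote_holiday(conn))  # 2
    for row in conn.execute("SELECT * FROM silver.holiday ORDER BY country"):
        print(row)
```

Because promote_holiday drops and recreates silver.holiday, rerunning it produces the same result, which mirrors an overwrite-style load into the Silver layer while the Bronze table stays untouched.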
While Lakehouse schemas offer many benefits, there are some limitations to be aware of. At the time of the video, table maintenance features were not available for schema-enabled Lakehouses, existing Lakehouses without schemas could not be migrated to schema-enabled ones, and some features, such as Spark views, were not supported. These gaps are expected to be addressed as the feature moves out of preview.
Microsoft Fabric's Lakehouse schemas offer a practical way to organize and query data. By grouping tables into schemas, teams can simplify their data workflows, make data easier to find, and collaborate more effectively while retaining the flexibility of a Lakehouse. Although the feature is still in preview, it is worth evaluating as a way to improve how teams organize and manage their data.
Don't forget to check out the Pragmatic Works on-demand learning platform for more content and training on Microsoft Fabric and other Microsoft technologies, and subscribe to the Pragmatic Works YouTube channel to stay up to date on the latest tips and tricks.