Master Delta Lake in Microsoft Fabric with Austin Libal. Learn to manage big data using lakehouse architecture, create Delta tables, and use features like transaction logs, time travel, and schema evolution.
Embark on a journey to proficiently manage big data using Delta Lake within the Microsoft Fabric ecosystem. Guided by data engineer Austin Libal from Pragmatic Works, this course explores how Delta Lake integrates the flexibility of data lakes with the performance of data warehouses through the innovative lakehouse architecture. You'll learn to address big data challenges—volume, velocity, variety, and veracity—by leveraging Delta Lake's features like transaction logs, ACID compliance, and time travel for accessing historical data versions.
Through practical modules, discover multiple methods to create and integrate Delta tables, dynamically evolve schemas without rewriting entire tables, and optimize data management with partitioning, vacuuming, and performance tuning techniques. Understand the nuances between "overwrite" and "append" modes for data ingestion, and how to maintain data integrity and efficiency. By course end, you'll be equipped to leverage Delta Lake in Microsoft Fabric to build robust, scalable, and efficient data engineering solutions.
Course Outline (Free Preview)
Delta Lake in Microsoft Fabric - What You Need to Get Started
Module 00 - Introduction
In this introductory video, Austin Libal, a data engineer at Pragmatic Works, guides students through the fundamentals of working with Delta Lake within the Microsoft Fabric ecosystem. He explains that Delta Lake is the default storage format for lakehouse and warehouse tables in Microsoft Fabric, emphasizing its importance in data engineering. Key topics covered include understanding Delta tables, data lakes, data warehouses, and practical operations like appending, overwriting, and optimizing Delta files.
Module 01 - What is Delta?
In this module of the Delta Lake in Microsoft Fabric course, Austin explains the fundamentals of Delta Lake, emphasizing its importance in managing data with integrity and efficiency within the Microsoft Fabric ecosystem. Key concepts covered include the lakehouse architecture, which combines the benefits of data lakes and data warehouses, and the use of Delta Lake for storing data in versioned Parquet files to support CRUD operations and time travel. Students will learn how to create and query Delta tables using Spark, gaining a solid foundation in handling big data analytics with Microsoft Fabric.
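As a taste of the notebook work in this module, here is a minimal sketch of creating and querying a Delta table with Spark. The `spark` session is pre-created in Fabric notebooks, and the `sales` table name and sample rows are hypothetical placeholders:

```python
# Build a small DataFrame; the data here is purely illustrative.
df = spark.createDataFrame(
    [(1, "Widget", 19.99), (2, "Gadget", 4.50)],
    ["id", "product", "price"],
)

# Saving with format "delta" writes versioned Parquet files plus a
# _delta_log transaction log under the lakehouse's Tables area.
df.write.format("delta").saveAsTable("sales")

# Query the managed table back with Spark SQL.
spark.sql("SELECT * FROM sales").show()
```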
Module 02 - Delta Files (27 min.)
In this module, Austin explains the importance of Delta Lake in managing big data scenarios by integrating the benefits of data lakes and traditional data warehouses. He highlights the four Vs of big data—volume, velocity, variety, and veracity—and demonstrates how Delta Lake's flexible schema and open format enhance data reliability and accessibility. Key concepts include Parquet files, Delta Log, and the creation and management of Delta tables within Microsoft Fabric.
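To make that file layout concrete, a quick way to peek behind a Delta table is to list its storage folder. This sketch assumes a Fabric notebook with a lakehouse attached (where the `mssparkutils` helper is built in) and the hypothetical `sales` table from above:

```python
# List the storage behind the table: expect Parquet data files alongside
# a _delta_log folder holding the JSON transaction log.
for f in mssparkutils.fs.ls("Tables/sales"):
    print(f.name)
```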
Module 03 - Creating Delta Tables (22 min.)
In this module, Austin explains the various methods to create a Delta table within the Microsoft Fabric ecosystem, emphasizing four primary techniques: drag-and-drop, Dataflow Gen2, the Copy Data activity within a pipeline, and using a Spark notebook. He highlights the importance of choosing the method that best fits specific needs, noting that while drag-and-drop is suitable for one-time operations, more automated and scalable solutions like Dataflow Gen2 and Spark notebooks are preferable for ongoing data management.
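Of the four techniques, the Spark-notebook route looks roughly like this sketch; the CSV path and table name are hypothetical:

```python
# Read a raw file from the lakehouse Files area...
df = (spark.read
      .option("header", "true")
      .csv("Files/raw/customers.csv"))

# ...and persist it as a managed Delta table, ready for SQL queries.
df.write.format("delta").mode("overwrite").saveAsTable("customers")
```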
Module 04 - Fabric Shortcuts (21 min.)
In this module, Austin explains how to integrate pre-existing Delta tables from various sources like Azure Data Lake, Amazon S3, or Google Cloud into the Microsoft Fabric ecosystem using shortcuts. He demonstrates creating shortcuts to both the Files and Tables folders, emphasizing the importance of Delta format for tables and the flexibility of integrating data without duplication.
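Shortcut creation itself happens in the Fabric UI, but once created the data is consumed like anything else in the lakehouse. A sketch with hypothetical names: a table shortcut (in Delta format) under the Tables folder queries like a native table, while a file shortcut under Files is read by path:

```python
# A table shortcut behaves like a local Delta table.
spark.sql("SELECT COUNT(*) AS row_count FROM external_sales").show()

# A file shortcut under Files is read by path, like any lakehouse file.
raw = spark.read.option("header", "true").csv("Files/s3_shortcut/orders.csv")
```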
Module 05 - Delta Log Transactions (35 min.)
In this module, Austin delves into the transaction log that tracks every change to a Delta table. He explains the importance of ACID properties—atomicity, consistency, isolation, and durability—in ensuring data reliability and fault tolerance, and introduces key components like Parquet files and JSON log files. Students will learn how operations such as inserts, updates, deletes, and merges are logged and managed to maintain the integrity and performance of the data lake.
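A convenient way to see those logged operations from a notebook is Delta's DESCRIBE HISTORY command, sketched here against the hypothetical `sales` table:

```python
# Each committed operation (WRITE, UPDATE, DELETE, MERGE, ...) is one
# JSON entry in _delta_log and one row in the table history.
history = spark.sql("DESCRIBE HISTORY sales")
history.select("version", "timestamp", "operation").show(truncate=False)
```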
Module 06 - Time Travel (17 min.)
In this module, Austin introduces the concept of time travel in Delta Lake within Microsoft Fabric, highlighting its utility for accessing historical data versions of Delta tables. He explains how time travel can be performed using either version control numbers or specific timestamps, allowing for historical analysis, auditing, and data recovery. Austin also demonstrates the practical application of these features through a hands-on example, emphasizing the importance of maintaining data files for effective time travel operations.
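Both time-travel styles can be expressed in Spark SQL. This sketch assumes a `sales` table with at least one prior version; the timestamp is illustrative:

```python
# Read the table as it looked at version 0...
v0 = spark.sql("SELECT * FROM sales VERSION AS OF 0")

# ...or as of a wall-clock moment (resolved to the latest commit at or
# before that time).
old = spark.sql("SELECT * FROM sales TIMESTAMP AS OF '2024-01-01 00:00:00'")
```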
Module 07 - Schema Evolution (11 min.)
In this module, Austin explains the concept of schema evolution in Delta Lake within Microsoft Fabric, emphasizing its importance in dynamically modifying table structures without rewriting the entire table. Key features discussed include the ability to add, rename, or remove columns and change data types, ensuring backward compatibility with existing data. Austin also highlights two types of schema evolution operations: merge schema, which adds new columns while preserving the existing schema, and overwrite schema, which replaces the entire schema with a new data frame.
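In write terms, the two operations map to the mergeSchema and overwriteSchema options. A minimal sketch, assuming the existing `sales` table and an incoming DataFrame with an extra `currency` column (both hypothetical):

```python
new_df = spark.createDataFrame(
    [(3, "Gizmo", 7.25, "USD")],
    ["id", "product", "price", "currency"],
)

# mergeSchema keeps the existing columns and adds the new ones.
(new_df.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .saveAsTable("sales"))

# overwriteSchema replaces the table's schema (and contents) outright.
(new_df.write.format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable("sales"))
```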
Module 08 - Append vs Overwrite (18 min.)
In this module, Austin explains the differences between using the "overwrite" and "append" modes when writing Delta tables in Microsoft Fabric. He highlights that "append" mode adds new records to the existing data, preserving historical records, while "overwrite" mode replaces the existing data entirely with new data, which is useful for full data refreshes or schema changes. Austin also demonstrates practical examples in a Fabric notebook, emphasizing the importance of choosing the appropriate mode based on specific use cases like incremental loads or performance optimization.
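The contrast fits in two write calls. This sketch reuses the hypothetical `sales` table from earlier modules:

```python
increment = spark.createDataFrame(
    [(4, "Doohickey", 12.00)],
    ["id", "product", "price"],
)

# append: new rows are added and existing rows are left untouched.
increment.write.format("delta").mode("append").saveAsTable("sales")

# overwrite: current contents are replaced entirely; prior versions stay
# reachable through time travel until they are vacuumed away.
increment.write.format("delta").mode("overwrite").saveAsTable("sales")
```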
Module 09 - Partitioning (16 min.)
In this module, Austin explains the concept of partitioning in Delta Lake within Microsoft Fabric, emphasizing its role in optimizing query performance and data management. He highlights the benefits of partitioning, such as selective scanning and parallel processing, while cautioning against over-partitioning, which can negatively impact performance.
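As a sketch, writing a partitioned Delta table takes one extra call. The table and partition column here are hypothetical; a low-cardinality column such as a date or region is the usual choice:

```python
df = spark.createDataFrame(
    [(1, "2024-01-01", 19.99), (2, "2024-01-02", 4.50)],
    ["id", "order_date", "price"],
)

# Each distinct order_date becomes its own folder of Parquet files,
# enabling selective scanning when queries filter on that column.
(df.write.format("delta")
   .partitionBy("order_date")
   .mode("overwrite")
   .saveAsTable("sales_partitioned"))
```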
Module 10 - Table Maintenance (11 min.)
In this module of the Delta Lake in Microsoft Fabric course, Austin explains how to maintain Delta tables for optimal performance through vacuuming and optimizing. Vacuuming removes old data files no longer needed by the current state of the Delta table, while optimizing consolidates smaller Parquet files into larger ones for faster reads.
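Both maintenance commands are one-liners in Spark SQL; a sketch against the hypothetical `sales` table:

```python
# OPTIMIZE compacts many small Parquet files into fewer large ones.
spark.sql("OPTIMIZE sales")

# VACUUM deletes data files no longer referenced by the current table
# state once they exceed the retention window (default 168 hours).
spark.sql("VACUUM sales")
```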
Module 11 - Class Wrap Up (1 min.)
In this video, Austin from the Pragmatic Works team provides an overview of working with Delta Lake and Microsoft Fabric, emphasizing the importance of understanding the intricacies of these technologies. He encourages students to provide feedback and suggests potential topics for future courses, such as Notebooks, Delta Lake, and ETL processes.
Austin is a Jacksonville native who graduated from The Baptist College of Florida in 2012. He previously worked as a manager in the retail service industry. He enjoys spending time with his wife and two kids. His primary focus at Pragmatic Works is on Azure Synapse Analytics and teaching the best practices for data integration, enterprise data warehousing, and big data analytics. He also enjoys helping customers learn the ins and outs of Power BI and showing people new ways to grow their business with the Power Platform.