Pragmatic Works Nerd News

PySpark in Microsoft Fabric - Delta Transactions and Maintenance (Ep. 3)

Written by Austin Libal | Apr 30, 2026

In this in-depth tutorial, Austin Libal from Pragmatic Works continues his PySpark series, focusing on Delta transactions and maintenance in Microsoft Fabric. This session explores key concepts like Delta tables, Lakehouse architecture, and how to manage data within a Fabric workspace using PySpark. If you're working with data engineering in Microsoft Fabric, this guide provides practical insights and step-by-step demonstrations.

What Is a Delta Table?

A Delta Table in Microsoft Fabric is a key feature that combines the power of a data lake with the structured capabilities of a database table. Built on top of the Parquet file format, it offers ACID transactions, making data manipulation and version control easier for data engineers.

Setting Up the Environment

Austin begins by walking through the setup process for working with PySpark in Microsoft Fabric:

  • Switching to the Data Science Persona in the Fabric workspace.
  • Importing a Jupyter Notebook file (.ipynb) containing the necessary code.
  • Creating a Lakehouse and establishing a data connection.
  • Loading sample employee data into a Delta Table using a PySpark DataFrame.

Exploring Delta Table Transactions

Delta tables provide robust support for handling data changes and tracking modifications over time. Austin covers essential operations, including:

  1. Creating a Delta Table: Writing data to a Lakehouse in Delta format.
  2. Viewing Underlying Files: Exploring Parquet and Delta log files that manage version control.
  3. Performing Delete Operations: Demonstrating how deletions are tracked in the Delta log for version history.
  4. Updating Data: Running an update query to modify records and tracking the change history.
  5. Time Travel: Using the versionAsOf feature to query historical versions of the Delta Table.

Maintaining Delta Tables

Austin emphasizes the importance of regular maintenance for optimal performance in Fabric:

  • Checkpoint Files: Every 10 transactions, Fabric creates a checkpoint file for faster data reads.
  • Vacuum Operations: A cleanup command that removes old, unused data files past the retention period.
  • Optimize Command: Compacts multiple smaller files into fewer, larger files for better performance.

Delta Maintenance from the UI

Microsoft Fabric simplifies Delta table management with a user-friendly interface. Without writing code, you can:

  • Run Optimize and Vacuum commands directly from the Lakehouse UI.
  • Set retention thresholds for automatic data cleanup.
  • Enable V-Order Optimization for faster data reads in Power BI reports.

Best Practices for Working with Delta Tables

Austin shares several best practices for managing Delta tables effectively:

  • Use a fixed schema so that unexpected column changes don't break downstream relationships.
  • Leverage the Time Travel feature for version control and auditing.
  • Run Optimize regularly to maintain query performance.
  • Use Vacuum carefully to avoid accidental data loss.

Conclusion

Delta tables in Microsoft Fabric offer a powerful combination of structured storage, version control, and performance optimization. Austin’s tutorial provides a clear guide for setting up and maintaining Delta tables using PySpark, making it easier for data engineers to work with large datasets efficiently. 

Don't forget to check out Pragmatic Works' on-demand learning platform for more insightful content and training sessions on Fabric and other Microsoft applications. Be sure to subscribe to the Pragmatic Works YouTube channel to stay up to date on the latest tips and tricks.