Pragmatic Works Nerd News

Analytics Engineers Develop a Fabric Data Lakehouse in 15 Minutes - DevDash

Written by Amelia Roberts | May 04, 2026

Welcome to another exciting episode of Dev Dash, where Pragmatic Works’ data engineering team pits two experts against each other in a race to solve real-world data challenges. In this episode, Austin and Manuel faced off to transform data from an Azure Data Lake into a Microsoft Fabric Lakehouse while addressing performance and data curation requirements. The 15-minute showdown highlights two approaches: leveraging Spark Notebooks for speed and flexibility versus using pipelines and data flows for ease of use and familiarity.

The Challenge

The challenge involved:

  • Moving six tables from an Azure Data Lake into a Fabric Lakehouse.
  • Cleaning and transforming one table, including removing redundant columns and renaming attributes for clarity.
  • Optimizing performance and demonstrating best practices in Microsoft Fabric.

Approaches

1. Austin’s Spark Notebook Approach

Austin opted for a Spark Notebook strategy using PySpark and Spark SQL. His method emphasized speed and flexibility by leveraging Spark's massively parallel processing capabilities.

  • Created shortcuts to the data lake for seamless access.
  • Used PySpark to transform and write data directly into Delta tables in the Lakehouse.
  • Demonstrated advanced techniques, including aliasing columns and filtering rows.
  • Completed the transformation task in 2 minutes and 43 seconds.

2. Manuel’s Pipeline and Data Flow Approach

Manuel utilized Fabric data pipelines and dataflows, whose GUI-driven, Power Query-based experience will feel familiar to Power BI users and is ideal for simpler scenarios.

  • Designed a parent-child pipeline structure for dynamic data movement.
  • Used Power Query for intuitive data transformation of the customer table.
  • Handled the transformation effectively but required more time due to compute constraints.
  • Fact table processing took 10 minutes and 30 seconds.
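Conceptually, the parent-child pattern means a parent pipeline iterates over a list of tables and invokes a child pipeline once per table, passing the table name as a parameter. The loop below is a plain-Python sketch of that control flow (the table names and path are hypothetical), not the pipeline definition itself:

```python
# Hypothetical table list; the parent pipeline's ForEach activity iterates it.
TABLES = ["customer", "product", "sales", "store", "date", "promotion"]

def copy_table(table: str) -> str:
    """Stand-in for the child pipeline: copy one table into the Lakehouse."""
    source = f"abfss://datalake/raw/{table}"  # parameterized source path
    # ...a Copy activity would move the data here...
    return f"copied {table} from {source}"

# Parent pipeline: one child invocation per table, driven by metadata
# rather than six hand-built copy activities.
results = [copy_table(t) for t in TABLES]
```

The payoff of this metadata-driven design is that adding a seventh table means adding one list entry, not rebuilding the pipeline.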

Lessons Learned

  • Spark Notebooks: Ideal for handling large datasets and high-performance needs.
  • Pipelines and Data Flows: Best for smaller datasets or when working within familiar GUI environments.
  • Flexibility vs. Ease of Use: Spark provides advanced control and speed, while the Power Query-based GUI tools offer accessibility for users less familiar with coding.
  • Fabric Tools Diversity: Microsoft Fabric enables a range of tools, from code-heavy Spark Notebooks to GUI-driven data pipelines, catering to varying expertise levels.

Outcome

While both contestants successfully moved and transformed the data, Austin's Spark Notebook solution outpaced Manuel’s pipeline and data flow approach, demonstrating the power of Spark for high-volume and time-sensitive data processing tasks.

Conclusion

This episode of Dev Dash showcased the versatility of Microsoft Fabric and how different tools can be applied to achieve the same goal. Whether you prefer Spark Notebooks or pipelines, the right choice depends on your expertise, dataset size, and performance requirements. Check out Cert XP for more guidance on mastering Microsoft Fabric and acing the DP-600 Fabric Analytics Engineer certification exam.

Don't forget to check out the Pragmatic Works on-demand learning platform for more insightful content and training sessions on Fabric and other Microsoft applications. Be sure to subscribe to the Pragmatic Works YouTube channel to stay up to date on the latest tips and tricks.