In this post I'd like to share some lessons from recent experience with Azure Data Factory performance when loading data from Azure Data Lake into a database, specifically using the Copy Activity.
What I'm talking about comes down to the difference between loading data one file at a time and loading an entire set of files in a folder. The screenshot below shows a typical pattern we use, starting with getting a list of files we want to load. A couple of control tables behind this tell us which files are available and which may have already been loaded to our target.
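As a rough sketch of that first step, here is what the lookup might look like in pipeline JSON. The dataset and stored procedure names are placeholders, not our actual implementation; the point is a Lookup activity that returns the full file list from a control query:

```json
{
    "name": "GetFilesToLoad",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "AzureSqlSource",
            "sqlReaderStoredProcedureName": "[dbo].[usp_GetFilesToLoad]"
        },
        "dataset": {
            "referenceName": "ControlTableDataset",
            "type": "DatasetReference"
        },
        "firstRowOnly": false
    }
}
```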
The next screenshot shows the pattern we follow for each of those files: a stored procedure writes an entry to a table recording that the load has started, the Copy Activity runs, and a final step records whether it succeeded or failed.
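A minimal sketch of that per-file loop, assuming the Lookup above and using hypothetical stored procedure, linked service, and dataset names in place of ours:

```json
{
    "name": "ForEachFile",
    "type": "ForEach",
    "dependsOn": [
        { "activity": "GetFilesToLoad", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "items": {
            "value": "@activity('GetFilesToLoad').output.value",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "LogLoadStart",
                "type": "SqlServerStoredProcedure",
                "linkedServiceName": {
                    "referenceName": "AzureSqlDatabase",
                    "type": "LinkedServiceReference"
                },
                "typeProperties": {
                    "storedProcedureName": "[audit].[usp_LogLoadStart]",
                    "storedProcedureParameters": {
                        "FileName": { "value": "@item().FileName", "type": "String" }
                    }
                }
            },
            {
                "name": "CopyFileToTarget",
                "type": "Copy",
                "dependsOn": [
                    { "activity": "LogLoadStart", "dependencyConditions": [ "Succeeded" ] }
                ],
                "inputs": [ { "referenceName": "DataLakeFile", "type": "DatasetReference" } ],
                "outputs": [ { "referenceName": "SqlStagingTable", "type": "DatasetReference" } ],
                "typeProperties": {
                    "source": { "type": "DelimitedTextSource" },
                    "sink": { "type": "AzureSqlSink" }
                }
            },
            {
                "name": "LogLoadEnd",
                "type": "SqlServerStoredProcedure",
                "dependsOn": [
                    { "activity": "CopyFileToTarget", "dependencyConditions": [ "Completed" ] }
                ],
                "linkedServiceName": {
                    "referenceName": "AzureSqlDatabase",
                    "type": "LinkedServiceReference"
                },
                "typeProperties": {
                    "storedProcedureName": "[audit].[usp_LogLoadEnd]"
                }
            }
        ]
    }
}
```

Note that every iteration runs three activities, which is exactly where the trouble starts.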
If you're coming from an SSIS background, the ForEach Loop is a familiar, powerful technique, and looping through hundreds of files is no big deal.
But in Azure Data Factory, the story is a bit different. Every activity we see here, even the start-logging, copy, and completion-logging steps, requires some startup effort. The mechanism behind the scenes is quite different: Data Factory must provision resources for each activity, and initiating those activities can take time.
If you're dealing with a long list of files, that overhead adds up to severe performance problems. Because of this, we've recently shifted our approach in many cases away from loading data file by file and toward pointing the Copy Activity at a folder. If the files in that folder all have structures that are identical or compatible with your Copy Activity, we can copy all of them at once rather than in a loop.
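For the folder-based approach, the change lives in the Copy Activity's source settings: point the store settings at a folder, or use a wildcard, instead of a single file. A rough sketch for delimited files in ADLS Gen2, again with placeholder paths and dataset names:

```json
{
    "name": "CopyFolderToTarget",
    "type": "Copy",
    "inputs": [ { "referenceName": "DataLakeFolder", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "SqlStagingTable", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": {
            "type": "DelimitedTextSource",
            "storeSettings": {
                "type": "AzureBlobFSReadSettings",
                "recursive": true,
                "wildcardFolderPath": "landing/sales",
                "wildcardFileName": "*.csv"
            },
            "formatSettings": { "type": "DelimitedTextReadSettings" }
        },
        "sink": { "type": "AzureSqlSink" }
    }
}
```

One Copy Activity now replaces the entire loop, so the per-activity startup cost is paid once rather than once per file.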
That changes the logic we use to keep track of which files or folders have and have not been loaded, but in the end, making this change will give you tremendous performance gains.
In summary, we're shifting toward patterns where we load all the files in a folder at once, perhaps looping through a smaller list of folders if needed, and away from patterns that process files one at a time.
As with many things, the right decision depends on several factors. For us, it came down to the number of files we were processing: looping through them took too long, so we preferred to load by folder.
If you have questions about Azure Data Factory, data warehousing, or anything Azure related, we're here to help. Click the link below or contact us, and our Azure experts will meet you wherever you are on your Azure journey.