Yesterday’s Azure Every Day post covered how Azure Data Factory pricing works. In today’s post I’d like to go a bit deeper into Azure Data Factory Version 2 and review pipelines and activities. In essence, a pipeline is a logical grouping of activities. If you’re familiar with SSIS, think of an SSIS package as a grouping of activities that are happening with the data.
As an example of a pipeline, say you want to pull data from a website, file server or database up into Azure, do some kind of transformation on that data, then report from it. Within the pipeline, multiple activities can be defined. If there’s no dependency between activities (one activity is running and the next one doesn’t depend on it), they can run in parallel.
This is good to keep in mind as you build your pipelines, because you may need to schedule activities or add dependencies so they don’t run in parallel, or so that one runs only after another finishes.
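To make the dependency idea concrete, here is a minimal Python sketch. It assumes a pipeline described as a dictionary shaped like the Data Factory V2 JSON, where each activity may list the activities it waits on under `dependsOn`. The pipeline name, the activity names and the `parallel_waves` helper are all made up for illustration, and this is not how the service schedules work internally; it only shows which activities are free to run side by side.

```python
from typing import Dict, List

# Hypothetical pipeline, shaped like an ADF V2 pipeline definition.
# The pipeline and activity names are invented for this example.
pipeline = {
    "name": "SamplePipeline",
    "properties": {
        "activities": [
            {"name": "CopyFromWebsite", "type": "Copy"},
            {"name": "CopyFromDatabase", "type": "Copy"},
            {
                "name": "TransformData",
                "type": "HDInsightHive",
                # TransformData waits for both copies to succeed.
                "dependsOn": [
                    {"activity": "CopyFromWebsite",
                     "dependencyConditions": ["Succeeded"]},
                    {"activity": "CopyFromDatabase",
                     "dependencyConditions": ["Succeeded"]},
                ],
            },
        ]
    },
}


def parallel_waves(pipeline: Dict) -> List[List[str]]:
    """Group activity names into waves. Every activity in a wave has all of
    its dependencies satisfied by earlier waves, so a wave could run in
    parallel."""
    activities = pipeline["properties"]["activities"]
    # Map each activity name to the set of activity names it depends on.
    remaining = {
        a["name"]: {d["activity"] for d in a.get("dependsOn", [])}
        for a in activities
    }
    done = set()
    waves = []
    while remaining:
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("circular or unknown dependency")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return waves


print(parallel_waves(pipeline))
# [['CopyFromDatabase', 'CopyFromWebsite'], ['TransformData']]
```

The two copy activities have no dependency on each other, so they land in the same wave. Adding a `dependsOn` entry to one of them would push it into a later wave, which is one way to make sure one runs after another.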
There are 3 main types of activities:
1. Data Movement Activities – These are the sources you’re pulling data in from, such as Azure Blob Storage, Azure Data Lake, Azure SQL Database and Azure SQL Data Warehouse. You can also set up an on-premises gateway and pull from databases such as the commonly used DB2, MySQL, Oracle, SAP, Sybase and Teradata, as well as NoSQL databases like Cassandra and MongoDB.
I also mentioned files; you can pull from Amazon S3, file systems, FTP, HTTP, etc. You also have the Software as a Service (SaaS) options: Dynamics, HubSpot, Marketo, QuickBooks and Salesforce, to name a few. You can check the complete list in the Azure online documentation.
2. Data Transformation Activities – Here is where you’re taking your data after it’s ingested into Azure and doing something with it. Some common ones are the HDInsight activities: Hive, Pig, MapReduce, Hadoop Streaming and Spark transformations. These allow you to transform your big data in your Azure environment and stage it for your reporting.
Other common uses would be machine learning on an Azure VM, as well as stored procedures. You can define your stored procedures in a SQL Server database in Azure and run them from the pipeline, and you can also use U-SQL for your Data Lake Analytics.
3. Control Activities – In these activities you can do things like execute other pipelines, run a ForEach loop or use Lookup activities; these are the types of things where you’re controlling how the pipeline works and interacts with the data.
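The three groups above can be sketched as a small lookup in Python. The type strings below are the ones I’d expect to see in the `type` field of a Version 2 pipeline definition, but treat the mapping as a partial, illustrative list and check it against the Azure documentation. The `group_by_category` helper and the sample activity names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Partial, illustrative mapping from activity "type" strings to the three
# categories described above. Not a complete list.
ACTIVITY_CATEGORIES = {
    "Copy": "data movement",
    "HDInsightHive": "data transformation",
    "HDInsightPig": "data transformation",
    "HDInsightMapReduce": "data transformation",
    "HDInsightStreaming": "data transformation",
    "HDInsightSpark": "data transformation",
    "SqlServerStoredProcedure": "data transformation",
    "DataLakeAnalyticsU-SQL": "data transformation",
    "ExecutePipeline": "control",
    "ForEach": "control",
    "Lookup": "control",
}


def group_by_category(activities: List[Dict]) -> Dict[str, List[str]]:
    """Return activity names grouped by category, in input order.
    Types missing from the mapping are filed under 'unknown'."""
    groups = defaultdict(list)
    for activity in activities:
        category = ACTIVITY_CATEGORIES.get(activity["type"], "unknown")
        groups[category].append(activity["name"])
    return dict(groups)


# Hypothetical activities for a single pipeline.
sample_activities = [
    {"name": "LookupFileList", "type": "Lookup"},
    {"name": "CopyFromBlob", "type": "Copy"},
    {"name": "RunHiveScript", "type": "HDInsightHive"},
    {"name": "LoadWarehouse", "type": "SqlServerStoredProcedure"},
]

print(group_by_category(sample_activities))
```

A typical pipeline mixes all three: a control activity decides what to process, a data movement activity brings it into Azure, and one or more transformation activities shape it for reporting.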
Hopefully you found this helpful. If you have any questions about any of the topics I covered today or want general information around any Azure topic, we’re here to help. Contact us to learn more about integrating Azure into your organization.