Power BI Interview Questions - Data Shaping
Welcome back to another edition of our Tech Interview series where we tackle the Power BI interview. Our goal is to prepare the interviewee and interviewer for questions that may, or should, come up in an interview for a Power BI role. The theme of our questions today is related to data shaping.
We’ll do a bit of role playing of an interview, show demos of the answers and then give a bulleted background about the answers given by Erin Ostrowsky, Training Content Developer with Pragmatic Works.
Question 1: I run a loan company where we do different loans like student loans, credit card loans, home loans, etc. I want to be able to pull together all the different data imports that I get from different loan agencies within my company, who all give me a daily file of complaints that they get about the loans. How can I get multiple files to one location so I can get them into a report? Can I do that with Power BI and what’s the process?
First, I’d like to ask, what format are those files being sent in? (Interviewer: All files are CSV files and all the column names stay the same.) Now that I know that, in this situation I could use the folder option as your data source.
- On the Power BI Desktop, I click on Get Data and the common data sources are listed but Folder is not, so click More to get that Folder option.
- I select Folder and click Connect. Next I need to find the Folder Path; I can either copy/paste it or browse and search for it.
- Once I enter that and click OK, I see my tables of complaints. Now I want to Combine and Edit so I can launch the Power Query Editor. Note: If a new file gets added in, that file would be automatically loaded in, so you’d have the most current value as it’s refreshed.
- Now I’ve got both files I wanted loaded in and new files will be automatically loaded in as they are sent in daily; all in a clean, easy format.
Break down of combining files from folder:
- Power Query automatically creates several query objects to simplify work.
- Apply logic to one file and it automatically applies that to groups of files. Plus, it does a lot of extra logic for you. All we did was have it point to a folder and it had parameters going on and a sample file that it used to determine what the file column names and data types were. Power BI has a great click, drag, low code approach and that hard work is being done for you on the back end.
Question 2: This next question is around data shaping and how I would do things in the query editor. Is there an easy way to move queries that I designed from one solution to another?
There are a couple ways you could approach this. One thing that comes to mind is that you have this M code that is being written as you do your transformations or run your queries. You can simply copy/paste that M code.
- To do this go to the Home ribbon and open the Advanced Editor. When this window opens simply copy and paste the M code. When you’re doing your steps, transformations and applying business rules, the M code is being written in the background and recording all that logic and putting it in the right order and saving it. So, all you need to do is copy and paste it into another solution.
- Another option you have is to create a template with this. You can export your entire report, all your transformations and all that logic as a template. In the Desktop go to File, Export and then hit Power BI Template. You could create a template and hand it off to someone else. A key point is it doesn’t store all the data in the template, but all the metadata (the visual, query). Then when a person goes to open it, if they have access to the same data, they can see the same thing we did.
A third new option is Dataflows which is essentially like a Power Query in the Web type situation. It allows reusable data sourcing and one version of the truth, so depending on your licensing structure that may be a great option.
Question 3: I want to take a solution I’ve created that I’ve connected to my development server and I want to easily switch this to production and take that solution back and forth. Is there an easy way that I’m able to change data sources?
A great option to change the data source is to use parameters. Here’s how:
- The first thing to set up my parameters is to go File and Options & Settings. In Options I click on the Power Query Editor and turn Parameters on. It is not turned on by default.
- Then launch the Power Query Editor and click on Data Source in the Home ribbon.
- From the dialogue box, click Change Source to change your data source. By default, the File Path is set to a text file. To change that I choose New Parameter and give it a name.
- I want to put the path as the Current Value in that field. This would be the location where our data file is or a server or database name. So, paste that path in the value field and click OK. You’ll see the data path field now has a parameter symbol next to it.
- You could also edit that parameter in the report view under the Edit Queries option in the Home bar and pull down to Edit Parameters where you can easily change that data source value.
- So, there are multiple ways to set up the parameter and change the source. If you’re in an interview that would be a great thing to explain or show!
Breakdown of Parameters:
- Parameters make queries and connections dynamic.
- Should be used for modifying data imported NOT for filtering visuals.
Question 4: I work with some bad data. Oftentimes the data I receive is brought to me from my end users. Let’s say I’m working a trade show and I’m scanning badges to get leads. I’m relying on attendees to submit information correctly, which is likely not the case. Is there a way Power BI can clean up bad data?
A fuzzy merge could be a great option in this case. It will work nicely with those slightly misspelled, ambiguous or imperfect data.
- In the desktop I launch the Power Query Editor and I see some bad data with states being misspelled.
- Next I go to Merge Queries in the Home ribbon and pull-down Merge Queries as New.
- I want to use fuzzy matching to perform the merge, so I check off that box. In this case I select the state table because that’s what I’m going to join on and then select state and state province names. Then just click OK.
- You’ll now see that it pulled up a table. Click on the icon in the top right corner and select Geography and State Province Name and hit OK.
- You’ll then see a new column with the correctly spelled state names, so it fixed the bad data.
Question 5: I need a way to extend what the Power Query Editor can do. There are times I run into things that I’m hoping the editor can solve but there’s not a transform already built in. What can I do when there’s not a built-in solution to a problem?
Some options here are to use Python or R. If you or someone on your team has that ability to write some scripts, you can extend the capabilities within the Power Query Editor for things like transforms that are not readily available.
- To begin, go into the Power Query Editor and click on Transform. Under the Transform ribbon you’ll see the icons for R and Python script.
- I'll want to do some predictive analytics as it looks like we’re missing data input on certain days. I want to use R code to take a good guess at what those values should be.
- I put my R script in and run it. Then I can click on the table and see if it worked and, in my case, it did so now I have a complete data set.
- There is a bit of a learning curve with Python and R, but they are quite powerful and can do things we can’t do with Power BI alone.
Breakdown of R and Python:
- Gives you the ability to extend the capabilities of Power BI.
- Can be used in multiple locations in Power BI: Query Source, Query Transforms, Visuals
That wraps up our interview. We have a lot of training available to back up most of what we talked about today. Check out our website or click the link below to learn more about our fantastic On-Demand Learning platform with over 50 deep dive courses and subscriptions available on a yearly or monthly basis. A great way to start learning Power BI is with our FREE Dashboard in a Day course with 7+ hours of Power BI content - click the link below to get your FREE course!
If you have thoughts about other questions we could ask or interview themes you’d like to see, let us know in the comment section below. And be sure to subscribe to our You Tube channel and blogs, so you won’t miss any of our great content!
Sign-up now and get instant access
ABOUT THE AUTHOR
Devin Knight is a Microsoft Data Platform MVP, Microsoft Certified Trainer, and President of Pragmatic Works. He focuses on driving adoption of technology through learning. He is an author of nine Power Platform, Business Intelligence, and SQL Server books. He has been selected as a speaker for conferences like Power Platform Summit, PASS Summit, SQLSaturdays, and Code Camps for many years. Making his home in Jacksonville, FL Devin is a contributing member to several local user groups.
Leave a comment