In last week’s blog, I wrote and did a video about how to remove duplicate records and keep the most recent entry as long as a date column was part of the data source. I came across the scenario while giving training on Power BI with my company Pragmatic Works. See the video below:
This week, while doing another two-day training I came across a different scenario from a follow-up conversation from day 1. I had explained how to remove duplicate records and one of the students started working on a Power BI project she has for her company. On day 2 the student informed me that her remove duplicates step was not working. I said that is odd and I asked to see the data. In one of her table visuals, I could see that it appeared that a few of the records had duplicates based on the name column. After further investigation though, we figured out the culprit.
She had done all the steps correctly, but it was a data integrity issue. In her data source, the person in charge of entering the data had mistakenly typed a space for a few of the records before entering the name value. Thus, when Power BI looked for a duplicate in the name column it did its job correctly because “Jamie” is unique compared to “ Jamie”. That leading white space was the culprit. Not only did it affect her Remove Duplicates step, but it would also have affected a merging of two queries if she had included that as part of her report.
For this week’s video, I want to show how to use the Trim function in the Power Query Editor as a best practice to avoid these unwanted outcomes for any future merges or removing of duplicates.