How to identify duplicates in Excel
In the world of duplicates, the definition is crucial. This is because the concept of a duplicate depends on the context of its related data. Duplicates can exist within a single column, across multiple columns, or even in complete records. There is no one-size-fits-all feature or technique that can identify duplicates in every scenario. In this article, I will demonstrate how to find identify duplicates in Microsoft Excel.
Method 1: Filtering for duplicate records in Excel
One of the easiest ways to locate identify duplicates is by using Excel’s advanced filter feature. It provides flexibility and can effectively identify duplicate rows. What you do with the identified duplicates is up to you. For instance, you can use the advanced filter to copy the unique records to a different location, ensuring that you retain the original data and have a separate set of distinct records.
In this particular case, the term “find” can be slightly misleading. This feature doesn’t actually find the duplicates; rather, it filters them from the results, giving you a unique set of records.
Let’s consider a sheet that contains identify duplicate records within a Table object (Figure A).
While it is recommended to use Tables, this feature can also work with regular data ranges. Even in a small sheet, visually identifying duplicates can be a daunting task, and there’s a high chance of missing some duplicates. To temporarily remove duplicates from the dataset, follow these steps:
- Select any cell within the dataset.
- Click the Data tab and then choose Advanced Filter in the Sort & Filter group.
- Select “Copy To Another Location” in the Action section.
- Verify that Excel correctly references the original data in the List Range.
- Specify a copy range in the Copy To control (e.g., H2).
- Check the “Unique Records Only” option (Figure B) and click OK.
Excel will copy a filtered list of unique records (Figure C) to the range you specified in Step 5. At this point, if you wish to remove duplicates, you can replace the original data with the filtered list. However, as a general rule, it is not recommended to delete data, even if you believe you won’t need it again. The decision is ultimately up to you.
One thing to note is that you might not immediately notice that the Commission values in the filtered set are literal values. In the original data, that column may contain expressions. Pay attention to such issues – if you plan to use the filtered set in the future, you need to replace the values with the corresponding expressions, so that new records can calculate the commissions correctly.
Method 2: Formatting duplicate values in Excel
Identifying duplicates within a single column or across multiple columns is slightly more challenging than filtering for entire records. However, you can utilize Excel’s conditional formatting to quickly highlight duplicates in a single column. In this case, deleting duplicates may not be the main objective. Let’s explore formatting duplicate commission values:
- Select the range of cells containing commission values (e.g., F3:F13).
- On the Home tab, click Conditional Formatting in the Styles group.
- Choose New Rule from the dropdown menu.
- In the top pane, select the “Use a formula to determine which cells to format” option.
- In the lower pane, enter the formula “=COUNTIF(F:F, F3)>1” (Note: The period at the end is grammatical and not part of the Excel formula).
- Click the Format button, go to the Font tab, choose Red, and click OK (Figure D).
- Click OK to apply the formatting to the worksheet.
The conditional format will highlight any duplicate values in column F (Figure E). The Excel function COUNTIF() is used to determine
the number of occurrences of each value in column F. In this case, the formula “=COUNTIF(F:F, F3)>1” compares the current commission value to all other commission values in column F and returns True if there is more than one occurrence. If you only want to highlight the duplicate copies and leave the first occurrence unaltered, you can use the formula “=COUNTIF($F$3:$F3, F3)>1” instead.
This conditional formatting rule works well for a single column. But how can we identify duplicate values across multiple columns? For this task, we can use two helper formulas in Excel: one to concatenate the columns we want to compare and another to count the duplicates. Let’s assume we want to find duplicates based on the name and commission. Follow these steps:
Enter the following expression in cell H3 and copy it to the remaining cells:
=Commissions8[@Personnel]&Commissions8[@Commission]
- (Note: The structured referencing is based on using a Table object to store the data. If you’re using a data range, enter =D3&F3.)
Next, in cell I3, enter the following formula and copy it down:
- =IF(COUNTIF(H3:H13,H3)>1,”Duplicate”,””)
Now we have two helper columns in place (Figure F). The Excel IF() function checks if the concatenated values in column H have more than one occurrence using the COUNTIF() function. If it finds duplicates, it returns “Duplicate”.
At this point, you can stop or apply a conditional format based on the formula in column I to highlight duplicates as follows:
- Select the range of cells containing the data (e.g., B3:F13).
- On the Home tab, click Conditional Formatting in the Styles group.
- Choose New Rule from the dropdown menu.
- In the top pane, select the “Use a formula to determine which cells to format” option.
- In the lower pane, enter the formula =$I3=”Duplicate”.
- Click the Format button, go to the Font tab, choose Red, and click OK (Figure G).
- Click OK to apply the formatting to the worksheet.
The conditional formatting rule will highlight the first occurrence of a duplicate in both the Name and Commission columns (Figure H). Since the rule formats the entire record, users might assume that the entire record is a duplicate, so it may require some explanation. In the second helper formula, you can display “DuplicateNameCommission” instead of just “Duplicate” if you want to provide more specific information, but that might be excessive.
None of these techniques can handle all types of duplicate data. The method you choose will depend on the specific situation, whether you’re comparing entire records, single columns, or multiple columns. There are other ways to find duplicates, but these methods are quick and straightforward.