How to extract data in Microsoft Excel
When dealing with imported or foreign data in Microsoft Excel, it is often necessary to clean the data before it can be used effectively. Tasks such as removing leading or trailing spaces, converting text to numbers (or vice versa), and extracting specific values from columns are common data parsing requirements. One typical scenario involves parsing names that are stored in a single column, making it challenging to work with them. In this tutorial, I will demonstrate two quick methods for parsing data using Excel’s Flash Fill and Power Query features.
Note that I am using Microsoft 365 on a 64-bit Windows 10 system. String functions can be used in older versions, including the .xls menu format. Flash Fill is available in Excel 2013 and later versions, while Power Query is available in Excel 2010 and later versions. Additionally, Flash Fill is supported in Excel for the web, and you can open existing Power Query queries in Excel for the web, although creating them in the browser edition is not possible.
How to parse data using Microsoft Excel string functions
Before the introduction of Flash Fill, data parsing often involved combining various string functions like RIGHT(), LEFT(), MID(), FIND(), and LEN(). For instance, consider the expression in column D shown in Figure A:
=IFERROR(RIGHT(B3,LEN(B3)-IFERROR(FIND(” “,B3,FIND(” “,B3)+1),FIND(” “,B3))),B3)
The above expression extracts the last name from the values in column B. The IFERROR() function handles the case when there is no second space in the name. Although our name values have a consistent format, this expression can still be complex and time-consuming to work with, especially in more complicated cases.
Figure A: Parsing data using string functions
Fortunately, newer versions of Microsoft Excel provide the Flash Fill feature, which can simplify many parsing tasks.
How to utilize Flash Fill and Power Query to parse data in Microsoft Excel
Flash Fill:
If you are familiar with Excel’s Flash Fill feature, you can use it to parse data. Start by entering the first value in the target column so that Excel can evaluate the pattern. In this case, since the data is consistent, the parsing task will be straightforward. Follow these steps:
Enter “Harkins” in cell C3 and press Ctrl + Enter. Flash Fill may not fill the entire column at this point.
Enter a few characters from the next last name, such as “Smyth,” as shown in Figure B. Flash Fill will recognize the pattern and offer to fill the remaining column.
Press Enter to fill the remaining cells, as shown in Figure C.
Flash Fill is a fast and convenient method for data parsing, especially when dealing with consistent data patterns. Note that Flash Fill is also supported in Excel for the web. Instead of using Ctrl + Enter, you can find the Flash Fill option in the Data Tools group on the Data tab.
Power Query:
Power Query is available for Excel 2010 and later versions and provides powerful data retrieval and cleaning capabilities. You can use Power Query to add a new column and parse last names into that column. Follow these steps:
Load the data into Power Query:
Click inside the table you want to load into Power Query.
Go to the Data tab and click From Table/Range in the Get & Transform Data group.
If your data is not already formatted as a table, Power Query will prompt you to do so.
Once the data is loaded into Power Query, it will be displayed as a table, as shown in Figure D.
Add a column based on the Personnel column:
Select the Personnel column by clicking its header.
Go to the Add Column tab and choose “From Selection” from the Column From Examples dropdown in the General group (Figure E).
Power Query will add a new blank column, and you can enter the last name values for this column using Power Query’s own version of Flash Fill.
Populate the new column with last names:
Enter “Harkins” in the first cell of the new column (Column 1, Figure F).
Power Query will recognize the pattern and offer a set of last name values for the entire column.
Review the values, and if needed, enter another last name to adjust the pattern.
Save the modified data to Excel:
Go to the Home tab and click “Close and Load” from the Close group’s dropdown menu.
Power Query will save the parsed last names in a new Excel sheet, using the name of the original table (TablePersonnel in this example, Figure H).
You can rename the new column at this point or make any other necessary adjustments before saving the data.
While Flash Fill is quicker, understanding the Power Query method can be helpful when you are already working with data in Power Query and need advanced data transformation capabilities.