How to clean up scraped data in Excel without VBA

How to clean up scraped data in Excel without VBA

Before you use your data for reports, analytics or presentation, first you need to clean it as you get data from various sources. Here is the solution for “how to clean up scraped data in Excel without VBA”.

In this Excel tip, you’ll learn to clean up scraped data in Excel using Power query

If you do not have Power Query installed in Excel version then click the below link to download it from Microsoft website

How to Download Power Query for Excel 

How to install Power Query in Excel

Get data from other sources

Power Query enables you to Get Data from almost all the sources

Some of them are

  • Excel/CSV File
  • Text File
  • Folder
  • Web
  • MySQL or SQL Server
  • Azure
  • Sharepoint online list
Get Data Power Query Excel

The best part of Power Query is, you don’t have to repeat the task. It has the capability to remember the steps you performed. You just need to click “Refresh All” when you work on your data next time, similar to recording a Macro in Excel. 

How to Unpivot a Data in Excel using Power Query 

Let’s assume you have copied one data/table from another source.  A Grocery Store sales data for 12 months, something similar to below image and you want to “Unpivot” it.

Sample Data

Data for Power Query

#Step 1 – Convert Data to a Table – Go to Insert>>Table or Press Ctrl+T+Enter from your keyboard

#Step 2Go to Data>>From Table Range

Get Table Range for Power Query

#Step 3 – Select the columns which you want to “Unpivot” ( in this is this example it is Column “B” to Column “M”) then right click and select “Unpivot Columns”

Power Query Unpivot Table

#Step 4 – Go to File >> Click on “Close and Load”

Power Query Close and Load

#Step 5 – Converted data will be loaded to the new excel sheet

#Tip – Next time you just need to update the data in the table. Excel will “Unpivot” the table for you once you click on “Refresh All” under the Data tab.

It is not necessary to get the data from Table only, you can get the data from Excel file or any other source as well. 

How to Replace and Split Multiple Values in a Column or Cell in Excel using Power Query 

Sometimes you need to Replace and Split values in the data set that has unwanted characters in a particular cell or column. Generally, you use Flash Fill, Text to Columns, apply text formula or write a VBA code. But you might not aware there is an easy way to perform this task and most importantly, you don’t need to repeat the task again.

Let’s assume you are working on some data and it has a particular column which contains numeric, text and special characters altogether. And you want to extract only a specified value.

Look at the below data

You might be having similar data which you want to clean but you don’t want to apply the nested formulas or VBA

Sample Data

Raw Data to clean using Power Query

You can simply use Power Query to clean the data in no time.

#Step 1 – Convert Data to a Table – Go to Insert>>Table or Press Ctrl+T+Enter from your keyboard

#Step 2Go to Data>>From Table Range

#Step 3 – Click on Replace Values and replace all special characters with one unique character (including whitespace also) One by One

E.g, Here all the special characters have been replaced with a comma (,)  including white space

Replace text using Power Query

After replacing all the special characters and white space the data will look like this:

Replace text using Power Query 2

#Step 4 – Click on Add Column>> Custom Column  and paste the below formula then click ok

=Text.Replace((Text.Combine(List.Distinct(List.Transform(Text.Split([Data],","),Text.Trim)),",")),",,",",")

Note:- In this example Column name is “Data”, change the column name in the formula per your dataset

Text Split and Text Combine Formula Power Query

#Step 5 – Click on Split Column>>Split Column by Delimiter >>ok

In this example delimiter is Comma(,) 

Split column Power Query

#Step 6 –  Remove data column – right click on data column and remove

#Step 7 – Go to File>>Close and Load

Now the cleaned data will look like this:

Cleaned data power query

How to Extract only Numeric Values from a Text String in Excel using Power Query 

Let’s assume you are working on some data and it has a particular column which contains numeric values but values are not aligned in order, and you want to extract only numeric values.

Sample Data

Extract only numeric values using power query 1

#Step 1 – Convert Data to a Table – Go to Insert>>Table or Press Ctrl+T+Enter from your keyboard

#Step 2Go to Data>>From Table Range

#Step 3 – Click on Add Column>> Custom Column  and paste the below formula in the Custom Column window then click ok

Number.FromText(Text.Combine(List.RemoveNulls(List.Transform(Text.ToList([Data]),each if Value.Is(Value.FromText(_), type number) then _ else ""))))
Extract only numeric values using power query 2

#Step 4 – Go to File>>Close and Load

Now the cleaned data will look like this:

Extract only numeric values using power query 3

#Tip- In case if you do not want to use Power Query but still want to extract only numeric values from the text string in a normal excel sheet then you can use the below formula:

How to Extract only Numeric Values from a Text String in Excel 

Assuming the data is starting from cell “A1”, change the cell reference if your data is not starting from “A1”

=SUMPRODUCT(MID(0&A1,LARGE(INDEX(ISNUMBER(--MID(A1,ROW($1:$300),1))*ROW($1:$300),0),ROW($1:$300))+1,1)*10^ROW($1:$300)/10)

How to convert multiple text values to a date format using a single formula

Assuming there is an uncleaned Date format in Column A, some of the cells values are entered as Text but there is no synchronicity and dates are mentioned in various formats.

Convert multiple text values to a date format

You have a large dataset but you don’t want to apply multiple formulas to clean the Date format neither you want to play with Text to Columns, then what is the solution?

Here is the solution:

Apply this formula in cell B2 then copy the formula until the last row and problem solved!

=IF(ISTEXT(A2),DATEVALUE(SUBSTITUTE(TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,","," "),"."," "),"  "," "))," ","-")),A2)

#Tip – extend the SUBSTITUTE formula if still there are multiple special characters.  

Read More:

How to extract text from a bracket cell in excel

How to unmerge specific cells in Excel using VBA

COMMENTS (1)

  • comment-avatar

    […] Excel Tips 7 convert text to date format 2 Older Post How to clean up scraped data in Excel without VBA […]