Advertisement

Transform data with Excel Power Query

This tool allows you to eliminate tedious repetition when inputting CSV files.

Excel users often need to take and transform data from comma-separated-value (CSV) files for analysis. Fortunately, Excel has a tool designed to assist with this process — and it’s easy to use.

Power Query was first introduced as a downloadable COM add-in with Excel 2010 and Excel 2013. It went native in Excel 2016 with its own grouping on the Data tab of the Excel Ribbon, as shown in the first screenshot below.

You may notice that the group is named Get & Transform Data, which was how Microsoft rebranded the tool in 2016, as shown in the first screenshot below. However, when you input data, it still loads up the Power Query Editor, as shown in the second screenshot below.

get-transform

power-query-editor


Whatever you prefer to call it (we’ll call it Power Query in this article), this software allows you to connect to data from a variety of sources (eg, Excel files, internet data, SAP business warehouses, Access files, and other databases), then manipulate it, say, by removing unnecessary data, additional spaces, and unpivoting elements, and finally load the tailored version of the combined files in such a way they may be used by Excel (eg, Power Pivot) for further analysis. It’s not the “sexy end” of the business intelligence suite of tools, but it’s absolutely essential.

Power Query provides plenty of useful features for accountants, but this article focuses on how easy it is to transform CSV data.

Getting started

For this example, we are going to use the From Text/CSV option (in the Get & Transform Data grouping of the Data tab on the Excel Ribbon) and browse to the location of a simple expense CSV file. To follow along with this example, download this Excel file with the sample data saved in CSV format. If you are using Excel with US date format, you will need to open the .CSV file and convert the dates using this technique (or you can just correct them by hand for this example). Once you are ready, close the .CSV file (save as .CSV, not as an Excel format), create a new Excel workbook, and click on Data > From Text/CSV.

from-text-csv


This opens an intermediate dialog box, which has been added to give the user the option to decide whether Power Query should make some assumptions about data types, as shown in the screenshot below.

derek-expenses


Next take the default option to allow Power Query to detect data types Based on first 200 rows (more than enough for our short example). We could just choose to Load from this point (it always seems to be the default option!), but let’s choose to Transform Data instead (there’s a great tip with Power Query: when extracting (getting) data, always click on the button immediately to the left of Cancel!).

At this point, the Power Query Editor screen appears and shows the table of data as it will be uploaded and the steps that Power Query has taken automatically.

editor


The source has been identified and brought in. The syntax for this (displayed in the formula bar) is:

= Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}}). If the formula bar does not appear for you, don’t worry: Simply go to the View tab on the Ribbon and activate the Formula Bar checkbox in the Layout section.

This is a simple example of what is known as M code, which is the underlying programming language of Power Query (the calculation engine is known as the Mashup Engine). It is essentially a list of column names that shows in this instance that all the fields have been assumed to be text.

Try typing over the text: You will see that you can’t. The data has been imported as “read only”, and nothing you do will in any way affect the source data. That’s good: That will mean your friendly IT administrator is more likely to assist you with your Power Query tasks.

Also, notice the Power Query Editor has tabs with different transformations and a shortcut menu, accessible via right-clicking the mouse. In moments, I can promote the first row to the headers by selecting the option on the Home tab of the Ribbon:

headers


Do you see how it is recording the steps in the APPLIED STEPS section of the Query Settings pane on the left? Moreover, it actually has changed data types a second time now that the text in what was the top row has been promoted (this is why it says Changed Type1).

However, it hasn’t guessed right (we don’t want all the fields to be text), so let’s remove this step by simply clicking on the red X, as shown in the screenshot below.

applied-steps


Simply click on the red X, and the last step is deleted (be careful; this cannot be undone). The step may also be moved or renamed/annotated by right-clicking, too.

Now let’s assign the data types manually on the Home tab, as follows:

  • For Name, keep it as Data Type: Text.
  • For Date, select Data Type > Date, as shown in the screenshot below.
  • For expense code, keep it as Data Type: Text.
  • For amount, select Data Type > Currency.
date-column


I have deliberately made the currency British pounds sterling (£), as this character is not recognised for many regional settings and may therefore cause an error if you try to change the Data Type to currency. If you encounter this issue, you can right-click on the field and select Replace Values…, as shown in the first screenshot below (note that the Data Type will need to be Text for this search to work). Then you replace the character £ with nothing (blank), as shown in the second screenshot.

replace-values

replace-values-2


Another way to deal with the currency issue would be to split the field on the first character. Go to Split Column > By Number of Characters, as shown in the screenshot below. In the resulting dialog box, type 1 into the field for Number of characters and then select the first radio button for Once, all the way to the left.

amount-column


That’s the beauty of Power Query: There’s more than one way to transform your data.

The final step is to make the Name data appear on each row. There is an option to Fill Down, but in order to use this feature, the Name cells that are to be populated must be set to null first (they are currently blank, which is not the same thing). Therefore, you need to replace the blanks with null first, as indicated in the screenshot below.

replace-values-3


Make sure you type null precisely. The M language is case-sensitive and requires precision (eg, no additional spaces). Once you have replaced the blank cells with null, you can then Fill Down, using the shortcut menu once more, as shown in the screenshot below.

fill


Your data has been transformed, as shown in the screenshot below.

name-column


You can Close & Load this data back into Excel now, all cleaned up.

Power Query has recorded all of your steps. If the source data were to be updated, all you would need to do is click Refresh All on the Data tab and your transformed report would be updated in a heartbeat.

refresh-all


We will cover Power Query further in future columns. In the meantime, you can use this tool to say goodbye to tedious repetition when importing and transforming data.

— Liam Bastick, FCMA, CGMA, FCA, is director of SumProduct, a global consultancy specialising in Excel training. He is also an Excel MVP (as appointed by Microsoft) and author of Introduction to Financial Modelling. Send ideas for future Excel-related articles to him at liam.bastick@sumproduct.com. To comment on this article or to suggest an idea for another article, contact Jeff Drew, an FM magazine senior editor, at Jeff.Drew@aicpa-cima.com.