Accountants often need to look up data in a table or list, and two popular functions for doing so are VLOOKUP and HLOOKUP. But do you realise it's very easy to make a mistake with these functions?
Let's start with a refresher.
VLOOKUP(lookup_value,table_array,column_index_number,[range_lookup]) has the following syntax:
- lookup_value: What value do you want to look up?
- table_array: Where is the lookup table?
- column_index_number: Which column has the value you want returned?
- [range_lookup]: Do you want an exact or an approximate match? This is optional, and to begin with, I am going to ignore that this argument exists.
HLOOKUP is similar, but works on a row, rather than a column, basis.
I am going to use VLOOKUP throughout to keep things simple. VLOOKUP always looks for the lookup_value in the first column of a table (the table_array) and then returns a corresponding value so many columns to the right, as determined by the column_index_number.
In the screenshot "VLOOKUP Illustration", the formula in cell G25 seeks the value 2 in the first column of the table F13:M18 and returns the corresponding value from the eighth column of the table (returning 47). You can follow all of these examples in this Excel file.
Seems pretty easy to understand so far, right? So what goes wrong? Well, what happens if you add or remove a column from the table range?
Adding (inserting) a column gives us the wrong value, as shown in the screenshot "VLOOKUP: Column Inserted".
VLOOKUP: Column inserted
With a column inserted, the formula contains hard code (8) and therefore, the eighth column (M) is still referenced, giving rise to the wrong value. Deleting a column instead is even worse, as shown in the screenshot "VLOOKUP: Column Removed".
VLOOKUP: Column removed
Now there are only seven columns, so the formula returns #REF!. Oops.
Fortunately, it is possible to make the column index number dynamic using the COLUMNS function, as shown in the screenshot "VLOOKUP With COLUMNS Function".
VLOOKUP with COLUMNS function
COLUMNS(reference) counts the number of columns in the reference. Using the range F13:M13, this formula will now keep track of how many columns there are between the lookup column (F) and the result column (M). This will prevent the problems illustrated above.
But there are more issues. Consider duplicate values in the lookup column. With one duplicate, the result is as shown in the screenshot "VLOOKUP With 1 Duplicate".
VLOOKUP with 1 duplicate
Here, the second value is returned, which might not be what is wanted. With two duplicates, the result is as shown in the screenshot "VLOOKUP With 2 Duplicates".
VLOOKUP with 2 duplicates
Ah, it looks like it might take the last occurrence. Testing this hypothesis with three duplicates, as shown in the screenshot "VLOOKUP With 3 Duplicates".
VLOOKUP with 3 duplicates
Yes, there seems to be a pattern: VLOOKUP takes the last occurrence. Better make sure, as shown in the screenshot "VLOOKUP With 4 Duplicates". Rats. In this example, the value returned is the fourth of five. The problem is there's no consistent logic and the formula and its result cannot be relied upon.
VLOOKUP with 4 duplicates
It gets worse if we exclude duplicates but mix up the lookup column a little, as shown in the screenshot "VLOOKUP Out of Order".
VLOOKUP out of order
In this instance, VLOOKUP cannot even find the value 2!
So what's going on? The problem — and common modelling mistake — is that the fourth argument has been ignored:
The argument [range_lookup] appears in square brackets, which means it is optional. It has two values:
TRUE: This is the default setting if the argument is not specified. Here, VLOOKUP will seek an approximate match, looking for the largest value less than or equal to the value sought. There is a price to be paid, though: The values in the first column (or row for HLOOKUP) must be in strict ascending order — this means that each value must be larger than the value before, so no duplicates.
This is useful when looking up postage rates, for example, where prices are given for bands of mass and you have, say, 2.7 kilogrammes to post. It's worth noting, though, that this isn't the most common lookup when modelling.
FALSE: This must be specified. In this case, data can be any which way — including duplicates — and the result will be based upon the first occurrence of the value sought. If an exact match cannot be found, VLOOKUP will return the value #N/A.
And this is the problem highlighted by the above examples. The final argument was never specified, so the lookup column data must be in strict ascending order — and this premise was continually breached.
The robust formula needs both COLUMNS and a fourth argument of FALSE to work as expected, as shown in the screenshot "VLOOKUP With COLUMNS FALSE".
VLOOKUP with COLUMNS FALSE
This is a very common mistake in modelling. Using a fourth argument of FALSE, VLOOKUP will return the corresponding result for the first occurrence of the lookup_value, regardless of the number of duplicates, errors, or series order. If an approximate match is required, the data must be in strict ascending order.
VLOOKUP (and consequently, HLOOKUP) are not the simple, easy-to-use functions people think they are. In fact, they can never be used to return data for columns to the left (VLOOKUP) or rows above (HLOOKUP). So what should modellers use instead? Look for my article in December's issue of FM magazine to find out.
Word to the wise
As stated above, HLOOKUP works like VLOOKUP but hunts out a value in the first row of a table and returns a value so many rows below this reference. However, it has the same limitations and should be used just as carefully.
Liam Bastick, FCMA, CGMA, FCA, is director of SumProduct, a global consultancy specialising in Excel training. To comment on this article or to suggest an idea for another article, contact Jeff Drew, an FM magazine senior editor, at Jeff.Drew@aicpa-cima.com.