Make VLOOKUP Blazing fast!
Make your VLOOKUP run a hundred times
faster with the double-VLOOKUP trick
If you are a frequent user of
Excel, you most certainly know the VLOOKUP function. If not you can read
this tutorial to learn this must-know function.
But if you have ever worked on
large Excel files with hundreds of thousands of rows, you certainly experienced
how painful the VLOOKUP function can be. Although very useful to link data from
multiple tables, the VLOOKUP function quickly becomes unable to calculate in an
acceptable amount of time. On large Excel files, it can take up to an entire
hour to recalculate your spreadsheet, with most of the time application failure
in the middle.
The good news is, you can make
VLOOKUP work much faster with a technique called the Double-VLOOKUP trick.
So how does it work ? Let’s
start with a quick reminder on how the VLOOKUP function works. If you already
know that you can move to the next section.
The regular VLOOKUP function
The VLOOKUP functions writes as
VLOOKUP(lookup_value,table_array,col_index_num,[range_lookup]), where:
· Lookup_value is the ID you are using to link your two tables. For
instance your ID is “1234” if you want to lookup a value for ID
"1234" on another table.
· Table_array is the table in which you are looking. It’s a table
with IDs in the first column.
· Col_index_range is the index of the column which contains the
target value in the table_array.
In the example below, the
VLOOKUP function will return the value for the ID "1234" in the
table, which is 17.
Excel regular VLOOKUP function
Unsorted search Versus binary search
Fine observers will have
noticed that I haven’t explained the [range_lookup] argument yet. FALSE
corresponds to searching unsorted data, while TRUE corresponds to a binary
search. What’s the difference?
· Regular search (FALSE) is super slow, because Excel checks
for each row in the table if it matches your ID, until a match is found.
Statistically, it takes searching half of the rows to find a match. If you have
100k rows in your first table and 200x in the second table, that’s 100k x 200k
x 0.5= 10 billion operations!
· Binary search (TRUE) works only on sorted data, but:
o Because your IDs are sorted,
Excel can work smarter. On the first calculation, it checks the value in the
middle of the table. If it is smaller than your ID, then Excel knows your ID
can only be found above (because the column is sorted), and if it is above the
ID can only be found below. That means in just 1 calculation, Excel has already
ruled out half the column. On the second calculation, it will do the same and
rule out again half of the remaining, etc.
o Consequence is: binary search
is way faster than unsorted search. Way way, way faster.
The only issue is: what happens
if your ID does not exist in the table you are looking for? Your VLOOKUP
function will return the value for the closest match to your ID, but what you
want is usually an exact match. So binary search is faster but you cannot be
sure the result is correct! Damn it!
The Double-VLOOKUP trick
The Double-VLOOKUP trick is
made to use the crazy speed of the binary search while ensuring the result is
correct. This is how we do it:
=IF(VLOOKUP(ID,Table,1,TRUE)=ID,VLOOKUP(ID,Table,2,TRUE),NA())
The first VLOOKUP checks that
the ID you are looking for exists in the table, and only if he exists the
second VLOOKUP will return the corresponding value. Doing so, you avoid getting
wrong values when there is no corresponding ID.
So how much faster is it?
According to the experiment run
by Dailydoseofexcel.com,
the Double-VLOOKUP is 28 times faster than the regular VLOOKUP for a table of
10,000 rows, and this increases up to 3,606 times faster than the regular
VLOOKUP for a 1,000,000 table.
Conclusion
Although the regular VLOOKUP
function is just simpler to use on a day-to-day basis, keep in mind that if your
spreadsheet starts taking time for calculations, you can make it much faster by
using the Double-VLOOKUP trick. Just remember to sort your IDs before and your
spreadsheet will calculate crazy fast again.