Advertisement

Talking 'dirty' about volatile functions in Excel

When working with a large spreadsheet, few things are as frustrating as seeing “CALCULATE” appear in the bottom left-hand corner of your status bar and the calculation percentage status crawl towards 100% (and why does it always start again when it gets to 99%?).

It’s understandable then why Excel modellers would be motivated to do something, anything to fix this problem. But what modellers themselves need to understand is that one of the worst things modellers would be tempted to do to counteract this is to set Workbook Calculation to anything other than “Automatic” (Alt+T+O, Formulas):

excel-automatic


The reason for this is very simple. Although this action allows modellers to continue spreadsheet construction unencumbered, when was the last time you recall ever checking that a model you received was calculating automatically? Everyone just assumes that this is the case and makes managerial decisions accordingly.

This action addresses only the symptom, not the cause.

So what should modelers do? They should avoid making volatile moves that can dirty up their workbooks. What does that mean? I’ll tell you at the end, but first you need to understand what dirty and volatile mean in an Excel context.

Dependency trees

Obviously, size does have some impact upon calculation speed, but perhaps not as much as you might think. The Excel calculation engine is almost as lazy — sorry, I mean “efficient” — as I am.

Rather than recalculate every cell every time any cell is changed, Excel determines which cells are affected by the latest change (known as “dependent cells”) and recalculates these dependents and then the dependents of these dependents and so on. These long chains are known as “dependency trees”. Depending upon which version of Excel you are using, there may be a limit of how many dependency trees Excel can keep track of (it was 65,536 prior to Excel 2007, with the number debated by experts thereafter) before Excel has to resort to a full (ie, slower) calculation.

When cells are changed, Microsoft recognises the cells that need recalculating as a consequence. These cells are known as “dirty” cells. Therefore, calculation time is a function of the number of dirty cells and the number of dependency trees, both affected and unaffected.

As a modeller, you would find it difficult to change the number of dirty cells that genuinely need to be recalculated, short of writing more complex formulas in fewer cells (this is one reason “megaformulas” often calculate more quickly, although this increases the risk of modelling errors, etc, instead).

However, modellers can do something about those functions and functionalities that Excel has potentially mistakenly considered dirty. More often than not, these instances are caused by volatile functions and/or volatile actions.

Volatile functions and actions

A volatile function is one that causes recalculation of the formula in the cell where it resides every time Excel recalculates. This occurs regardless of whether precedent cells/calculations have changed, or whether the formula also contains nonvolatile functions. One test to check whether your workbook is volatile is to close a file after saving and see if Excel prompts you to save it a second time (this is an indicative test only).

Some functions are obviously volatile, eg, NOW(), RAND(), TODAY(), and, perhaps slightly less obviously, CELL(“filename”), which keeps track of whether the file name has changed.

Others are not so obvious. For example:

  • INDIRECT has an argument that is typically constructed out of text, eg, INDIRECT(“$A$1”), where, in this instance, the function will inspect the contents of cell A1. If the content is, say, B2 (not =B2) then INDIRECT(“$A$1”) will return the value in cell B2. This might look like a cell reference, but it is not and needs rebuilding each time; and
  • OFFSET takes numerical arguments, which point to a cell reference but are still just numbers that need to be calculated each time.

Just because a function is volatile in one version of Excel does not mean it is volatile in all versions. Perhaps the best example of this is INDEX, which was volatile prior to Excel 97. Microsoft still states this function is volatile, but this does not appear to be the case except when the function is used as the second part of a range reference. For example, $A$1:INDEX($A$2:A$10,4) will also cause the reference to be flagged as dirty only when the workbook is opened.

Another common “semi-volatile” function is SUMIF, which has been so since Excel 2002. This function becomes volatile whenever the size of the first range argument is not the same as the second (sum_range) argument, eg, SUMIF(A1:A4,1,B1) is volatile whereas SUMIF(A1:A4,1,B1:B4) is not.

Indeed, crowd-pleasers VLOOKUP and HLOOKUP could be argued as “kind of volatile” (that’s not going to please my readership). These functions require a range (table_array) to be specified in the second argument (either VLOOKUP(lookup_value, table_array, …) or HLOOKUP(lookup_value, table_array, …). If any values change in this range — even if not referenced in the first row/column or the specified row/column — the formula has to recalculate. That’s a form of volatility, too.

IF and CHOOSE do not calculate all arguments, but if any of the arguments are volatile — regardless of whether they are used — the formula is deemed to be volatile. Therefore, IF(1>0,1,RAND()) is always volatile, even though the value_if_false argument will never be calculated. It is not quite as simple as this though. If the formula in cell A1 is =NOW(), then this cell will be volatile, but IF(1>0,1,A1) will not be.

In essence, direct references or dependents of volatile functions will always be recalculated, whereas indirect ones will recalculate only when activated or in certain other functions that always calculate all arguments such as AND and OR.

One area that has caught me out on occasion is the use of formulas in conditional formats. These are always volatile so that the formatting is displayed correctly — not just on calculating but when you change worksheet or scroll up, down, left, or right even if the calculation mode is set to “Manual”!

Volatile actions are those that trigger recalculation. Microsoft has compiled a list of such actions:

  • AutoFilter: Filtering data in a range will make any formulas in this range dirty.
  • Double-clicking on row or column dividers: Automatic resizing is a trigger when the model is set to calculate automatically (but not in Manual mode), but bizarrely, manually adjusting row heights or column widths is a nonvolatile action.
  • Goal Seek: Using this scenario-analysis tool results in the model requiring recalculation not just once, but for every iteration (which could be a maximum of 32,767 times).
  • Hiding/unhiding rows: This was in Excel 2003 only (but hiding/unhiding columns is not a volatile action) but is mentioned for completeness. Some think this is because the properties of SUBTOTAL changed in Excel 2003, but this does not explain why this is nonvolatile in subsequent versions of Excel.
  • Inserting, deleting, or moving rows, columns, or cells.
  • Opening a .csv file: Hard to believe, but true, even if the calculation setting is Manual.
  • Range names: Adding, editing, or deleting a range name in any way will trigger a calculation event.
  • Worksheets: Deleting, renaming, or moving a worksheet is a volatile action. Interestingly, adding a worksheet does not trigger recalculation (this may lead to summation inaccuracies though if formulas sum through sheets).

Word to the wise

In order to speed up Excel workbooks, modellers should plan to keep the number of volatile functions and range names to a manageable minimum. Where possible, consider using INDEX rather than OFFSET, specifying SUMIF correctly, etc.

Designing models efficiently so that formulas are copied as frequently as possible will reduce the number of dependency trees, again shortening the calculation time.

Furthermore, reducing the number of volatile actions will also reduce the number of trigger points. In particular, consider how much you really need to have conditional formatting in your models.

As always, efficient model construction is a delicate balancing act: transparency and versatility may sometimes need to be tempered by the need to have the model calculate before the sun goes supernova. There is no hard-and-fast rule: It is a judgment call that gets better with (bitter?) experience.

— Liam Bastick, FCMA, CGMA, FCA, is director of SumProduct, a global consultancy specialising in Excel training. He is also an Excel MVP (as appointed by Microsoft) and author of An Introduction to Financial Modelling. Send ideas for future Excel-related articles to him at liam.bastick@sumproduct.com. To comment on this article or to suggest an idea for another article, contact Jeff Drew, an FM magazine senior editor, at Jeff.Drew@aicpa-cima.com.