Excel: Broken y-axis charting

Breaking the y-axis on a chart can be a useful way to display data, but the technique should be used with discretion, says Excel MVP Liam Bastick.By Liam Bastick, FCMA, CGMA

November 2, 2022

Please note: This item is from our archives and was published in 2022. It is provided for historical reference. The content may be out of date and links may no longer function.

IMAGE BY LAN ZHANG/GETTY IMAGES

One of the things that regularly frustrates me is when people say, “Oh, you can’t do that in Excel.” You pretty much can do anything in Excel; the issue is more, should you use Excel to do that?

The topic we are considering here is a common problem for which the recent global environment provides good examples. The COVID-19 pandemic led to some unusual shopping habits. Indeed, let’s consider the pandemic customer sales data as shown below:

It’s an interesting shop we run. Whilst the concept of seasonality and cyclicality in financial modelling is not a new one, the recognition and analysis of pandemic panic buying is one many of us can identify with just a little too well.

If I use a Clustered Column chart to represent this data, the problem becomes apparent — see the following screenshot:

The majority of the data is swamped by the large values for April and May. If you can quantify the values for July and August from this chart alone, I suggest you do not need to see an optometrist any time soon.

One way to display this data is by using a “broken y-axis” chart, where not all the values on the y-axis are shown — see the screenshot below. (Please also refer to the downloadable Excel file for a modelled example).

This is not a standard Excel chart, and I need to work on my data to get it into this format.

One of the areas I wish to look at is identifying the gap between the large numbers and the remainder. Using Office 365 and dynamic arrays, I can calculate this as follows:

The calculation for Max Gap (referring to the earlier pandemic customer sales data screenshot) is:

=MAX(IFERROR(LARGE(G13:I24,SEQUENCE(COUNT(G13:I24))-1)-LARGE(G13:I24,SEQUENCE(COUNT(G13:I24))),))

This can be broken down as follows:

COUNT(G13:I24) counts the number of numerical values in my data block. There are three products, with 12 months of data for each, so there are 36 values in total, ie, COUNT(G13:I24) equals 36.

SEQUENCE(COUNT(G13:I24)) then provides a columnar sequence of the numbers one [1] to 36:

LARGE(G13:I24,SEQUENCE(COUNT(G13:I24))) derives the values in the data table, in order, from the largest to the smallest (the top values are shown):

LARGE(G13:I24,SEQUENCE(COUNT(G13:I24))-1) derives a similar dataset, albeit displaced by one row, with the first value resulting in a #NUM! error, as shown below:

IFERROR(LARGE(G13:I24,SEQUENCE(COUNT(G13:I24))-1)-LARGE(G13:I24,SEQUENCE(COUNT(G13:I24))),) then subtracts the second list from the first list, on a row-by-row basis, treating the first row subtraction as zero [0] by using the IFERROR function — as shown here:

Wrapping all this up in a MAX function:MAX(IFERROR(LARGE(G13:I24,SEQUENCE(COUNT(G13:I24))-1)-LARGE(G13:I24,SEQUENCE(COUNT(G13:I24))),))generates the largest value, ie, 381,952, as shown in the following screenshot:

The speckled blue cell in this example (in cell G63) contains the formula to calculate the point in the data where the greatest gap occurs:

=MATCH(H63,IFERROR(LARGE(G13:I24,SEQUENCE(COUNT(G13:I24))-1)-LARGE(G13:I24,SEQUENCE(COUNT(G13:I24))),),0)

(where H63 is Max Gap). I won’t go through the entire construction of this formula, suffice to say it locates the row that the largest difference occurs on (in our example, row 7 in the following screenshot):

Further, Lower Val (cell I63) is calculated from this cell, using the formula:

=LARGE($G$13:$I$24,$G$63)

and Upper Val is similarly computed (in cell J63):

=LARGE($G$13:$I$24,$G$63-1)

These correspond to the two other values on this row, which generate the said Max Gap.

To be clear, I know I have used SEQUENCE and spilled arrays here — two elements of dynamic arrays in Office 365 — but this should not put you off this approach. There are alternative ways to derive these numbers, or you could even simply hard-code them. Please don’t let a lack of access to dynamic arrays detract you from this approach. Where there’s a will, there’s a way.

Now, I need to decide how I want my chart to appear. Yes, I am assuming a Clustered Column chart, but what I wish to consider is the scale on the vertical (y) axis. The gaps for the large (April/May) values and the remaining data will be measured by tick marks representing “half multiples” of this number. (This will become clearer later on.)

In this case, I choose a value of 40,000, which is input. I considered writing a formula for this, but this is likely to be a preference thing for many, so I have left this as a hard-coded value. Based upon this number, I revise the gaps to multiples of this number — see the following screenshot:

Note that Rev Upper is one band lower (400,000 not 440,000). This, too, will be clearer later when I create the perceived “break” in the y-axis.

Now, I need to adjust the original data to allow for a gap to appear (see the screenshot below).

The formula in cell G78 is given by:

=IF(G13<$I$68,G13,G13-$H$68)

This essentially reduces the gap between the larger numbers and the rest, affecting the values across the board in April and May (highlighted) in this instance. These have been reduced by 320,000, ie, Rev Gap.

I can plot the revised data on a chart (see the screenshot below) but the axes are not yet correct.

This needs modifying; I need to insert the “break” in the y-axis. To do this, I create a table that maps the existing labels on the y-axis to the new labels I want to use on my broken y-axis — see the screenshot below.

The purpose of these columns will become clear shortly:

The Dummy column is required and contains zero [0] values.
The Rev Val field contains the revised values for the y-axis. In cell G128, the formula is given by:

=IFERROR(G127+($G$65/2),)+($G$65/2)*(N(G127)=$I$68)

This formula essentially adds increments of “half multiples” of cell G65 (the multiple input) — see the earlier screenshot showing the hard-coded multiple — but skips a value where the break should be.

The Chart Lbl field denotes the labels to actually be used for the y-axis. The formula in cell H128 is given by:

=IF($G128<=$I$68,$G128,$G128+$H$68)

This formula essentially restates the values back to what they actually are.

To start my modifications, I copy the range containing Dummy and Rev Val values (not the headings, just cells F128:G137). Then, I select the chart and in the Paste dropdown on the Home tab of the Ribbon. I choose Paste Special (not Paste). I then paste the values into the New series, Values (Y) in, Columns, Series Names in First Row, and Categories (X Labels) in First Column (see the screenshot below).

This generates a rather interesting chart — see the following screenshot:

This isn’t quite what we want, is it? This creates a new series, which I change to a scatter chart by right-clicking on it and selecting Change Series Chart Type… from the resulting shortcut menu. Here, I select the Scatter chart using the secondary axis, with markers only.

At this point, I make a couple of refinements:

Right-clicking on the resulting secondary (top) x-axis, I format the numbers between zero [0] and one [1] to one [1] decimal place.
Right-clicking on the resulting secondary (right) y-axis, I change the Axis Options of the Axis Options (yes, that’s right, confusing I know) to reset the Maximum to 200,000 — see the following screenshot:

This results in another interim chart (see below) where you might now start to realise what is beginning to take shape.

At this point, I now delete the secondary (vertical) y-axis; it has served its purpose with the rescaling. I also “freeze” the secondary (horizontal) x-axis values between zero [0] and one [1], in order for the chart not to change bizarrely should values be updated. Again, this is performed by right-clicking on the secondary (top) x-axis and changing the Axis Options of the Axis Options (see the screenshot below).

This results in the similar looking work-in-progress chart — see the following screenshot:

Next, I format the remaining y-axis and the secondary x-axis, removing all values, tick marks, labels, and other hangers-on — as shown below:

Now I click on the chart legend at the bottom, and once selected, click once more, but this time on Rev Val. I simply hit Delete on my keyboard, and the series label disappears — but crucially, not the series itself.

I then right-click on the Rev Val scatter series itself situated on the y-axis and select Add Data Labels… -> Add Data Labels. Right-clicking on the emanating data labels, I select Format Data Labels….

In the resulting Format Data Labels pane, from Label Options in Label Options, I ensure Label Contains only selects Value From Cells (select this before deselecting other options, otherwise the labels will disappear and you will have to start again). Then, I choose the range H128:H137, ie, the Chart Lbl labels created in my helper table from earlier (see the screenshot below).

This creates a further chart — see below:

In truth, I could almost leave it at this juncture, but I’m a bit of perfectionist. I want to create the deceit of a gap between 80,000 and 440,000 on the y-axis. To do this, it’s time for another helper table — see the following screenshot:

This table will help me create the illusion of a gap. To explain the four columns:

Values creates equal gaps between zero [0] and one [1] for the 12 rows of data points. The formula in cell F293 is given by:

=F292+1/11

which demonstrates how the values are divided equally.

Data 1 provides the lower bound of the break. The formula in cell G292 is given by

=$I$68+$G$65/4

Similarly, Data 2 provides the upper bound of the break. The formula in cell H292 is given by:

=G292+$G$65/2

Finally, Data 3 provides the midpoint between Data 1 and Data 2. For example, the formula in cell I292 is given by:

=AVERAGE(G292:H292)

And there’s more. I also have created and saved the following white rectangle shape as the marker for a line series (the yellow background is merely used to emphasise the shape):

Fascinating, I know, but useful, nonetheless. I ensure it is white with no border, and then I save it as a picture. The dimensions may need to be played with, but all will become clear shortly.

Now, I add the three series, Data 1, Data 2, and Data 3, into the chart in the usual way as Scatter chart elements, using the values in columns G, H, and I, respectively, for the y–axis values and column F for the secondary x-axis values. Data 1 and Data 2 data series have been formatted (under the Fill & Line section of the Format Data Series pane) with no marker, but a red dotted line (see the screenshot below).

However, with Data 3, I became a little more inventive. In the Marker subsection of the Fill & Line section of the Format Data Series pane, having chosen no line, I elected to insert a picture (see the screenshot below).

From the resulting — rather large — Insert Pictures dialog box, I selected From a File and chose the white rectangle picture, saved earlier (see the screenshot below).

If you strike lucky and have the dimensions of your rectangle and the image size both correct, you should obtain the final desired outcome — as shown in the screenshot below.

Word to the wise

This is at best a semiautomated process, as you need to consider:

The multiple selected.
Ensuring the secondary y-axis maximum value scales appropriately (ie, I needed to “freeze” the maximum value).
Potentially “tweaking” the white rectangle used for blanking out the break.

And, yes, I have used dynamic arrays/Office 365 for part of my solution, but that is not vital to the procedure. Indeed, VBA macros could be used to automate more of this process, but this is only one area of concern.

I mentioned in my introduction that this area divides opinion. Many highlight the need for manual manipulation. But this is not the only reason: The image generated is, quite frankly, deceptive, albeit necessarily so. It makes it easier to read the smaller values but understates the larger values should end users only glance at the chart.

Many believe it is better to create two charts — one for the smaller values and one for the larger ones. The sense of scale is lost, but, quite frankly, it shouldn’t be relied upon in this chart in any case.

Others suggest creating a “Panel” chart, whereby effectively one chart is superimposed above another to create an alternative view.

In summary, the point of this article was to show you how to create a broken y-axis chart, not that you should. I leave that question amongst you, your audience, and most importantly, your boss.

— Liam Bastick, FCMA, CGMA, FCA, is director of SumProduct, a global consultancy specialising in Excel training. He is also an Excel MVP (as appointed by Microsoft) and author of Introduction to Financial Modelling and Continuing Financial Modelling. Send ideas for future Excel-related articles to him at liam.bastick@sumproduct.com. To comment on this article or to suggest an idea for another article, contact Oliver Rowe at Oliver.Rowe@aicpa-cima.com.

Excel: Broken y-axis charting

Word to the wise

Up Next

CIMA urges government to restore UK business confidence

Our Websites

Word to the wise

Up Next

Most Read

LATEST STORIES

Related Articles