An example of the ‘big data’ we deserve
I typically write about our business applications: creating value from automated decisions.
Today instead I will focus on an often unloved, yet crucial, part of the chain: using Excel to extract last-mile insights from the mountain of data we process.
While everyone can do it, too many still try and fail mastering this skill.
My view is that honing crucial Excel charting skills means unleashing the most powerful and cheapest workhorse endowed to the modern manager!
Our products, among many other things, also produce automated Excel charts for our clients; our product development team manually reviews output with a number of tools, including Excel. Therefore I have grown a professional interest in making this topic accessible.
We all can discover the visual secrets of ‘big data’ even just using Excel, when doing things right; this post details my own approach and its rationale.
Why charting in Excel?
Yes, in 2020 there are many fancy, advanced graphical tools out there… and I know it: our scientists use them, for their specific purposes and product use-cases.
But after all the core analytical hard work will have been completed with other powerful tools, ‘big data’ must become small to make it understandable.
Charting is the very last mile of analytics, with a simple rule: clarity of message is the only mantra, regardless of tool choice.
And when it comes to ‘charts for the rest of us’, for people like you and me, nothing beats the ease of use and sheer power of good ol’ Microsoft Excel:
- To render even millions of data points: just let Excel work its magic, the performance of its rendering engine is amazing
- To add charts: just create new tabs
- To set up a development environment stack: just copy the file!
Excel charting is fast, flexible and cheap. But it takes five minutes to start, and a lifetime to master it. Especially to achieve fully automated results, rather than clunky manual Excel work.
Default settings are core to the experience of the average user.
But Bill Gates must have wanted to pull nasty pranks and/or inflict painful experiences on the global business community: Excel charts are initially set with the wrong default choices. Many of them, even.
Some examples, just to name a few?
- The default palette is more carnival than business
- Simple actions like inserting rows or moving data can de-face any chart
- Negative values break multiple chart types
- ‘3D’, ‘waterfall’ and other types are dysfunctional
- Data labels are hard to use and overlap each other
(note: each of these defaults, and many more, can be addressed & fixed with some love)
I entirely avoid those apparently fancy options like ‘chart sheets’ or ‘pivot charts’, which are in fact less flexible and therefore not worth the effort. My exclusive focus is on standard charts inside normal worksheets.
The good news
It is possible to make Excel charts work, right now, and relatively fast too!
After reading this post, armed with brand-new settings, anyone can achieve brilliant automated results with limited effort:
- Excel only: no external plug-ins that slow down Excel, cost extra and may stop working
- Zero code: no VBA or other hacks that risk crashing, are hard to share and require skill to maintain
- Automated: no manual steps, but charts linked to data automatically, therefore without additional ad-hoc work to be repeated every time the data changes.
The content that follows is best seen using a desktop monitor or larger tablet.
My key guiding principles, developed over years of working with ‘big data’:
- Data first: free up the data itself to do all the talking, subtly hiding unnecessary distractions e.g. axes, gridlines, legend, labels, extra 0000
- Logical sorting: use descending or ascending order, because alphabetical (?) or even random sequencing can hide the message in the data
- Eyeball control: keep viewers’ eyes on the data, with logical positioning of supporting elements and references that prevent glancing back-and-forth.
Visual data only makes sense as a comparison with relative reference to something else. Is it a lot? Or not enough? Is it growing fast? Is a set relatively bigger than another?
The same holds true of (any) pricing in general, as we can instinctively only evalutate a price in comparison to something else – ours brains desperately do that in the background – so do we with any data shown on a chart.
Which is why I like to design my charts with clear visual references to underpin the so-what.
Armed with these principles, mastering a handful of chart types is all it takes to truly own Excel charting. Even in the age of ‘big data’ the 5 kings of all charts are:
- Line: time series
- Pie & stacked: components
- Column & bar: frequency & item values
- Scatter & bubble: bi- & tri-dimensional correlations
- Boxplot: distribution.
Using, and where appropriate combining, these core chart types can deliver nearly any message a modern business manager may ever need.
No matter how large the underlying data set to any chart may grow: the viewer’s brain remains the same.
Therefore, charts must be designed to remain small and simple: I’d rather illustrate a complex message with a simple chart, than throw big data on a big chart.
Excel default settings cannot be trusted.
But a library of ‘model charts’ can become a ‘template’, thus avoiding Excel’s default charts like the Plague.
How to start?
- BUG ALERT! Excel Options>Advanced> Chart>Properties: why on Earth would ANYONE want to ‘follow chart data point’, and the rationale for making these options checked by default, is beyond me; unchecking both options prevents losing precious formatting when the position of underlying data changes
- New Chart/Chart Options/Properties: I don’t like charts’ shape to change when inserting/deleting rows; rather than ‘Move and size with cells’ (default) I prefer ‘Move but don’t size with cells’
- Fill & Border: I fill the chart area in white, 0% transparency, with thin 0.25pt border of ‘evo green’ or dark grey; for data sets, still I use a thin 0.25pt border but with 50% transparency so it becomes even more subtle, while still guiding the eye
- Shadow: So long as not overdone, bottom-right shadows ‘pop out’ charts, using the pre-set (‘outer’, first choice top left) with ‘distance’ reduced to 2pt; I use shadows very sparingly, mainly for the overall chart area or even nowhere at all, to avoid creating confusion and unnecessary clutter
- Font type: I use ‘Lato’ and default size 10 for my target look&feel; all text ‘lighter black’ (RGB 69/69/69) to prevent ‘black-flashing’; chart title 12pt with heavy/bold type
- Line thickness: all non-data lines & tick marks are 0.25pt thick, which is the thinnest setting available; gridlines do not add precision but add clutter, therefore I avoid them in favor of data labels, and only when wanting to show precise data values; now, since managers are smart people who hate being ‘shouted at’, I also keep tick marks muted
- Color palette: a matter of preference and corporate policy; I may use ‘evo green’ for emphasis but generally find ‘shades of blue’ to be the most business-elegant, compelling and universally simple to read solution – after all, both the sky and the sea are blue anywhere in the world, and the deeper the darker – intuitively makes sense… and also works for those 7% of men with visual impairment (albeit rare among women, luckily only <1% of them are color-blind), and even when printed in black & white
- Spacing: wherever possible, for text & totals specifically, I remove any and all the extra default padding, 0 distance from axis instead of 100, and 0 borders instead of the default 0.11/0.05
- Labels: I format labels and apply them as their own separate data set ‘value from cells’ rather than accept auto-labels from Excel, so that in the background I become able to hide the overlaps and freely define the correct number/text type to show
With this initial set-up, and using ‘named ranges’ together with the ‘offset()’ formula, any chart type can be linked in a flexible way to any data set with fully automated adjustment and clarity of formatting – so, finally, now onto more specific details for each chart type.
Trends over time. Specific settings:
- Series labels on the left, their logical place, using an auxiliary scatter data set with invisible dots and automated ‘leader lines’ (combo chart) to tell the labels apart when needed; if a single series, its name would be in the chart title instead
- Unit of measure in the subtitle as there is a single axis, alternatively it would go right above each axis
- Thousands/ millions/ billions used to prevent 000 pollution – automated using Excel’s custom formats; title and subtitle are automated using the direct link to a cell value
- Y axis on the right, with optional gap to underline it is a reference scale and not a vertical line whose position has any other meaning
- Axes labels ‘bottom’ below negative data, alternating to reduce crowding, with tick marks ‘inside’ to point towards the data, more important than the labels’ text.
An optional legend on the top right would be redundant to the series labels on the left-hand-side, which are easier to read, but the lines may overlap thus leaving potentially residual ambiguity; therefore a matter of preference whether to add a legend or not.
No secondary axis please. I prefer having charts above each other, than adding multiple units of measure; with the added benefit that X axis labels can often be omitted in the ‘above charts’.
Excel renders each series progressively, one after the other: the first series is placed behind every other series; this typically should be the X axis, which is itself a line chart because scatter lines are always rendered above any normal line series, and this cannot be changed.
The ‘tick marks’ are fully controllable ‘error bars’.
The most important series, in the example above ‘This year’, is rendered last ‘above everything else’ when two series cross each other: to achieve this I moved it to be the last data set.
Additional dummy data sets are often needed, and they are fast to implement. This is what my list may look like, for example:
These dummy data sets may show, for example, the X axis, or legend keys in the right order with the most important data series first, rather than reversed, which is exactly as logic would suggest.
Dummy data sets may also show the automated ‘vertical line’, in my example case to highlight the specific date when a price change was applied by the company.
Pie & stacked charts
These all show components of a whole. Specific settings:
- Pie chart labels inside and outside require two overlapping charts, one using the secondary axis – whatever this actually means to Excel for a pie chart (!)
- Rotation and explosion in the top chart are centered at 90 degrees, as the eye detects the prominence of this slice more easily thanks to its horizontal orientation; only part-automated, requires a User Defined Function or VBA for full automation
- The stacked % chart, often cleaner than multiple pie charts, is a normal column or bar chart with values auto-scaled to 100% before plotting; thin connector lines help highlight the changes, ideally with focus on the ‘base data set’, the one at the bottom.
To show the content of labels spanning across multiple lines, I use ‘char(10)’ as the ‘new line’ character, together with the ‘text()’ function to build the label string.
Column & bar charts
Frequency & item values. Specific settings:
- Same as other charts: pre-sorting, series labels, auto-leader lines & no overlaps
- Gap width 80% instead of default 150%
- Supporting/invisible data set for the waterfall chart
- Supporting lines for the ‘growth’ lines label losenges
- Data label shape for losenges e.g. to highlight CAGR average growth rates.
Totals shown above or right, for positive bars, and below or left for negative bars. 50% transparent white fill for negative totals to smooth out axis overlaps.
Other examples within this family of charts:
In the waterfall, series labels are right-hand next to the ‘totals’ rather than left-hand, and shown only where needed, e.g. more than 1 series. ‘Triangles’ are shown with ‘shape fill’ – anything can be used, even images of trees! Mekko is a ‘stacked area’ chart with date axis. Clustered, which I do not like to use as it tends to hide its message, has series overlap set to 0%.
Scatter & bubble charts
Bi- & tri-dimensional correlations. Specific settings:
- Axes labels ‘low’, outside the chart, with tick marks ‘inside’ and shadow to the bottom left
- Axis titles and units next to each axis for maximum ease of reading
- Bubbles with thin borders and 50% transparent filling, to highlight overlaps between different data points, & data labels with 50% transparent white background
- Data labels hidden when overlapping: this is a chart, not a data table!
- Size represents width of bubbles, at this choice accentuates the relative differences and is therefore more intuitive to the human eye than area-sizing.
With regards to preventing data labels overlap, I calculated the X/Y coordinates of each corner and then checked if one rectangle is either above the top edge or on left side of left edge of the other rectangle.
Other related chart types in this family:
For the ‘Ladder’ chart: thin gridlines 0.25pt, with light grey 0% transparency.
Distribution, for example of price ranges. Specific settings:
- Column chart uses an ‘invisible’ supporting bar at the bottom
- Error whiskers above and below show the top/bottom quartile data
- Sorting by the median, which is the middle horizontal line
- Data labels rely on 2 extra invisible line data sets (‘scatter’ chart type would allow even greater accuracy and control, but at the cost of taking longer time for set-up) to highlight the min & max values, with 50% background transparency; I avoid using auto-leader lines, in spite of the risk of label/box/axis overlaps, so to prevent adding further complexity in this already rather dense (but brilliant) chart type
- Shadows behind the boxes use the same settings as the overall chart shadow.
For the line showing ‘our median’, I programmatically hide the segments that should fall behind boxes, as Excel would otherwise overlay unwanted sections on top. The X axis is also programmatically generated.
These examples show my approach to automated charting in Excel: no external tools, VBA or manual work. Maybe one day even Excel defaults will be fixed in a future release of the software.
For now, I find the results of my approach building the ‘home-made templates’ to be clean, easy to read and fast to render.
Please do not hesitate to let me know if any questions?
PS During my first week at work, back in 2005, Gene Zelazny‘s book “Say It With Charts: The Executive’s Guide to Visual Communication” inspired me.
Even today, this remains one of the very best charting guides ever written. Simple and to the point. Not specific about Excel, but how-to-think why, when and how to use each chart type. Thank you Gene!
About the author
Fabrizio Fantini is the brain behind Evo. His 2009 PhD in Applied Mathematics, proving how simple algorithms can outperform even the most expensive commercial airline pricing software, is the basis for the core scientific research behind our solutions. He holds an MBA from Harvard Business School and has previously worked for 10 years at McKinsey & Company.
He is thrilled to help clients create value and loves creating powerful but simple to use solutions. His ideal software has no user manual but enables users to stand on the shoulders of giants.