Hints on visualising data on the coronavirus epidemic

14 Jul 2020
The Guardian

The Johns Hopkins Covid-19 dashboard, with its clear numbers and its red bubbles on a dark world map, has become the trusted picture of the coronavirus disease for many journalists and audiences around the world.


By Rowan Philp

And it represents the broader emergence of data visualisation tools as one of the most powerful vehicles for public understanding of the invisible global threat.

But, according to world-renowned visualisation professor Alberto Cairo, even that excellent Johns Hopkins graphic could be improved.

That’s because the bubbles over Europe represent cases by country, while those over the United States represent cases by county, and some audiences, he said, might misread the sheer number of bubbles over the US.

“This graphic is wonderful,” said Cairo, who holds the Knight Chair in Visual Journalism at the School of Communication of the University of Miami.

He added, rhetorically: “But should we make the level of this data more consistent? Perhaps at just the national level, and then we can zoom in to the county level?”

In the twelfth webinar in GIJN’s series on Investigating the Pandemic, held recently, investigative reporter Danielle Ivory and health data expert Amanda Makulec joined Cairo in sharing insights on how journalists should choose and present graphic forms, and the data behind them. They spoke in front of an online audience of 266 journalists from 46 countries.

The panel’s consensus was this: having carefully verified the information, journalists should not only show the data in the most appropriate and digestible forms, but should also clearly explain both the graphic and the data, and the uncertainty behind it.

Cairo said visualisation has proved to be one of the most effective information delivery formats globally, helping the public’s understanding of the pandemic.

“I think that it’s clear that the most difficult part of covering the coronavirus crisis has had to do with the quality of the data, and not visualisation,” said Cairo, whose latest book on the craft is entitled How Charts Lie: Getting Smarter about Visual Information.

“If there is a piece of good news, it is that visualisation has been a winner, and is becoming more popular. But I have also observed many mistakes in how data about the pandemic has been visualised,” said the professor.

Makulec, a health information expert and the operations director at the Data Visualisation Society, meanwhile warned that reporters need to understand how Covid-19 data are collected and aggregated before considering the information for use in graphics or charts.

For instance, she showed ten separate steps – from the taking of swabs to inputs on testing site spreadsheets – at which human error or data lags could occur before Covid-19 case counts are reported in national datasets.

Ivory, an investigative reporter at The New York Times, said apples-to-apples data comparisons on Covid-19 cases represented a major challenge, with health officials across states and counties frequently citing different datasets or using differing definitions.

Some might cite confirmed cases or deaths, while others might cite probable cases – and then switch to the other approach, or revise their numbers.

Last month, Ivory and her colleagues revealed that more than one-third of all Covid-19 deaths in the US were related to long-term care institutions, including nursing homes.

“We were able to collect data from almost every state, and we’re still collecting – it’s pretty much an around-the-clock effort,” said Ivory.

She added: “About 70 per cent of it is collected manually, with calls or going to a state’s website, and much of the rest is collected via automated scraper, and hopefully more can be collected that way to make it a sustainable process. But we are very careful to be transparent about what we don’t know.”

Ivory said making phone calls directly to health officials remained the best way to sort out apparently confused or contradictory data that flowed in.

Drawing from the three speakers, here are ten top hints on how to get the visualisation of Covid-19 data right.

One: Explain how to read the graphic, before you explain how to read the data. In a recent graphic on jobs lost due to the pandemic, The New York Times included prominent explainers, using simple language like this: “Each bubble on this chart represents an occupation. The bigger the bubble, the more people do that job.”

Two: Write the text of your graphic at the same time that you are designing the graphic, as doing so helps to frame the graphic for both you and the reader.

Three: Sort the data in an intuitive way – such as chronologically, or in comparable groups. Cairo reorganised Covid-19 data from a confusing bar graph from the Georgia Department of Health into a new chart grouped by county, and arranged chronologically.
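As a rough illustration of the sorting hint, the sketch below orders a handful of records first by county and then by date. The county names, dates, and case counts are invented for the example, and this is only one simple way to do it, not the method Cairo used.

```python
from datetime import date

# Hypothetical records, deliberately out of order (not real figures).
records = [
    {"county": "Fulton", "day": date(2020, 5, 3), "cases": 14},
    {"county": "Cobb", "day": date(2020, 5, 2), "cases": 9},
    {"county": "Fulton", "day": date(2020, 5, 1), "cases": 11},
    {"county": "Cobb", "day": date(2020, 5, 1), "cases": 7},
]

# Group by county (alphabetically), then order each group chronologically.
ordered = sorted(records, key=lambda r: (r["county"], r["day"]))

for r in ordered:
    print(r["county"], r["day"].isoformat(), r["cases"])
```

Sorting on a (county, date) pair keeps each county's figures together and in time order, which is usually what a reader expects from a chart's axis or legend.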

Four: If you or your audience are new to data visualisations, start simple, with basic graphics like maps, bar graphs, or line charts. Consider tools like Datawrapper, Flourish, and iNZight. Follow expert online tutorials on free tools, like Cairo’s guide.

Five: Don’t limit yourself to simple tools and charts. Challenge your audience occasionally with incremental changes in how you present data visually.

Six: Don’t try to visualise too many data points, and edit them down if the graphic seems overloaded. Define the key points and stick to those.

Seven: There are no bad visualisation formats, but some are more appropriate for the dataset and the audience than others. Charts that seem especially counter-intuitive may need a secondary chart as a reference point. For instance, cartograms – which distort areas on a map, depending on their relative share of a variable – should be presented with an ordinary map of that area alongside for comparison.

Eight: Use linear scales for numbers, and non-linear scales – including logarithmic scales – for rates of change. Explain non-linear graphics clearly, and prominently, as readers often find these hard to understand.

Using generations of gerbils as his data points, Cairo contrasted the linear scale with a logarithmic scale to show why non-linear scales are important in illustrating rates of change.
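The arithmetic behind that contrast can be sketched in a few lines. The example assumes a hypothetical population that starts at two animals and doubles every generation; the numbers are illustrative and are not taken from Cairo's presentation.

```python
import math

# Hypothetical data: a population that doubles every generation,
# starting from 2 animals (illustrative numbers only).
generations = list(range(10))
population = [2 * 2 ** g for g in generations]

# On a linear scale, the jump between generations keeps getting bigger.
linear_steps = [b - a for a, b in zip(population, population[1:])]

# On a log scale, a constant growth rate produces equal-sized steps.
log_values = [math.log10(p) for p in population]
log_steps = [round(b - a, 6) for a, b in zip(log_values, log_values[1:])]

print("linear steps:", linear_steps)
print("log10 steps: ", log_steps)
```

Because every step on the log scale is the same size, a steady rate of change shows up as a straight line, and any bend in that line signals that the rate itself is speeding up or slowing down.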

Nine: Where you can, display data uncertainty visually, for example with margins of error or confidence intervals. The uncertainty that cannot be quantified – such as how the data were generated – can still be disclosed in written text.
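For readers who want to see where a margin of error comes from, here is a minimal sketch that computes an approximate 95 per cent confidence interval for a proportion, such as the share of tests that come back positive. The function name and the sample figures are hypothetical, and the normal approximation used here is just one common textbook method, not one the panel prescribed.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Approximate confidence interval for a proportion.

    Uses the normal (Wald) approximation; z=1.96 corresponds to
    roughly 95 per cent confidence. Hypothetical helper for illustration.
    """
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    # Clamp to the valid range for a proportion.
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical sample: 120 positive results out of 1,000 tests.
p, low, high = proportion_ci(120, 1000)
print(f"{p:.1%} positive (95% CI {low:.1%} to {high:.1%})")
```

The lower and upper bounds are the values that would be drawn as error bars or a shaded band around the point estimate. The approximation is weaker for small samples or proportions near zero or one, which is itself worth noting in the text of a graphic.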

Ten: Forget the traditional design mantra of “show-don’t-tell”. Cairo said visual journalists need to “show and tell”. Once considered an afterthought by many designers, the text portion of a graphic, known as the “annotation layer”, is now considered crucial, both for re-emphasising the main takeaways and for public understanding of the graphic form itself.

  • This article was originally published by the Global Investigative Journalism Network. Rowan Philp is a reporter for GIJN, formerly chief reporter for South Africa’s Sunday Times. As a foreign correspondent, he has reported on news, politics, corruption, and conflict from more than two dozen countries around the world.
