After merrily criticizing a complicated diagram (see Entry 29),
the New York Times went on to produce one of its own,
about “Europe's web of debt.” The intricate graphical display,
included below, raises again the two fundamental questions
for a meaningful representation of quantitative information:
how are the data encoded visually, and how is this encoding
perceived—quantitatively, that is—by the display's audience?
The above diagram encodes data in two ways: circles and arrows.
For the circles, the geometric attribute proportional to the data
is the area; for the arrows, it is the line's thickness, not the area.
Neither of these two choices is likely to be immediately apparent,
especially with the outward arrows (going from Italy to the right,
for example) exhibiting different lengths for no particular reason
except convenience of labeling. Moreover, the mix of encodings
makes it hard to relate visually a country's total debt (circle area)
to its components (line thicknesses)—a legitimate objective here,
if only to visualize the remaining part of a country's debt, that is,
what is owed to parties not shown or mentioned on the diagram.
As a rule, encoding a quantitative variable by the size of a circle
is ineffective, for at least two reasons. First, it is usually unclear
what geometric attribute of the circle is proportional to the data:
its area, as in the graph above, or its radius, as in the one below,
showing voting intentions (by Belgian newspaper De Standaard).
Second, although the area may seem the more rational option,
the perceived area of a circle is not proportional to its true area;
rather, it is proportional to a power k of this area (Stevens' law),
with k depending on the task but typically in the range 0.6–0.9.
That is, we overestimate a small area compared to a larger one.
The lack of clarity about, and the biased perception of, circle sizes
in a given display directly affect our perception of proportions,
which is the whole point of visual representations to start with.
The legend accompanying the New York Times diagram clarifies
that “Arrow widths are proportional to debt amounts” (you know
your design is ineffective when you feel you have to explain it ;–)
but it says nothing about the circles. Neither does De Standaard.
In the latter case, the voting intentions for CD&V are three times
those for LDD, but the area of the CD&V circle is nine times that
for LDD. Given Stevens' law, the visual impact of the CD&V circle
is four to seven times that of LDD, and not three as it should be.
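
The arithmetic behind that range can be checked with a short Python sketch. The function name perceived_ratio and its parameters are mine, for illustration only; the exponents 0.6 and 0.9 are simply the two ends of the typical range quoted above.

```python
def perceived_ratio(value_ratio: float, k: float, encoding: str = "area") -> float:
    """Perceived size ratio of two circles under Stevens' power law.

    encoding="area":   circle area is proportional to the data value.
    encoding="radius": circle radius is proportional to the data value,
                       so the area ratio is the square of the value ratio.
    """
    if encoding == "area":
        area_ratio = value_ratio
    elif encoding == "radius":
        area_ratio = value_ratio ** 2
    else:
        raise ValueError("encoding must be 'area' or 'radius'")
    # Stevens' law: perceived area is proportional to (true area) ** k.
    return area_ratio ** k


# A value ratio of 3, as for CD&V versus LDD.
for k in (0.6, 0.9):
    by_radius = perceived_ratio(3, k, "radius")
    by_area = perceived_ratio(3, k, "area")
    print(f"k = {k}: radius encoding {by_radius:.1f}x, area encoding {by_area:.1f}x")
```

With the radius proportional to the data, the perceived ratio comes out at about 3.7 to 7.2. With the area proportional to the data, it would be about 1.9 to 2.7. Under these assumptions, neither encoding conveys the true ratio of 3.
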
Why not represent this simple, univariate set of data with bars?
Horizontal bars aligned at the left end are intuitive and accurate,
and can elegantly be combined with labels and numerical values.
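
As a rough illustration of such bars, here is a minimal text-only sketch in Python. The party labels and percentages are hypothetical, chosen only so that the largest value is three times the smallest; they are not the figures published by De Standaard. The helper text_bars is likewise an illustrative name.

```python
def text_bars(data: dict[str, float], width: int = 40) -> list[str]:
    """Render horizontal bars aligned at the left end, largest value first.

    Bar length is proportional to the value and starts from zero.
    Each bar is followed by its numerical value.
    """
    longest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in sorted(data.items(), key=lambda item: item[1], reverse=True):
        bar = "#" * round(width * value / longest)
        lines.append(f"{label:<{label_width}}  {bar} {value:g}")
    return lines


# Hypothetical shares in percent (NOT the actual poll results).
shares = {"Party A": 30.0, "Party B": 20.0, "Party C": 15.0, "Party D": 10.0}
print("\n".join(text_bars(shares)))
```

Because all bars share a common baseline, the eye compares lengths only: the bar for Party A is about three times as long as the bar for Party D, which is the ratio in the data.
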
Alternatives to circle size may be fewer in multivariate data sets,
such as the now famous graphical displays by Gapminder World,
an example of which is shown below. These displays sure exhibit
many qualities (the use of scatter plots as an exploratory tool,
the flexible interactive options, the attention to variability, etc.),
but the use of circle size to encode the third variable is unclear.
Screen shots suggest that neither radius nor area is proportional
to the variable. What is worse, the scale setting for this variable
(bottom right) allows viewers to specify the circle size (however
it is calculated) as not starting from zero, which is sure to distort
the quantitative perception even more. Simply incomprehensible.
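
To see why a size scale that does not start from zero distorts proportions, consider the following Python sketch. It assumes a plain linear mapping from data value to circle size. This is not a description of how Gapminder computes its sizes (which, as noted, is unclear), and the function name, values, and sizes are all hypothetical.

```python
def circle_size(value, v_max, size_max, size_min=0.0, v_min=0.0):
    """Map a value linearly from [v_min, v_max] to [size_min, size_max]."""
    return size_min + (value - v_min) / (v_max - v_min) * (size_max - size_min)


# Two hypothetical data values in a ratio of 10 to 1.
small, large = 10.0, 100.0

# Scale starting from zero: the size ratio equals the data ratio.
ratio_from_zero = (circle_size(large, v_max=100.0, size_max=50.0)
                   / circle_size(small, v_max=100.0, size_max=50.0))

# Scale starting at a size of 20 instead of 0: the ratio is compressed.
ratio_offset = (circle_size(large, v_max=100.0, size_max=50.0, size_min=20.0)
                / circle_size(small, v_max=100.0, size_max=50.0, size_min=20.0))

print(f"from zero: {ratio_from_zero:.1f}, with offset: {ratio_offset:.1f}")
```

In this sketch, a tenfold difference in the data shrinks to a ratio of roughly 2.2 in the sizes as soon as the minimum size is set to 20, and that is before any perceptual bias comes into play.
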
I am not sure that the third variable (by default, population size)
is required in the above display, but, in general, a better option
to explore possible relationships among three or more variables
is to construct a combination of two-dimensional scatter plots,
as sketched on the left. This complex display lends itself well
to interactive analysis à la Gapminder: when the viewer selects
a dot in one panel, the software can highlight or even connect
to the corresponding dots in the other panels, thus revealing
all the variables (within their relationship) for the item selected.
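
The logic of such linked panels might be sketched as follows in Python. This is only an illustration of the idea, not the behavior of any existing software; the function names, the variable names a, b, and c, and the records are all hypothetical.

```python
from itertools import combinations


def panel_pairs(variables):
    """List every pair of variables: one two-dimensional panel per pair."""
    return list(combinations(variables, 2))


def highlight(records, selected, variables):
    """Return the (x, y) position of the selected item in every panel."""
    item = records[selected]
    return {(x, y): (item[x], item[y]) for x, y in panel_pairs(variables)}


# Hypothetical records with three variables each.
records = {
    "item 1": {"a": 1.0, "b": 4.0, "c": 9.0},
    "item 2": {"a": 2.0, "b": 3.0, "c": 5.0},
}
variables = ["a", "b", "c"]

# Selecting "item 2" in any panel yields its dot in all three panels.
for panel, point in highlight(records, "item 2", variables).items():
    print(panel, point)
```

Three variables give three panels (a versus b, a versus c, b versus c), and every variable is read on a position scale rather than a size scale.
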
Fri 28 May 2010
If you imagine that an appropriate alternative is to draw
a scatter plot in three dimensions, consider the graphical display below,
seen in a recent article in Nature. I challenge you to pick a dot at random
and read out the corresponding x, y, and z values—or equivalently to pick
two dots of different colors and tell me which one is higher than the other.