One of the seminal papers establishing the importance of data visualization (as it is now called) is the 1973 paper by F. J. Anscombe, available at http://www.sjsu.edu/faculty/gerstman/StatPrimer/anscombe1973.pdf
It has probably the most elegant introduction to an advanced statistical analysis paper that I have ever seen:
1. Usefulness of graphs
Most textbooks on statistical methods, and most statistical computer programs, pay too little attention to graphs. Few of us escape being indoctrinated with these notions:
(1) numerical calculations are exact, but graphs are rough;
(2) for any particular kind of statistical data there is just one set of calculations constituting a correct statistical analysis;
(3) performing intricate calculations is virtuous, whereas actually looking at the data is cheating.
A computer should make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding.
Of course, the dataset in the paper makes it very interesting even for people who don't much like graphical analysis.
From http://en.wikipedia.org/wiki/Anscombe%27s_quartet
The x values are the same for the first three datasets.
Anscombe’s Quartet
| I: x | I: y | II: x | II: y | III: x | III: y | IV: x | IV: y |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
| 8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
| 13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
| 9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
| 11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
| 14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
| 6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
| 4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
| 12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
| 7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
| 5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |
For all four datasets:
| Property | Value |
| --- | --- |
| Mean of x in each case | 9 exact |
| Variance of x in each case | 11 exact |
| Mean of y in each case | 7.50 (to 2 decimal places) |
| Variance of y in each case | 4.122 or 4.127 (to 3 d.p.) |
| Correlation between x and y in each case | 0.816 (to 3 d.p.) |
| Linear regression line in each case | y = 3.00 + 0.500x (to 2 d.p. and 3 d.p. resp.) |
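These summary figures are easy to check for yourself. Here is a minimal Python sketch that recomputes them from the table using only the standard library. The names `quartet`, `summarize` and `summaries` are my own, and the variances are assumed to be sample variances (n - 1 denominator), which is what gives exactly 11 for x.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet, copied from the table above.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared by I, II and III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return means, sample variances, correlation and least-squares line."""
    mx, my = mean(x), mean(y)
    vx, vy = variance(x), variance(y)  # sample variance (n - 1 denominator)
    # Sample covariance, using the same n - 1 denominator.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    slope = cov / vx
    return {
        "mean_x": mx, "var_x": vx, "mean_y": my, "var_y": vy,
        "corr": cov / sqrt(vx * vy),
        "intercept": my - slope * mx, "slope": slope,
    }


summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, s in summaries.items():
    print(name, {key: round(value, 3) for key, value in s.items()})
```

The four printed rows should agree with each other to the precision quoted in the table, which is exactly why the numbers alone cannot tell the datasets apart.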
But see the graphical analysis –
and ODS Statistical Graphs at
Pretty graphs make for better decisions too!
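If you have no plotting package at hand, even a crude text-mode scatter plot makes the point. The sketch below is only an illustration: `text_scatter` is a made-up helper, and the axis ranges (x from 3 to 20, y from 2 to 14) are assumptions chosen to fit this particular data.

```python
# Anscombe's quartet, copied from the table above.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared by I, II and III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def text_scatter(x, y, width=40, height=12, x_range=(3, 20), y_range=(2, 14)):
    """Render (x, y) points as a character grid, with y increasing upwards."""
    grid = [[" "] * width for _ in range(height)]
    for a, b in zip(x, y):
        # Scale each coordinate into a column / row index of the grid.
        col = round((a - x_range[0]) / (x_range[1] - x_range[0]) * (width - 1))
        row = round((b - y_range[0]) / (y_range[1] - y_range[0]) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 of the grid is the top line
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


for name, (x, y) in quartet.items():
    print(f"Dataset {name}")
    print(text_scatter(x, y))
    print()
```

Even at this resolution the four panels look nothing alike: a noisy line, a curve, a line with one outlier, and a vertical stack of points with a single point far to the right.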