This video is part of the How to Avoid Common Data Visualization Mistakes series, presented by Naomi B. Robbins, Data Visualization Expert at NBR.
Transcript:
Another very, very common mistake I see is using equally spaced tick marks for unequal intervals. The figure that you see here shows the number of prisoners on death row from 1953 to 2004. That's a period of just over 50 years. I want you to look at the horizontal axis. We go from 1953 to 1994, which is just over 40 years, and we give it roughly half the axis. Then we go '94, by one year, then two years and three years. You can't do that. I mean, they're equally spaced tick marks but they represent different times. It totally distorts the figure. Now let's look at the vertical axis. You see 0, 500, 1000, 1500, looks okay to me. But wait, 2356? They're just numbers out of the blue.
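To put a rough number on that distortion, here is a minimal Python sketch. It uses only the years mentioned above (1953, 1994, 2004) and the "roughly half the axis" estimate. The function name `share_of_span` and the 0.5 figure for the drawn share are assumptions made for illustration, not measurements taken from the textbook figure.

```python
def share_of_span(start, end, axis_start, axis_end):
    """Fraction of the full axis range that the span [start, end] should occupy."""
    return (end - start) / (axis_end - axis_start)

# Share of the time period that 1953-1994 really represents (41 of 51 years).
true_share = share_of_span(1953, 1994, 1953, 2004)

# Share of the axis it is given in the figure: "roughly half", per the transcript.
drawn_share = 0.5  # assumption: an eyeballed estimate, not a measured value

# How much the early period is squeezed and the recent period is stretched.
early_scale = drawn_share / true_share
recent_scale = (1 - drawn_share) / (1 - true_share)

print(f"true share of 1953-1994:  {true_share:.2f}")
print(f"drawn share of 1953-1994: {drawn_share:.2f}")
print(f"early years drawn at {early_scale:.2f}x, recent years at {recent_scale:.2f}x")
```

On these assumptions the early years take up about 80 percent of the time span but only about half of the axis, so the recent years are drawn several times wider per year than the early ones.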
This is from an introductory sociology textbook. It's just totally distorted. So, if you look at the bottom, you can see the data comes from the U.S. Department of Justice Bureau of Justice Statistics. Well, I know statisticians who work there and I didn't think they would ever do something like that, so I went to their website and I found this graph. Same data. I don't particularly like the graph. There are no tick marks on the horizontal axis. I'd like to know exactly where 1993 is, but I'm being very picky because we have even five years on the horizontal axis, even 500 prisoners on the vertical axis. Let's look at them next to each other. If I were to ask you, what fraction of the time were there relatively few prisoners on death row? If you look on the right, you'd say roughly half the time. If you look on the left, you'd say less than a quarter of the time.
So, as you can see, the figure is very distorted from the more accurate one. It makes sense to give more detail to the recent time than to time far back. This comes from a book by Stephen Few and it's not his figure. I don't like the number of tick marks, having two rows of tick marks, but, again, that's a petty thing. Nobody would make a continuous curve from years to months to days. When you come to the end of the panel, it stops your eye, so you wouldn't make that mistake. So, it's perfectly fine to have more detail for recent times, just don't do it in one curve without making it clear to your eye, stop here, this is different.
Here we're talking about percentage of infants who cried when their mothers left. And, in the beginning, you can see we have two month intervals. Then we jump. Now they told us we jumped because they put a scale break in the horizontal axis. But, first of all, nobody notices it, and if you do notice it, what's the point of a line that goes straight through a scale break? You can't do that. Here we have some graphs from none other than the prestigious New England Journal of Medicine. What we have is the relative risk of death as a function of the body mass index. Let's look at the horizontal axis. We go two, two, two, and then it goes two, then there's some one and a halfs, and then it goes two, three, five. You can't do that.
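A quick way to catch this before drawing is to check whether the tick labels step by the same amount each time. The sketch below is one possible check, not a standard routine. The helper names are hypothetical, the gap list is an approximate reading of the spoken description (the number of 1.5 steps is a guess), and the starting value of 20.0 is made up purely so there is something to accumulate from.

```python
from itertools import accumulate

def tick_gaps(ticks):
    """Differences between consecutive tick labels."""
    return [b - a for a, b in zip(ticks, ticks[1:])]

def evenly_spaced(ticks, tol=1e-9):
    """True if every consecutive pair of tick labels differs by the same amount."""
    gaps = tick_gaps(ticks)
    return all(abs(g - gaps[0]) <= tol for g in gaps)

# Assumption: gaps as described aloud ("two, two, two, two, some one and a halfs,
# two, three, five"); the count of 1.5 steps is a guess.
bmi_gaps = [2, 2, 2, 2, 1.5, 1.5, 2, 3, 5]

# Assumption: 20.0 is an arbitrary starting label, not a value from the journal figure.
bmi_ticks = list(accumulate(bmi_gaps, initial=20.0))

print("tick labels:", bmi_ticks)
print("gaps:", tick_gaps(bmi_ticks))
print("safe to space equally:", evenly_spaced(bmi_ticks))
```

If the check fails, the labels should not be placed at equal distances along the axis.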
What should you do if your time is not evenly spaced? This is from a book by Richard Heiberger and Bert Holland, Statistical Analysis and Data Display. They have data at 1, 2, 4, 8, 12, and 24. They spaced it according to what the data are. So if you don't have data equally spaced, you don't equally space your figure. This is accurate.
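As a rough illustration of spacing "according to what the data are", the sketch below compares where the six values 1, 2, 4, 8, 12, and 24 land on a unit-length axis when placed proportionally versus when spaced equally. The function names are hypothetical, and this is not the code used for the Heiberger and Holland figure.

```python
def proportional_positions(values):
    """Position of each value on a 0-to-1 axis, in proportion to its size."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def equal_positions(values):
    """Position of each value on a 0-to-1 axis if ticks are simply spaced equally."""
    n = len(values)
    return [i / (n - 1) for i in range(n)]

times = [1, 2, 4, 8, 12, 24]  # the data values mentioned in the transcript

for t, p, e in zip(times, proportional_positions(times), equal_positions(times)):
    print(f"value {t:>2}: proportional {p:.3f}   equally spaced {e:.3f}")
```

In most plotting tools, the proportional placement is what you get when the values are passed as numbers for the horizontal coordinate rather than as text labels or categories.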