These are my notes from the book "How to Lie with Statistics" by Darrell Huff. It was originally written in 1954 and highlights a lot of funny ways statistics and numbers can be twisted to draw conclusions that simply are not so.
To be worth much, a report based on sampling must use a representative sample, which is one from which every source of bias has been removed.
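A small simulation can make the effect of a biased sample concrete. This is only a sketch under made-up assumptions: the population, the 40,000 cutoff, and the sample size of 500 are all hypothetical and are not taken from the book.

```python
import random
from statistics import mean

rng = random.Random(1954)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 incomes spread evenly from 10k to 100k.
population = [rng.uniform(10_000, 100_000) for _ in range(10_000)]

# Biased sampling frame: only people earning above 40k can be reached
# (a stand-in for any list that leaves out part of the population).
reachable = [income for income in population if income > 40_000]

random_sample = rng.sample(population, 500)
biased_sample = rng.sample(reachable, 500)

population_mean = mean(population)
random_mean = mean(random_sample)
biased_mean = mean(biased_sample)

print(f"population mean:    {population_mean:,.0f}")
print(f"random sample mean: {random_mean:,.0f}")
print(f"biased sample mean: {biased_mean:,.0f}")
```

The random sample should land close to the population mean, while the biased sample overshoots it, no matter how many people are asked.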
An unqualified average is meaningless. This is because the average could mean the mean, median, or mode.
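To see how far apart the three averages can sit, here is a minimal sketch using Python's standard `statistics` module. The income list is invented for illustration and does not come from the book.

```python
from statistics import mean, median, mode

# Hypothetical annual incomes (illustrative values only).
incomes = [20_000, 20_000, 20_000, 30_000, 35_000, 50_000, 450_000]

average_mean = mean(incomes)      # pulled upward by the one large income
average_median = median(incomes)  # the middle value once sorted
average_mode = mode(incomes)      # the most common value

print(f"mean:   {average_mean:,.0f}")
print(f"median: {average_median:,.0f}")
print(f"mode:   {average_mode:,.0f}")
```

All three numbers can honestly be called "the average income" of this group, which is why the word needs a qualifier.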
There is a test of significance that is easy to understand. It is a simple way of reporting how likely it is that a test figure represents a real result rather than something produced by chance. This is the little figure that is not there, on the assumption that you, the lay reader, wouldn’t understand it. Or that, where there’s an ax to grind, you would. If the source of your information also gives you the degree of significance, you’ll have a better idea of where you stand. This degree of significance is most simply expressed as a probability.
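One way to picture significance expressed as a probability is a coin-flip calculation. The sketch below is an illustration of my own, not the book's example: it asks how often a fair coin would produce at least a given number of heads by chance alone. The function name `tail_probability` is hypothetical.

```python
from math import comb

def tail_probability(successes: int, trials: int) -> float:
    """Chance of at least `successes` heads in `trials` fair coin flips."""
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favorable / 2 ** trials

# The same 60% result, from a small test and from a larger one.
p_small = tail_probability(6, 10)
p_large = tail_probability(60, 100)

print(f"6 or more heads in 10 flips:   {p_small:.3f}")
print(f"60 or more heads in 100 flips: {p_large:.3f}")
```

Both tests report "60% heads", but chance alone produces the small result far more often than the large one. That difference is the little figure that tends to go missing.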
There is another kind of little figure that is not there, one whose absence can be just as damaging. It is the one that tells the range of things, or their deviation from the average that is given. Often an average, whether a mean or median, specified or unspecified, is such an oversimplification that it is worse than useless. Knowing nothing about a subject is frequently healthier than knowing what is not so, and a little learning may be a dangerous thing.
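A quick sketch shows how much an average hides when the range is left out. The two lists below are hypothetical readings chosen so that their means match exactly.

```python
from statistics import mean, pstdev

# Two hypothetical sets of readings with the same mean.
steady = [68, 70, 72, 70, 70]
extreme = [40, 100, 55, 85, 70]

for name, values in (("steady", steady), ("extreme", extreme)):
    spread = max(values) - min(values)  # range: highest minus lowest
    print(
        f"{name:8s} mean={mean(values):.1f} "
        f"range={spread} std dev={pstdev(values):.1f}"
    )
```

Reporting only "an average of 70" would describe both sets equally well, even though one barely moves and the other swings by 60.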
The semi-attached figure: If you can’t prove what you want to prove, then demonstrate something else and pretend that they are the same thing. In the daze that follows the collision of statistics with the human mind, hardly anybody will notice the difference. The semi-attached figure is a device guaranteed to stand you in good stead. It always has.
The semi-attached figure is a situation in which one idea cannot be proven, so the author pulls the old bait-and-switch, stating a completely different idea and pretending it is the same thing.
There are often many ways of expressing any figure. You can, for instance, express exactly the same fact by calling it a 1% return on sales, a 15% return on investment, a $10 million profit, an increase in profits of 40%, or a decrease of 60% from last year. The method is to choose the one that sounds best for the purpose at hand, and trust that few who read it will recognize how imperfectly it reflects the situation.
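The arithmetic behind this can be sketched with one set of books. Every input below is a hypothetical number picked so that the results match the percentages in the note above; none of them comes from the book. The "baseline" profit stands in for whatever earlier period the 40% increase is measured against.

```python
# Hypothetical company figures, chosen to reproduce the percentages above.
profit = 10_000_000            # this year's profit
sales = 1_000_000_000          # this year's sales
investment = 66_666_667        # capital invested
baseline_profit = 7_142_857    # profit in some earlier comparison period
last_year_profit = 25_000_000  # profit last year

return_on_sales = profit / sales
return_on_investment = profit / investment
increase_vs_baseline = (profit - baseline_profit) / baseline_profit
decrease_vs_last_year = (last_year_profit - profit) / last_year_profit

print(f"return on sales:       {return_on_sales:.0%}")
print(f"return on investment:  {return_on_investment:.0%}")
print(f"profit:                ${profit:,}")
print(f"increase vs baseline:  {increase_vs_baseline:.0%}")
print(f"decrease vs last year: {decrease_vs_last_year:.0%}")
```

Each line is true of the same company in the same year. The only thing that changes is the denominator.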
It’s an interesting fact that the death rate, or incidence of deaths, is often a better measure of the incidence of an ailment than direct incidence figures, simply because the quality of reporting and record keeping is so much higher on the fatalities.
As long as the errors remain one-sided, it’s not easy to attribute them to bungling or accident.
One of the trickiest ways to misrepresent statistical data is by means of a map. A map introduces a fine bag of variables in which facts can be concealed and relationships distorted.
Watch out for applications of post hoc logic. Post hoc is a logical fallacy in which one event is said to be the cause of a later event simply because it occurred earlier. We cannot simply assume that the one would not have occurred without the other. Correlation does not indicate causation.
Something to watch out for is a conclusion in which a correlation has been inferred to continue beyond the data with which it has been demonstrated.
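A sketch of what goes wrong when a trend is carried past the data: below, a straight line is fitted to a narrow range of observations and then used far outside that range. The rain-and-yield relationship in `true_yield` is a hypothetical formula invented for this example, and the least-squares fit is written out by hand to keep the block self-contained.

```python
def true_yield(rain: float) -> float:
    """Hypothetical crop yield: rises with rain, then falls as fields flood."""
    return rain * (20 - rain)

# Observations cover only light rainfall, where yield climbs steadily.
observed_rain = [1, 2, 3, 4, 5]
observed_yield = [true_yield(r) for r in observed_rain]

# Ordinary least-squares line through the observed points.
n = len(observed_rain)
mean_x = sum(observed_rain) / n
mean_y = sum(observed_yield) / n
slope = sum(
    (x - mean_x) * (y - mean_y) for x, y in zip(observed_rain, observed_yield)
) / sum((x - mean_x) ** 2 for x in observed_rain)
intercept = mean_y - slope * mean_x

def predicted_yield(rain: float) -> float:
    return intercept + slope * rain

for rain in (4, 20):
    print(f"rain={rain:2d} predicted={predicted_yield(rain):6.1f} "
          f"actual={true_yield(rain):6.1f}")
```

Inside the observed range the line is nearly right. At a rainfall of 20 it promises a bumper crop where the assumed relationship gives none at all.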
Now these figures conclusively show that people who have gone to college make more money than people who haven’t. The only thing wrong is that along with the figures and facts goes a totally unwarranted conclusion. This is the post hoc fallacy at its best. It says that these figures show that if you, your son, or your daughter attend college, you will probably earn more money than if you decide to spend the next four years in some other manner. The unwarranted conclusion has at its basis the equally unwarranted assumption that since college-trained folks make more money, they make it because they went to college. Actually, we don’t know but that these are the people that would have made more money even if they hadn’t gone to college. There are a couple of things that indicate rather strongly that this is so. Colleges get a disproportionate number of two groups of kids: the bright and the rich. The bright might show good earning power without college knowledge. And as for the rich ones, well, money breeds money in many obvious ways.
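The college argument can be sketched as a simulation in which college has no effect on income at all, yet graduates still come out ahead. Everything here is a made-up model: `advantage` is a hypothetical 0-to-1 score standing in for "bright or rich", and the income formula is invented for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

college_incomes = []
other_incomes = []
for _ in range(10_000):
    advantage = rng.random()  # hypothetical mix of ability and family money
    attends_college = rng.random() < advantage  # the advantaged attend more often
    # Income depends only on advantage plus noise; college adds nothing here.
    income = 30_000 + 40_000 * advantage + rng.gauss(0, 5_000)
    if attends_college:
        college_incomes.append(income)
    else:
        other_incomes.append(income)

gap = mean(college_incomes) - mean(other_incomes)
print(f"college mean:     {mean(college_incomes):,.0f}")
print(f"non-college mean: {mean(other_incomes):,.0f}")
print(f"gap:              {gap:,.0f}")
```

The gap is real in the data, but in this toy model it is produced entirely by who goes to college, not by what college does. The figures alone cannot tell the two stories apart.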
Any percentage figure based on a small number of cases is likely to be misleading. It is more informative to give the figure itself. And when the percentage is carried out to decimal places you begin to run the scale from the silly to the fraudulent.
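A few lines of arithmetic show why a percentage built on a handful of cases is so fragile. The counts are hypothetical; the helper `percent` is just a convenience for this sketch.

```python
def percent(cases: int, total: int) -> float:
    """Share of `cases` in `total`, as a percentage."""
    return 100 * cases / total

# One extra case out of 3 versus one extra case out of 3,000.
small_before, small_after = percent(1, 3), percent(2, 3)
large_before, large_after = percent(1000, 3000), percent(1001, 3000)

print(f"1 of 3       -> {small_before:.4f}%   2 of 3       -> {small_after:.4f}%")
print(f"1000 of 3000 -> {large_before:.4f}%   1001 of 3000 -> {large_after:.4f}%")
```

A single person changing their answer moves the small-sample figure by a third of the whole scale, so quoting it to four decimal places adds the look of precision without any of the substance. Saying "1 of 3" is shorter and more honest.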
Many a statistic is false on its face. It gets by only because the magic of numbers brings about a suspension of common sense.
“Does it make sense” will often cut a statistic down to size when the whole rigmarole is based on an unproved assumption.
Sometimes what is missing is the factor that caused a change to occur. This omission leaves the implication that some other more desired factor is responsible.
Strange things crop out when figures are based on what people say, even about things that seem to be objective facts.