Dangerous statistics

"There are two loaves of bread. You eat both. I eat none. Average consumption: one loaf per person."

This was how statistics worked for Nicanor Parra. The Chilean poet, who was probably unfamiliar with statistical theory, nevertheless understood two of its most remarkable and dangerous qualities: (a) statistics can be manipulated, both in how they are compiled and in how they are presented, and can thus convey a distorted, biased or false picture of reality that serves to confuse and mislead; and (b) people tend to believe statistics blindly, even when they do not understand what they are or how they are produced (the situation that gave rise to the English saying "lies, damned lies, and statistics").

As a discipline, Statistics wields very powerful tools, and that makes it a source of potential threats. Every Statistics handbook, without exception, stresses the problems, risks and biases involved in applying statistical techniques and tools to a set of data. "Cooking" (manipulating) statistics is amazingly easy; in fact, there are already classic books on the subject, such as Darrell Huff's very basic How to Lie with Statistics, as well as serious reports that warn against certain practices, especially when presenting budgets or project results.

Handbooks also point out that statistical analysis lets us approximate certain results, infer a number of things within certain ranges, and even (with caution) propose hypothetical patterns and models. But in no case does it offer anything more than nebulous possibilities or extrapolations. These handbooks also warn that subjecting qualitative data to statistical analysis means forcing them onto a terrible "Procrustean bed."

Nowadays, however, we have data science, data mining, the R language and tons of new techniques and their sub-sub-sub-disciplines (with their corresponding courses, webinars, MOOCs, meetings, conferences and publications), which are nothing but good old Statistics applied to modern "big" data with the help of contemporary technology (and under a much cooler name). We also have talks, statements, papers and results that clearly show that the practitioners of these "disciplines" have not read the basic handbooks, or that their training was very hasty and superficial...

Most of them seem unaware of the two basic problems identified by Mr Nicanor Parra. Yet they are deeply convinced that they know something anyway...

[Libraries are no strangers to statistics, or to their problems. There is even an IFLA manifesto on the topic (which, to tell the truth, adds nothing to the conversation). However, little is said in librarianship about the risks of statistics, or about the potential conflicts and biases... As with many other aspects of the discipline and the profession, everything seems positive and great. But no, it is not.]


