Numbers don’t speak for themselves

This article in Wired (“The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”) is a bit older, but it was pointed out to me only a few days ago. The author claims that “faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete”. And further: “With enough data, the numbers speak for themselves”.

I don’t think so. I suspect that what the author actually means is that we may no longer need to think much about sampling, since we have a complete dataset for the population of interest (e.g. all customers of firm X).

But even if that’s true (which I also doubt), numbers don’t speak for themselves. We still need statistics to test competing theoretical models, to discover patterns in data (e.g. via clustering or classification), or simply to reduce the massive amount of data to something that we can actually process in a reasonable amount of time (e.g. dimensionality reduction via scaling).
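
To make the pattern-discovery point a bit more concrete, here is a minimal sketch in Python of a plain k-means clustering on one-dimensional data. Everything specific in it is an assumption made for illustration: the spend figures are invented, `kmeans_1d` is a hypothetical helper, and the choice of two clusters and of the starting centers is arbitrary. It is not meant as a recommended method, only as a small example of the kind of statistical step that remains necessary even with a complete dataset.

```python
from statistics import mean


def kmeans_1d(values, centers, iterations=20):
    """Plain k-means on one-dimensional data, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assign every value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments are stable, so we are done
        centers = new_centers
    return centers, clusters


# Hypothetical yearly spend of ALL customers of "firm X" (made-up numbers).
spend = [12, 15, 14, 11, 13, 95, 102, 98, 110, 99]

# Assumption: we look for two groups and start from the smallest and largest value.
centers, clusters = kmeans_1d(spend, centers=[min(spend), max(spend)])
print(centers)
print(clusters)
```

Note that even in this toy case the data alone do not decide how many groups to look for or where to start. Those choices come from whoever runs the analysis.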