In a recent issue of TIME magazine, James Poniewozik makes an interesting observation about Web 2.0: “The ability to share information is changing our lives in tiny ways (see your Facebook news feed) and huge ones — witness the WikiLeaks phenomenon (…) Information has its limits, however. We’re becoming a society of constant documenters, but recording problems is not always enough to fix them” (source).
I agree with that.
This article in Wired (“The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”) is a bit older, but it was pointed out to me only a few days ago. The author claims that, “faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete”. And further: “With enough data, the numbers speak for themselves”.
I don’t think so. I suspect that what the author actually means is that we may no longer need to think much about sampling, since we often have a complete dataset for the population of interest (e.g. all customers of firm X).
But even if that is true (which I also doubt), numbers don’t speak for themselves. We still need statistics to test competing theoretical models, to discover patterns in data (e.g. via clustering or classification), or simply to reduce the massive amount of data to something we can actually process in a reasonable amount of time (e.g. dimensionality reduction via scaling).
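To make the point concrete: even with a “complete” dataset, the data stays mute until a statistical procedure summarizes it. Below is a minimal sketch of one such procedure, k-means clustering, run on synthetic “customer” data (the data, the two-group structure, and the initialization are all my own illustrative assumptions, not anything from the articles quoted above):

```python
import numpy as np

def kmeans(X, centroids, n_iter=100):
    """Minimal k-means sketch: repeatedly assign each point to its
    nearest centroid, then recompute centroids as cluster means."""
    for _ in range(n_iter):
        # squared distance of every point to every centroid (n_points x k)
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0)
                        for j in range(len(centroids))])
        if np.allclose(new, centroids):  # converged
            break
        centroids = new
    return labels, centroids

# hypothetical "customers": two well-separated groups in 2D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])

# seed one centroid in each group (a simplification; real code would
# use a smarter initialization such as k-means++)
labels, centroids = kmeans(X, X[[0, 50]].copy())
```

The raw 100 rows say nothing by themselves; it is the clustering step that turns them into the interpretable statement “there are two groups of customers here” — which is exactly the kind of work statistics still has to do.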