"The problem of course is that the “power” of big data to help answer challenging questions relies upon the quality of that underlying data. And by “quality,” I don’t simply mean whether the data is accurate (which we will see is a fraught term in itself), but instead I am concerned with what sorts of assumptions are present in the collection of that data, what’s being left out, and how does the process of data collection influence the results?
What I am trying to demonstrate is that data, like science, is not as purely objective as we typically think it is. By assuming the objectivity of the underlying data, we set ourselves up to make large-scale decisions without properly challenging them because they are based on data, and that data “can’t be wrong”. The solution however is not to rid the data of all subjective intrusions because at a certain point this is not possible. What I am advocating is to approach big data with a healthy skepticism and an awareness of the ways in which it is lacking or only presenting a part of the picture."
Massive, crucial point, beautifully expressed - and by an undergrad no less (by the name of Evan Freedman).
Comment on The Limits of Big Data by Klint Finley on RWW, June 2011