The term “Big data” was first used in print by Michael Cox and David Ellsworth in 1997, in the context of visualization challenges for large data sets from NASA simulations. As Gil Press highlights in his recent blog post “A Very Short History of Big Data”, the “information explosion” and the management of large data were topics of research interest as early as the 1940s (Note: another big data timeline was posted last month by Uri Friedman). O’Reilly’s Roger Magoulas and others have been influential in promoting it as a mainstream concept across disciplines.
The concept of “BIG” data refers not only to scale but also to complexity. Dr. Jim Gray highlighted this in 2007, in his final public talk on data-intensive science, under the framework of the “fourth paradigm”: a deluge of data is inundating researchers, requiring the development of new methods for data management, integration, visualization and interpretation (eScience).
Phrase map of the most frequently occurring keywords in big data related publications from 2006-2012 (from a post by Gali Halevi, MLS, PhD & Dr. Henk F. Moed).
Big data has become an increasingly popular area of research. An analysis of Scopus entries, published in a recent Research Trends blog post, highlighted the growth of big data related publications across a diverse array of disciplines. Most striking was the phrase map based on the top 50 occurring keywords in publications from 2006-2012. From the post: “These maps visualize two main characteristics of the text: (1) connections between terms are depicted by the gray lines, where a thicker line notes a stronger relationship between the terms; and (2) the centrality of the terms which are depicted by their font size (the bigger the font, the more frequently a term appears in the text). Clusters of connections may appear when a connection is found between single words but not to other clusters.” Compared with the phrase map for 1995-2005, the 2006-2012 map shows a clear increase in complexity and connectivity as the research area has developed.
As research in this area intensifies and the term grows more popular, some caution is needed to avoid big data tunnel vision. Dr. Phil Bourne stated this succinctly: “We need to be less fixated on the big data problems”, highlighting the need to also address data management issues for the long tail (i.e., “scientists who generate small quantities of data (collectively much larger than the big data problems but distributed) that are not managed and subsequently analyzed in a way that is optimal”). Steve Lohr, in his recent article for the New York Times, noted the limitations of focusing on big data in a vacuum and reiterated the need to also value experience and intuition. Indeed, balanced thinking and perspective are critical in how we shape not only our research but also policy and education around big data.