Below are some notes I took from this video of remarks by Cukier. Any misunderstandings of what Cukier may have said are mine.
These days questioning the successes of big data is almost as fashionable as the de rigueur term “Big Data.” Such nay-saying seems overly critical and underfittingly hollow to me. But this is not to say the term “Big Data” is without its faults. The search for depth, beyond the usefully ambiguous term itself, is worth the exploration. First though, allow that the glitter and shine of Big Data as a term are only the garnish and not the main ingredients. Or put another way, Big Data is full of useful concepts and thoughtful topics pulling from several serious components including Data Science, Machine Learning, Cloud Computing, Cloud Storage, voluminous-ubiquitous data prevalence, new tools developed outside the SQL box (see, for example, NoSQL), and visualization (dataviz). Unpack that list and any one item is worth the effort to explore. Combining these many concepts may seem opportunistic to some; to me it seems a thoughtful and prescient idea to wrap around a nascent combination of technology trends and market forces.
And to that end I commend this article by Jeff Bertolucci in InformationWeek, “Big Decisions Drive Big Data Success.” Bertolucci rightly points out that the goals of analyzing BIG data are the same as analyzing any data. Yes, the volume of data has changed; yes, there may be new tools that can rapidly lead to new insights. But the foundational challenges of understanding the link between correlation and causation remain the same. That such a challenge is not new in no way diminishes the big concept that is Big Data, though noting this challenge may also be a wise warning, one which the hype-haters continue to raise. (And so be it, that’s a good warning, nonetheless…)
This Bertolucci comment and embedded quote by Steve Jones (director of strategy for big data and analytics for Capgemini) come from the article:
There’s nothing wrong with exploring pros and cons of Hadoop and other big data platforms, but your project’s goals must be clear from the start. “It’s really about understanding where your business needs to improve and working out where these tools help, rather than starting with the technology and trying to find [where] it fits in.”
UC Berkeley Course Lectures: Analyzing Big Data With Twitter
I ran across this change in Google Maps last night: Google Maps Engine Lite is now in beta mode in the Maps environment. You may already know about “Google Maps Engine Lite” from Google Earth. If not, it appears in the Google Earth “Maps Engine Lite” environment (g-MEL) and has been beta ported to the Google Maps environment. There you can collaborate, import location data, customize visualizations, store visualizations, and easily embed for display into other webviews. It has some similar features (polygons, lines, plot points) that are more intuitive than those in Neatline. Other features of Neatline, like Simile integration, do not exist. But, while g-MEL does allow for managing layers, it does not currently seem to allow for importing raster images or KML files.
MongoDB: It’s Not JUST About Big Data
A concise and cogent interview with Gail Steinhart at Cornell about the evolving understanding of data. Steinhart talks about Big data and Small data. Here’s a teaser on what data you should keep and manage:
Steinhart: With respect to funder requirements, one of the most common and most basic is “what counts as data” when it comes to what must be shared and preserved. Many agencies and programs define this for their constituents. When they don’t, our Research Data Management Service Group [at Cornell libraries] recommends considering the following: what data would be of use to others? What data are required for someone else to validate your results? And finally, what data are used as the basis for your publications?