slipstream


Kenneth Cukier (Data Editor, The Economist and co-author of the recent book, Big Data) speaks about big data at TNW Conference Europe 2013 (by thenextweb - YT channel)

Below are some notes I took of this video from remarks by Cukier.  Misunderstandings, if any, of what Cukier may have said are mine.

  • Is Big Data Hype B.S.?
  • Well, is “The Cloud” a Cloud?
  • is “The Web” a Web?
  • Big Data came out of the sciences, then moved into the social networking/computing realm
  • But fundamentally, MORE data is a different environment than we’ve been in before
  • Doing things at scale means there is a new and important shift in what we can learn
  • The change in size means a change in state
  • A change in quantity [of data to analyze] has meant a change in quality
  • Machine Learning is a pillar concept of Big Data
  • e.g. Google Translation works [and works well] because Google has more data to analyze than previous translation attempts.
  • (more, messy, correlations)
  • Datafication = more data & moving data into a format that can be analyzed (and with tools that require less rigid formats or a priori data modeling)
  • States of BD are process, store, analyze, extract value
  • Concerns?  Plenty.  It’s creepy, man
  • privacy means something different in this BD age
  • Propensity algorithms make predictions about our future behavior.
  • Perhaps we’ll need to define some new rights then
  • e.g. there was no concept of “Free Speech” before there was a printing press
  • Free agency is a right
  • Be warned:  the “Dictatorship of Data” is placing trust in data w/o understanding the data.  

Should the government know less than Google?

Google updates BigQuery with large results, window functions, query caching, and reduces monthly costs to $0.08/GB

Online Storage Reviewed. - Information on the Best Online Storage Providers

A comparative review of online storage and file synchronization cloud services

Paul Miller: Pick, pick, picking away at Big Data

Worth a read …

cloudofdata:

Big Data, it seems, is the meme to attack these days. Yes, it’s overhyped. Yes, too many vendors are scrambling to #BigDataWash half-baked or has-been products in the hope of squeezing a few more sales out of credulous customers. Yes, there’s way too much emphasis on the ‘Big’ bit. But it’s not…

"€˜Big Data"€™ by Viktor Mayer-Schönberger and Kenneth Cukier

Some interesting excerpts from Kakutani’s NYTimes review of the book

  • And in this volume they give readers a fascinating — and sometimes alarming — survey of big data’s growing effect on just about everything: business, government, science and medicine, privacy and even on the way we think. Notions of causality, they say, will increasingly give way to correlation as we try to make sense of patterns. 
  • Data is growing incredibly fast — by one account, it is more than doubling every two years 
  • There is, of course, a dark side to big data, and the authors provide an astute analysis of the dangers they foresee. Privacy has become much more difficult to protect, especially with old strategies — “individual notice and consent, opting out and anonymization” — losing effectiveness or becoming completely beside the point. 
  • “The ability to capture personal data is often built deep into the tools we use every day, from Web sites to smartphone apps,” the authors write. And given the myriad ways data can be reused, repurposed and sold to other companies, it’s often impossible for users to give informed consent to “innovative secondary uses” that haven’t even been imagined when the data was first collected. 
  • BD may bring about a situation “in which judgments of culpability are based on individualized predictions of future behavior.” 
  • One problem with relying on predictions based on probabilities of behavior, Mr. Mayer-Schönberger and Mr. Cukier argue, is that it can negate “the very idea of the presumption of innocence.” 
  • … there is a huge difference between “scientific big data, like data about galaxy formation, weather or flu outbreaks,” which with lots of hard work can be gathered and mined, and “big data about people,” which, like all things human, remains protean, contradictory and often unreliable. 
  • … their book leaves the reader with a keen appreciation of the tools that big data can provide in helping us “quantify and understand the world,” it also warns us about falling prey to the “dictatorship of data.” 
Jun 7

Tableau - Waypoints

Jun 7

Where’s the Big Data payoff

These days, questioning the successes of big data is almost as fashionable as the de rigueur term “Big Data.”  Such naysaying seems overly critical and rather hollow to me.  But this is not to say the term “Big Data” is without its faults.  The search for depth, beyond the usefully ambiguous term itself, is worth the exploration.  First though, allow that the glitter and shine of Big Data as a term are only the garnish and not the main ingredients.  Or put another way, Big Data is full of useful concepts and thoughtful topics pulling from several serious components, including Data Science, Machine Learning, Cloud Computing, Cloud Storage, voluminous and ubiquitous data, new tools developed outside the SQL box (see, for example, NoSQL), and visualization (dataviz).  Unpack that list and any one item is worth the effort to explore.  Combining these many concepts may seem opportunistic to some; to me it seems a thoughtful and prescient idea to wrap around a nascent combination of technology trends and market forces.

And to that end I commend this article by Jeff Bertolucci in InformationWeek, “Big Decisions Drive Big Data Success.” Bertolucci rightly points out that the goals of analyzing BIG data are the same as analyzing any data.  Yes, the volume of data has changed; yes, there may be new tools that can rapidly lead to new insights.  But the foundational challenges of understanding the link between correlation and causation remain the same.  That such a challenge is not new in no way diminishes the big concept that is Big Data, though noting the challenge may also be the wise warning that the hype-haters continue to raise.  (And so be it; that’s a good warning, nonetheless…)

This Bertolucci comment and embedded quote by Steve Jones (director of strategy for big data and analytics for Capgemini) come from the article:

There’s nothing wrong with exploring pros and cons of Hadoop and other big data platforms, but your project’s goals must be clear from the start. “It’s really about understanding where your business needs to improve and working out where these tools help, rather than starting with the technology and trying to find [where] it fits in.” 

 

Jun 5

Fusion Table - Waypoints

Analyzing Big Data with Twitter

UC Berkeley Course Lectures: Analyzing Big Data With Twitter

(looks interesting.)

(Source: blogs.ischool.berkeley.edu)

May 8

Google Maps Engine Lite

I ran across this change in Google Maps last night:  Google Maps Engine Lite is now in beta mode in the Maps environment.  You may already know about “Google Maps Engine Lite” from Google Earth.  If not, it appears in the Google Earth “Maps Engine Lite” environment (g-MEL) and has been beta ported to the Google Maps environment.  There you can collaborate, import location data, customize visualizations, store visualizations, and easily embed them for display in other webviews.  It has some similar features (polygons, lines, plot points) that are more intuitive than Neatline’s.  Other features of Neatline, like Simile integration, do not exist.  But, while g-MEL does allow for managing layers, it does not currently seem to allow for importing raster images or KML files.

   

More information at the Google Maps Engine Lite tour and at the Google Earth Tutorial.

May 6

A Crash Course in R

below is the kind of thing I now wish I’d been given when I first started using it – something with simple logically-progressive examples and minimal explanatory text. 

MongoDB:  It’s Not JUST About Big Data

What is Big data, Small data, and what do I Keep

A concise and cogent interview with Gail Steinhart at Cornell about the evolving understanding of data.  Steinhart talks about Big data and Small data.  Here’s a teaser on what data you should keep and manage:

Steinhart: With respect to funder requirements, one of the most common and most basic is “what counts as data” when it comes to what must be shared and preserved. Many agencies and programs define this for their constituents. When they don’t, our Research Data Management Service Group [at Cornell libraries] recommends considering the following: what data would be of use to others? What data are required for someone else to validate your results? And finally, what data are used as the basis for your publications?

White House Kicking off a Series of Big Data Workshops

The White House will be hosting a Big Data Workshop on May 3, 2013.  The workshop is sponsored by the Office of Science and Technology Policy and the NITRD Big Data Senior Steering Group.


(Source: http-www-cccblog-org-2013-04-14-white-house-kicking-off-a-series-of-big-data-workshops)