If you’d ask me this question, I’m sure my initial answer would be: “absolutely”. And I guess I would not be alone. But is that the right answer?
While watching GigaOm’s Structure Big Data event, there were two talks that gave me a different perspective on this question.
Firstly, it was the interview with Kevin Krim, the Global Head of Bloomberg Digital, which told the story of adopting, mining, and materializing Big Data inside a corporation that didn’t believe in it, nor did it allocate large budgets to it. The result: collecting more than a terabyte of data every day from 100 data points for every pageview and running 15 different parallel algorithms to make recommendations that led sometimes to 10x clickthrough rates. The interview is embedded at the end of this post.
The second story, coming from Pete Warden, founder of OpenHeatMap, is even more exciting. Pete has used a combination of right tools deployed on the cloud to mine Facebook data: 500 million pages for $100 — that was the cost before being sued by Facebook.
Pete Warden distilled his experience with these tools and has made available at datasciencetoolkit.org a collection of data tools and open APIs in both an Amazon AMI format to be run on the cloud and as a VMWare image to run locally. I highly recommend watching Pete’s talk which I’ve embedded below.
While it depends on what definition of BigData we’d use, both these talks are leading to a simple conclusion:
- you need imagination to get started with Big Data
- you need to use the right tools for getting good results
Is this going to work at the scale of Twitter, LinkedIn, Facebook, Google? Probably not. But before getting at that size, you need to start somewhere. And both these talks suggest a clear answer to the question “does big data need big budgets?”: not always.