“Think big. Throw out the assumption that the big problems – eliminating fraud, mapping the spread of disease, understanding the traffic system, optimizing the energy grid – are unsolvable. They can be solved now.”
Those and similar breakthroughs will be built on the ability to analyze huge amounts of data, almost all of it unstructured, that can be captured in Hadoop-based technology. That will become the basis for a second industrial revolution based on the data factory, he says. One major reason these problems can be solved is that Hadoop technology allows models to be built from an analysis of the entire universe of data rather than a subset. “Sampling is finished. As a bank I can think about eliminating fraud because I can build a model looking at every incidence of fraud going back five years for every single person, rather than sampling the data, building a model, realizing there is an outlier that breaks the model, and then rebuilding the model. Those days are over.”
Being able to analyze the complete data set is just the first step, and Hadoop will definitely help here. But then we need to understand, learn, and deduce future models. And for now, these are problems for humans and machines.
Original title and link: Hadoop: Solving “Unsolvable” Problems (NoSQL databases © myNoSQL)