Strategies for Exploiting Large-scale Data

In a guest post hosted on the Cloudera blog, Bob Gourley[1] enumerates the characteristics of working with Big Data from the perspective of federal agencies.

I think these can be generalized to all businesses and problems that require big data:

Federal IT leaders are increasingly sharing lessons learned across agencies. But approaches vary from agency to agency.

For a long time each business worked in its own silo.

Yesterday, tools and algorithms represented the competitive advantage. Today the competitive advantage is in data. Sharing algorithms, experience, and ideas is safe.

federal thought leaders across all agencies are confronted with more data from more sources, and a need for more powerful analytic capabilities

If you are not confronted with this problem, it is only because you haven’t realized it yet. If you think single sources of data are good enough, your business might be at risk.

Large-scale distributed analysis over large data sets is often expected to return results almost instantly.

Name a single manager, business, or problem solver who wouldn’t like to get immediate answers.

  • Most agencies face challenges that involve combining multiple data sets — some structured, some complex — in order to answer mission questions.

  • increasingly seeking automated tools, more advanced models and means of leveraging commodity hardware and open source software to conduct distributed analysis over distributed data stores

Ditto

considering ways of enhancing the ability of citizens to contribute to government understanding by use of crowd-sourcing type models

In his Strata talk, Werner Vogels mentioned using Amazon Mechanical Turk to add human-based processing for data control, data validation and correction, and data enrichment.


  1. Bob Gourley: editor of CTOvision.com and a former Defense Intelligence Agency (DIA) CTO, @bobgourley  

Original title and link: Strategies for Exploiting Large-scale Data (NoSQL databases © myNoSQL)