BigData Batch vs Stream Processing Pros and Cons

The most common complaints [about batch processing]:

  • No partial answers – you have to wait for the entire batch to finish. For big batches this can take a lot of time.
  • Hardware requirements – because they process everything at once, batch systems typically require more hardware (such as memory, disk, and CPU) than streaming systems, which can process a transaction and then throw it away.
  • Limited ad-hoc capabilities (more on this later).
  • All or nothing – any change in the data usually requires the entire batch to be recalculated.
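
To make the first two complaints concrete, here is a minimal sketch (my own illustration, not code from the quoted post) that computes the same average in both styles. The function names and the sample readings are made up for the example. The batch version needs the whole list before it can answer, while the streaming version keeps only a count and a running total and yields a partial answer after every value.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Batch style: needs the complete data set and returns one final answer."""
    return sum(values) / len(values)


def streaming_mean(values: Iterable[float]) -> Iterator[float]:
    """Stream style: keeps only a count and a running total, and yields
    a partial answer after each value before discarding that value."""
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
        yield total / count


# Hypothetical sample data, used only for illustration.
readings = [10.0, 20.0, 30.0, 40.0]

final_answer = batch_mean(readings)
partial_answers = list(streaming_mean(readings))
```

The last partial answer matches the batch result, but the streaming version never holds more than two numbers of state. Not every computation decomposes this neatly (an exact median, for instance, does not), which is one reason the two styles are not interchangeable.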

I cannot put my finger on it right now, but I don’t think stream processing can cover the same wide range of computations that batch processing can.

While I haven’t had the chance to play with real big data, I believe it is not an either/or matter. An ideal system would need to support:

  • piping incoming data through a combination of filters, preprocessors/transformers, and calculators/extractors
  • preserving (all or only the relevant) data for later computation
  • processing stored data in either streams or batches
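
The three requirements above can be sketched in a few lines. This is only a toy under stated assumptions: the Pipeline class, its method names, and the two stages are hypothetical, records are plain dicts with a "value" key, and an in-memory list stands in for durable storage.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, float]
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def drop_negative(records: Iterable[Record]) -> Iterator[Record]:
    """Filter stage: discard records with a negative value."""
    for record in records:
        if record["value"] >= 0:
            yield record


def double(records: Iterable[Record]) -> Iterator[Record]:
    """Transformer stage: emit a new record with the value doubled."""
    for record in records:
        yield {**record, "value": record["value"] * 2}


class Pipeline:
    """Hypothetical pipeline: pipes data through stages, preserves the raw
    input, and can reprocess the stored data as a stream or as a batch."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = stages
        self.store: List[Record] = []  # stand-in for durable storage

    def _run(self, records: Iterable[Record]) -> Iterator[Record]:
        for stage in self.stages:
            records = stage(records)
        return iter(records)

    def ingest(self, records: Iterable[Record]) -> Iterator[Record]:
        """Preserve each raw record, then pipe it through the stages lazily."""
        def preserved() -> Iterator[Record]:
            for record in records:
                self.store.append(record)
                yield record
        return self._run(preserved())

    def replay_stream(self) -> Iterator[Record]:
        """Reprocess stored data one record at a time."""
        return self._run(iter(self.store))

    def replay_batch(self) -> List[Record]:
        """Reprocess stored data all at once and return the full result."""
        return list(self._run(self.store))


pipeline = Pipeline([drop_negative, double])
incoming = [{"value": 1.0}, {"value": -5.0}, {"value": 3.0}]

live = list(pipeline.ingest(incoming))
total = sum(record["value"] for record in pipeline.replay_batch())
```

Because the raw records are stored before any filtering, a later change to the stages can be applied to historical data by replaying the store, either lazily or in one pass.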

Original title and link: BigData Batch vs Stream Processing Pros and Cons (NoSQL databases © myNoSQL)

via: http://blog.patternbuilders.com/2011/01/26/riding-the-data-waterfall/