The most common complaints [about batch processing]:
- No partial answers – you must wait for the entire batch to finish; for large batches this can take a long time.
- Hardware requirements – because they process everything at once, batch systems typically require more hardware (memory, disk, and CPU) than streaming systems, which can process a transaction and then discard it.
- Limited ad-hoc capabilities (more on this later).
- All or nothing – any change in the input data usually requires recomputing the entire batch.
I cannot put my finger on it right now, but I don’t think stream processing can cover the same wide range of computations that batch processing can.
While I haven’t had the chance to work with real big data, I believe it is not an either/or choice. An ideal system would need to support:
- piping incoming data through a combination of filters, preprocessors/transformers, and calculators/extractors
- preserving (all/relevant) data for later computation
- allowing processing of the stored data as either streams or batches
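The hybrid system sketched above can be illustrated in a few lines of Python. This is a minimal sketch under my own assumptions – all names (`pipeline`, `keep_errors`, `tee_to_store`, and so on) are hypothetical and do not come from any real framework: records flow through a chain of filters/transformers, are preserved in a store as they pass through, and the stored copy can later be recomputed as a batch.

```python
# Hypothetical sketch of the hybrid stream/batch pipeline described above.
# None of these names correspond to a real framework's API.
from typing import Callable, Iterable, Iterator, List

Record = dict

def pipeline(source: Iterable[Record],
             stages: List[Callable[[Iterator[Record]], Iterator[Record]]]) -> Iterator[Record]:
    """Pipe incoming records through a chain of filters/transformers/extractors."""
    stream: Iterator[Record] = iter(source)
    for stage in stages:
        stream = stage(stream)
    return stream

# Example stages: a filter and a transformer.
def keep_errors(records):
    return (r for r in records if r.get("level") == "error")

def normalize(records):
    return ({**r, "msg": r["msg"].strip().lower()} for r in records)

store: List[Record] = []  # preserve (all/relevant) data for later computation

def tee_to_store(records):
    # Pass records through unchanged while keeping a copy for batch use.
    for r in records:
        store.append(r)
        yield r

events = [
    {"level": "error", "msg": "  Disk FULL "},
    {"level": "info",  "msg": "ok"},
    {"level": "error", "msg": "Timeout  "},
]

# Stream mode: process each record as it arrives.
streamed = list(pipeline(events, [keep_errors, normalize, tee_to_store]))

# Batch mode: re-run a computation over the preserved data at any later time.
batch_count = len(store)
```

Running this, `streamed` holds the two normalized error records, and the same records sit in `store` for any later batch recomputation – the point being that the streaming and batch paths share one ingestion pipeline.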
Original title and link: BigData Batch vs Stream Processing Pros and Cons (NoSQL databases © myNoSQL)