Andy Feng wrote a blog post on the YDN blog about the data processing architecture at Yahoo! for delivering personalized content by analyzing billions of events per day for 700 million users and 2.2 billion content pieces, using a combination of batch processing (Hadoop) and stream processing (Storm):
Enabling low-latency big-data processing is one of the primary design goals
of Yahoo!’s next-generation big-data platform. While MapReduce is a key
design pattern for batch processing, additional design patterns will be
supported over time. Stream/micro-batch processing is one of the design patterns
applicable to many Yahoo! use cases. In Q1 2013, we added Storm as a new
service to our big-data platform. Similar to how Hadoop provides a set of
general primitives for doing batch processing, Storm provides a set of
general primitives for stream/micro-batch processing.
✚ I don’t think I’ve seen the term micro-batch processing used before. Any ideas why it is used as an alternative to the well-established term stream processing?
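One way to read the term: in pure stream processing each event is handled as it arrives, while in micro-batch processing events are buffered into small batches and each batch is processed as a unit, trading a little latency for per-batch efficiency. A minimal sketch of the distinction (illustrative Python only, not Storm code; all names are made up for this example):

```python
# Contrast per-event stream processing with micro-batch processing.
# Both functions and the sample events are hypothetical, for illustration.

def stream_process(events, handle):
    """Pure stream processing: one handler call per incoming event."""
    for event in events:
        handle(event)

def micro_batch_process(events, handle_batch, batch_size=3):
    """Micro-batch processing: buffer events, hand off small batches."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            handle_batch(batch)
            batch = []
    if batch:  # flush the trailing partial batch
        handle_batch(batch)

events = ["click:a", "click:b", "view:c", "click:d", "view:e"]

per_event = []
stream_process(events, per_event.append)   # 5 handler calls, one per event

batches = []
micro_batch_process(events, lambda b: batches.append(list(b)))
# batches -> [["click:a", "click:b", "view:c"], ["click:d", "view:e"]]
```

Real systems typically flush a micro-batch on a time interval as well as on size, but the size-based version above is enough to show the idea.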
Original title and link: Storm and Hadoop: Convergence of Big-Data and Low-Latency Processing at Yahoo! ( ©myNoSQL)