mapreduce: All content tagged as mapreduce in NoSQL databases and polyglot persistence
MapReduce has become a dominant parallel computing paradigm for big data, i.e., colossal datasets at the scale of tera-bytes or higher. Ideally, a MapReduce system should achieve a high degree of load balancing among the participating machines, and minimize the space usage, CPU and I/O time, and network transfer at each machine. Although these principles have guided the development of MapReduce algorithms, limited emphasis has been placed on enforcing serious constraints on the aforementioned metrics simultaneously. This paper presents the notion of minimal algorithm, that is, an algorithm that guarantees the best parallelization in multiple aspects at the same time, up to a small constant factor. We show the existence of elegant minimal algorithms for a set of fundamental database problems, and demonstrate their excellent performance with extensive experiments.
Start with the definition of minimal MapReduce algorithms and you’ll find yourself diving into the paper (even if the proofs are complex).
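To get a feel for the load-balancing goal the abstract describes, here is a minimal sketch of a sample-based range-partitioned sort, simulated on a single machine. It is not taken from the paper and does not reproduce its formal definition; the function names (`sample_sort`, `load_ratio`) and parameters (`t` for the number of machines, `sample_rate`) are my own assumptions for illustration. It only shows one of the metrics in play: how far the busiest machine's share strays from the ideal n/t.

```python
import random
from bisect import bisect_right


def sample_sort(data, t, sample_rate, seed=0):
    """Simulate a one-round, sample-based sort over t hypothetical machines.

    Returns a list of t sorted partitions whose concatenation is sorted(data).
    """
    rng = random.Random(seed)

    # Sampling step: keep each element with probability sample_rate.
    sample = sorted(x for x in data if rng.random() < sample_rate)

    # Pick up to t-1 boundaries at evenly spaced positions in the sample.
    if len(sample) >= t:
        step = len(sample) / t
        boundaries = [sample[int(i * step)] for i in range(1, t)]
    else:
        # Too few samples: fall back to what we have (some machines stay empty).
        boundaries = sample

    # "Shuffle" step: route every element to the machine owning its range.
    partitions = [[] for _ in range(t)]
    for x in data:
        partitions[bisect_right(boundaries, x)].append(x)

    # "Reduce" step: each machine sorts its own range locally.
    for p in partitions:
        p.sort()
    return partitions


def load_ratio(partitions):
    """Largest partition size divided by the ideal share n/t (1.0 is perfect)."""
    n = sum(len(p) for p in partitions)
    ideal = n / len(partitions)
    return max(len(p) for p in partitions) / ideal


if __name__ == "__main__":
    gen = random.Random(42)
    data = [gen.randrange(10**6) for _ in range(100_000)]
    parts = sample_sort(data, t=16, sample_rate=0.02)
    print("partition sizes:", [len(p) for p in parts])
    print("load ratio:", round(load_ratio(parts), 3))
```

A larger `sample_rate` should pull the load ratio closer to 1.0 at the cost of a bigger sample to sort up front, which is the kind of trade-off the paper's guarantees are about pinning down.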
In 4 years of writing this blog I haven’t seen such a prolific month:
- Apache Hadoop 2.2.0 (more links here)
- Apache HBase 0.96 (here and here)
- Apache Hive 0.12 (more links here)
- Apache Ambari 1.4.1
- Apache Pig 0.12
- Apache Oozie 4.0.0
- Plus Presto.
Actually, I don’t think I’ve ever seen an ecosystem like the one created around Hadoop.
Original title and link: A prolific season for Hadoop and its ecosystem ( ©myNoSQL)
I’ve learned that there’s an Apache Hadoop compatibility guide that covers API, wire, and Java binary compatibility, and many other such aspects.
✚ On Cloudera’s blog, Karthik Kambatla posted “Writing Hadoop programs that work across releases”, which looks at the Hadoop API annotations and compatibility policies.
Original title and link: Apache Hadoop Compatibility Guide ( ©myNoSQL)
At the time I’m reading the Ask HN thread “To everybody who uses MapReduce: what problems do you solve?”, there aren’t many interesting answers.
✚ Compare it with AskReddit: What is an invention that the human race is fully capable of making, but hasn’t been made yet?
Original title and link: To everybody who uses MapReduce: what problems do you solve? ( ©myNoSQL)