Five techniques, with links to research papers, for data deduplication using HDFS and MapReduce:
Some of the common methods for data deduplication in storage
architecture include hashing, binary comparison and delta
differencing. In this post, we focus on how MapReduce and HDFS can
be leveraged for eliminating duplicate data.
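
To make the hashing approach concrete, here is a minimal sketch in plain Python of how a MapReduce-style job could drop duplicate records. It is not taken from the linked post and it is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `deduplicate`) are hypothetical, records are assumed to be byte strings small enough to fit in memory, and the in-memory `shuffle` only stands in for the grouping a real framework would do across nodes.

```python
import hashlib
from collections import defaultdict


def map_phase(records):
    """Emit (content hash, record) pairs, as a mapper would."""
    for record in records:
        digest = hashlib.sha256(record).hexdigest()
        yield digest, record


def shuffle(pairs):
    """Group values by key; a stand-in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Keep one copy of each record per hash group.

    The byte-for-byte comparison is a guard against hash collisions:
    two different records that share a digest are both kept.
    """
    for values in groups.values():
        unique = []
        for value in values:
            if value not in unique:
                unique.append(value)
        yield from unique


def deduplicate(records):
    """Run the three phases in sequence and return the surviving records."""
    return list(reduce_phase(shuffle(map_phase(records))))


if __name__ == "__main__":
    sample = [b"alpha", b"beta", b"alpha", b"gamma", b"beta"]
    print(deduplicate(sample))
```

Keying on the hash means all copies of a record end up in the same reducer group, so the expensive binary comparison only runs inside that group and not across the whole data set.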
This blog is called myNoSQL and it is written by me, Alex Popescu, a software architect with a passion for open source and communities.