Five techniques, with links to research papers, for deduplicating data using HDFS and MapReduce:
Some of the common methods for data deduplication in storage
architecture include hashing, binary comparison and delta
differencing. In this post, we focus on how MapReduce and HDFS can
be leveraged for eliminating duplicate data.
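
To make the hashing approach concrete, here is a minimal sketch of how it maps onto a map/shuffle/reduce flow. It is a plain in-process Python simulation, not Hadoop code and not the linked article's implementation. The function names (`map_phase`, `shuffle`, `reduce_phase`, `deduplicate`) are hypothetical, and the choice of SHA-256 and of a byte-for-byte comparison inside the reducer are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(records: Iterable[bytes]) -> Iterator[Tuple[str, bytes]]:
    """Emit (content hash, record) so identical records share a key."""
    for record in records:
        yield hashlib.sha256(record).hexdigest(), record


def shuffle(pairs: Iterable[Tuple[str, bytes]]) -> Dict[str, List[bytes]]:
    """Group values by key, standing in for the framework's shuffle step."""
    groups: Dict[str, List[bytes]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[bytes]]) -> List[bytes]:
    """Keep one copy of each distinct record per hash key.

    The byte-for-byte check is the "binary comparison" step: it keeps
    records apart if two different payloads ever share a hash.
    """
    unique: List[bytes] = []
    for values in groups.values():
        distinct: List[bytes] = []
        for value in values:
            if value not in distinct:
                distinct.append(value)
        unique.extend(distinct)
    return unique


def deduplicate(records: Iterable[bytes]) -> List[bytes]:
    """Run the three simulated phases end to end."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    sample = [b"alpha", b"beta", b"alpha", b"gamma", b"beta"]
    print(deduplicate(sample))
```

In a real cluster the map and reduce steps would run as separate tasks over HDFS blocks, and the framework would perform the grouping; the sketch only shows why keying on a content hash brings duplicates together in one reducer call.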
Original title and link: Data Deduplication Tactics With HDFS and MapReduce (©myNoSQL)