Data Deduplication Tactics With HDFS and MapReduce
Five techniques for data deduplication using HDFS and MapReduce, with links to research papers:
Some of the common methods for data deduplication in storage architecture include hashing, binary comparison and delta differencing. In this post, we focus on how MapReduce and HDFS can be leveraged for eliminating duplicate data.
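The hashing approach mentioned in the quote can be illustrated with the Python sketch below. It simulates the map, shuffle, and reduce phases in memory and is not Hadoop code. The function names (`map_phase`, `shuffle`, `reduce_phase`, `deduplicate`) and the choice of SHA-256 are hypothetical and were picked for illustration. The mapper keys each record by its content digest, so duplicate records end up in the same reduce group. The reducer then keeps one copy per distinct byte string. Its byte-for-byte check stands in for the "binary comparison" step and guards against hash collisions.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(records: Iterable[bytes]) -> Iterator[Tuple[str, bytes]]:
    # Mapper (hypothetical): key each record by its SHA-256 digest so that
    # records with identical content receive identical keys.
    for record in records:
        yield hashlib.sha256(record).hexdigest(), record


def shuffle(pairs: Iterable[Tuple[str, bytes]]) -> Dict[str, List[bytes]]:
    # In-memory stand-in for the framework's shuffle/sort: group values by key.
    groups: Dict[str, List[bytes]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[bytes]]) -> Iterator[bytes]:
    # Reducer (hypothetical): emit each distinct record once per digest.
    for _digest, values in groups.items():
        kept: List[bytes] = []
        for value in values:
            # Byte-for-byte comparison guards against an (unlikely) hash collision.
            if value not in kept:
                kept.append(value)
        yield from kept


def deduplicate(records: Iterable[bytes]) -> List[bytes]:
    # Chain the three phases, as a MapReduce job would.
    return list(reduce_phase(shuffle(map_phase(records))))


if __name__ == "__main__":
    print(deduplicate([b"a", b"b", b"a"]))  # prints [b'a', b'b']
```

In an actual Hadoop job, the framework performs the shuffle and groups map output by key before the reducers run. The input records would be read from files stored in HDFS rather than passed in as a Python list.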
Patrick Durusau
Original title and link: Data Deduplication Tactics With HDFS and MapReduce (©myNoSQL)
via: http://www.hadoopsphere.com/2013/02/data-de-duplication-tactics-with-hdfs.html