Simhashing in Hadoop with MapReduce, Cascalog and Cascading
Simhashing in MapReduce is a quick way to find clusters of similar items in a very large data set. With Cascading and Cascalog, we can express MapReduce jobs as compositions of functions rather than as individual map and reduce phases.
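To make the idea concrete, here is a minimal single-machine sketch in Python. It assumes the bit-voting form of simhash, where every token votes on each bit of a 64-bit fingerprint; the linked article may use a different variant. The names `simhash`, `map_phase`, and `reduce_phase`, the whitespace tokenizer, the MD5-based token hash, the four 16-bit bands, and the distance threshold of 3 are all illustrative choices, not taken from the original post or from Cascading or Cascalog.

```python
import hashlib
from collections import defaultdict

BITS = 64  # fingerprint width (an assumption for this sketch)


def token_hash(token):
    """Stable 64-bit hash of a token (first 8 bytes of its MD5 digest)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text):
    """Each token votes +1 or -1 on every bit; the sign of the sum sets the bit."""
    votes = [0] * BITS
    for token in text.lower().split():
        h = token_hash(token)
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def map_phase(doc_id, text, bands=4):
    """Emit one (band index, band bits) key per band of the fingerprint."""
    fingerprint = simhash(text)
    width = BITS // bands
    mask = (1 << width) - 1
    for band in range(bands):
        key = (band, (fingerprint >> (band * width)) & mask)
        yield key, (doc_id, fingerprint)


def reduce_phase(pairs, max_distance=3):
    """Group values by key, then compare fingerprints only within a group."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    near = set()
    for members in buckets.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                (id_a, fp_a), (id_b, fp_b) = members[i], members[j]
                if id_a != id_b and hamming(fp_a, fp_b) <= max_distance:
                    near.add(tuple(sorted((id_a, id_b))))
    return near


def find_near_duplicates(docs):
    """Run the map and reduce steps in memory over a dict of id -> text."""
    emitted = [kv for doc_id, text in docs.items() for kv in map_phase(doc_id, text)]
    return reduce_phase(emitted)
```

Splitting the fingerprint into four bands means that two fingerprints differing in at most three bits must agree exactly on at least one band, so every such pair meets in some reducer group. In a real Hadoop job the grouping done by `defaultdict` here would be handled by the shuffle between the map and reduce phases.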
Original title and link: Simhashing in Hadoop with MapReduce, Cascalog and Cascading (NoSQL databases © myNoSQL)