Cardinality Estimation Algorithms: Memory Efficient Solutions for Counting 1 Billion Distinct Objects
Matt Abrams from Clearspring:
Cardinality estimation algorithms trade space for accuracy. To illustrate this point we counted the number of distinct words in all of Shakespeare’s works using three different counting techniques. Note that our input dataset has extra data in it so the cardinality is higher than the standard reference answer to this question. The three techniques we used were Java HashSet, Linear Probabilistic Counter, and a Hyper LogLog Counter. Here are the results:

Original title and link: Cardinality Estimation Algorithms: Memory Efficient Solutions for Counting 1 Billion Distinct Objects (©myNoSQL)