Sam Shah, in a guest post on the Hortonworks blog:
If Pig is the “duct tape for big data”, then DataFu is the WD-40. Or something. […] Over the years, we developed several routines that were used across LinkedIn and were thrown together into an internal package we affectionately called “littlepiggy.”
“A penetrating oil and water-displacing spray”? “littlepiggy”? Seriously?
How could one come up with these names for such a useful library of statistical functions, PageRank, set and bag operations?
Original title and link: DataFu: A Collection of Pig UDFs for Data Analysis on Hadoop by LinkedIn (©myNoSQL)