I guess everyone with some interest in Hadoop already knows the story of the NY Times converting more than 130 years’ worth of articles (11 million articles in TIFF format) into PDFs using Hadoop and Amazon EC2. What I didn’t know, though, is that this wasn’t a one-time-only project: the NY Times continues to use Hadoop for other projects, and they have open sourced the Map/Reduce Toolkit (MRToolkit), a project built on a not-so-well-known feature: Hadoop Streaming.
It takes care of the details of setting up and running Apache Hadoop jobs, and encapsulates most of the complexity of writing map and reduce steps. The toolkit, which is Ruby-based, provides the framework — you only have to supply the details of the map and reduce steps.
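To see what the toolkit is abstracting away: under Hadoop Streaming, the map and reduce steps are ordinary programs that read lines from stdin and write tab-separated key/value lines to stdout, and the framework sorts the mapper output by key before handing it to the reducer. A minimal word-count pair in plain Ruby, with no toolkit, might look like this (the function names are mine, for illustration):

```ruby
# Mapper: emit "word\t1" for every word read from the input stream.
def map_lines(io, out)
  io.each_line do |line|
    line.split.each { |w| out.puts "#{w.downcase}\t1" }
  end
end

# Reducer: Streaming delivers input sorted by key, so all counts for a
# given word arrive on adjacent lines and can be summed in one pass.
def reduce_lines(io, out)
  current, count = nil, 0
  io.each_line do |line|
    word, n = line.chomp.split("\t")
    if word != current
      out.puts "#{current}\t#{count}" if current
      current, count = word, 0
    end
    count += n.to_i
  end
  out.puts "#{current}\t#{count}" if current
end
```

Wired into a cluster, the two scripts would be passed as `-mapper` and `-reducer` to the `hadoop-streaming` jar; MRToolkit spares you exactly this kind of boilerplate.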
There is another Ruby library for Hadoop Streaming: ☞ wukong, which simplifies the data interaction layer:
Treat your dataset like a
- stream of lines when it’s efficient to process by lines
- stream of field arrays when it’s efficient to deal directly with fields
- stream of lightweight objects when it’s efficient to deal with objects
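To make the three views concrete, here is a sketch in plain Ruby (not wukong’s actual API) of the same tab-separated record seen each way; the `Request` struct and its fields are a hypothetical schema:

```ruby
line = "2010-06-01\tGET\t/index.html"

# 1. As a raw line: cheapest when simple matching is all you need.
puts "hit" if line.include?("GET")

# 2. As an array of fields: split once, then index by position.
fields = line.chomp.split("\t")
date, verb, path = fields

# 3. As a lightweight object: name the fields (Struct is Ruby stdlib;
#    the schema here is an assumption for the example).
Request = Struct.new(:date, :verb, :path)
req = Request.new(*fields)
puts req.path  # → /index.html
```

The trade-off the wukong list is pointing at: each step up adds a little per-record cost in exchange for clearer map/reduce code.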
Do you have any favorite library that you use with Hadoop? Is it in our NoSQL libraries list?