Mavuno is an open-source, modular, scalable text mining toolkit built upon Hadoop. It supports basic natural language processing tasks (e.g., part-of-speech tagging, chunking, parsing, named entity recognition), is capable of large-scale distributional similarity computations (e.g., synonym, paraphrase, and lexical variant mining), and has information extraction capabilities (e.g., instance and semantic relation mining). It can easily be adapted to new input formats and text mining tasks.
I’d love to hear from people who know the field better how Mavuno compares to Mahout.
Original title and link: Mavuno: A Hadoop-Based Text Mining Toolkit (©myNoSQL)