Behemoth: All content tagged as Behemoth in NoSQL databases and polyglot persistence

Scaling Solr Indexing With SolrCloud, Hadoop and Behemoth

Grant Ingersoll:

Instead of doing all the extra work of making sure instances are up, etc., however, I am going to focus on using some of the new features of Solr4 (i.e. SolrCloud whose development effort has been primarily led by several of my colleagues: Yonik Seeley, Mark Miller and Sami Siren) which remove the need to figure out where to send documents when indexing, along with a convenient Hadoop-based document processing toolkit, created by Julien Nioche, called Behemoth that takes care of the need to write any Map/Reduce code and also handles things like extracting content from PDFs and Word files in a Hadoop friendly manner (think Apache Tika run in Map/Reduce) while also allowing you to output the results to things like Solr or Mahout, GATE and others as well as to annotate the intermediary results.

I have to agree with Karussell:

Scaling Solr means using Solr AND X AND Y AND…
Scaling ElasticSearch means using ElasticSearch.
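To make concrete what "the need to figure out where to send documents when indexing" can look like, here is a minimal sketch of client-side shard selection, the kind of bookkeeping the quoted post says SolrCloud takes off your hands. It is an illustration only: the shard URLs and the function names `pick_shard` and `group_by_shard` are hypothetical, nothing here is sent over the network, and this is not SolrCloud's actual routing algorithm.

```python
import zlib

# Hypothetical shard URLs, used only for illustration (not from the original post).
SHARDS = [
    "http://solr-a.example.com:8983/solr",
    "http://solr-b.example.com:8983/solr",
    "http://solr-c.example.com:8983/solr",
]


def pick_shard(doc_id, shards=SHARDS):
    """Return the shard URL for doc_id using a stable hash modulo the shard count."""
    # crc32 is stable across processes, unlike Python's built-in hash() for strings.
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


def group_by_shard(docs, shards=SHARDS):
    """Bucket documents (dicts with an 'id' key) by the shard they would be sent to."""
    batches = {url: [] for url in shards}
    for doc in docs:
        batches[pick_shard(doc["id"], shards)].append(doc)
    return batches


if __name__ == "__main__":
    docs = [{"id": "doc-%d" % i, "title": "Document %d" % i} for i in range(10)]
    for url, batch in group_by_shard(docs).items():
        print(url, [d["id"] for d in batch])
```

Even this toy version hints at the operational cost: adding or removing a shard changes the modulo and remaps most documents, which is one reason to prefer having the cluster handle routing.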

Original title and link: Scaling Solr Indexing With SolrCloud, Hadoop and Behemoth (NoSQL database © myNoSQL)

via: http://www.lucidimagination.com/blog/2012/03/05/scaling-solr-indexing-with-solrcloud-hadoop-and-behemoth/