NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



How to build a searchable, evolvable entity store?

Given a set of requirements (prepared to scale, data models can evolve, data must be searchable, common access to entities), a data definition language (think Protocol Buffers[1], Thrift[2], Avro[3], JSON[4], BSON[5]), a NoSQL database, how do you build a searchable, evolvable entity store?

Sam Pullara explains how he solved these while ☞ creating HAvroBase:

The first choice you have to make against these requirements is which data definition language are you going to use?


Whereas the data definition choice is basically commodity at this point and your choice can be somewhat arbitrary, the choice of storage technology will likely be something that has more trade-offs to consider.


When it comes to text search you really don’t get better than Lucene in open source and the features that Solr builds on top of Lucene make it even better. I don’t think there is reasonable argument for using something besides Solr at this point. Especially with their support for sharding and replication that comes with Solr Cloud.

The only remark is that the solution might also use other NoSQL databases especially key-value stores (basically, once entities are encoded with Avro, data will become opaque to HBase so its wide-column data model is not a strong requirement).

Source code is available on ☞ GitHub.