How to build a searchable, evolvable entity store?
Given a set of requirements (prepared to scale, data models can evolve, data must be searchable, common access to entities), a data definition language (think Protocol Buffers[1], Thrift[2], Avro[3], JSON[4], BSON[5]), and a NoSQL database, how do you build a searchable, evolvable entity store?
Sam Pullara explains how he addressed these requirements while ☞ creating HAvroBase:
The first choice you have to make against these requirements is which data definition language are you going to use?
[…]
Whereas the data definition choice is basically commodity at this point and your choice can be somewhat arbitrary, the choice of storage technology will likely be something that has more trade-offs to consider.
[…]
When it comes to text search you really don’t get better than Lucene in open source and the features that Solr builds on top of Lucene make it even better. I don’t think there is reasonable argument for using something besides Solr at this point. Especially with their support for sharding and replication that comes with Solr Cloud.
My only remark is that the solution could also be built on other NoSQL databases, especially key-value stores: once entities are encoded with Avro, the data becomes opaque to HBase, so its wide-column data model is not a strong requirement.
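To make that remark concrete, here is a minimal sketch of an entity store whose backend only ever sees opaque bytes. It is not HAvroBase's API: the names `KeyValueStore`, `InMemoryStore`, `EntityStore`, and the `table:row_id` key layout are hypothetical, and JSON from the standard library stands in for Avro binary encoding so the example runs without extra dependencies.

```python
import json
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Hypothetical minimal backend contract: string keys, opaque byte values."""

    def put(self, key: str, value: bytes) -> None: ...

    def get(self, key: str) -> Optional[bytes]: ...


class InMemoryStore:
    """Toy backend; any key-value store or an HBase table could fill this role."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


def encode(entity: dict) -> bytes:
    # Assumption: JSON is used here as a stand-in for Avro binary encoding.
    return json.dumps(entity, sort_keys=True).encode("utf-8")


def decode(blob: bytes) -> dict:
    return json.loads(blob.decode("utf-8"))


class EntityStore:
    """Serializes entities before they reach the backend."""

    def __init__(self, backend: KeyValueStore) -> None:
        self._backend = backend

    def save(self, table: str, row_id: str, entity: dict) -> None:
        # The backend receives a key and a byte blob, never individual fields.
        self._backend.put(f"{table}:{row_id}", encode(entity))

    def load(self, table: str, row_id: str) -> Optional[dict]:
        blob = self._backend.get(f"{table}:{row_id}")
        return None if blob is None else decode(blob)


backend = InMemoryStore()
store = EntityStore(backend)
store.save("user", "42", {"name": "Ada", "email": "ada@example.org"})
loaded = store.load("user", "42")
```

Because the backend never inspects the fields inside the blob, swapping `InMemoryStore` for another implementation of the same two methods would not change `EntityStore`. Searching by field would still need a separate index, which is the role Solr plays in the quoted design.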
Source code is available on ☞ GitHub.