About Mozilla Grouperfish architecture and choosing a scalable storage solution:
Given our access patterns (insert documents, update clusters, re-process entire collections, fetch lists of clusters), efficient sequential access to selected parts of the data is very important. Sorted, column-oriented storage seems to be the way to go. There are other pros and cons (single point of failure, write throughput, hardware requirements), but if we don’t cater to our use case, those won’t ever matter.
And this is what the planned solution is going to look like:
- service layer: node.js
- data layer: Redis + HBase
- processing layer: RabbitMQ, Mahout, Jetty
- batching layer: Hadoop
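To make the "sorted storage" argument concrete, here is a minimal sketch of why sorted row keys give cheap sequential access to one part of the data. It is a toy in-memory stand-in, not the HBase API and not Grouperfish's actual schema: the `SortedStore` class, the `doc_key`/`cluster_key` helpers, and the `collection/kind/id` key layout are all assumptions made up for illustration.

```python
import bisect


class SortedStore:
    """Toy stand-in for a sorted key-value store (NOT the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys, always kept in sorted order
        self._rows = {}   # row key -> stored value

    def put(self, key, value):
        # Insert new keys at their sorted position; overwrite existing ones.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        # Binary search to the first candidate, then read sequentially.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


# Hypothetical key layout: everything belonging to one collection sorts together.
def doc_key(collection, doc_id):
    return f"{collection}/doc/{doc_id:08d}"


def cluster_key(collection, cluster_id):
    return f"{collection}/cluster/{cluster_id:08d}"


store = SortedStore()
store.put(doc_key("feedback", 2), "too slow")
store.put(doc_key("other", 1), "unrelated")
store.put(cluster_key("feedback", 1), [1, 2])
store.put(doc_key("feedback", 1), "crashes on start")

# "Re-process an entire collection": one contiguous scan over its documents.
feedback_docs = [key for key, _ in store.scan_prefix("feedback/doc/")]

# "Fetch lists of clusters": another contiguous scan, same collection.
feedback_clusters = [value for _, value in store.scan_prefix("feedback/cluster/")]
```

With keys laid out this way, both re-processing a collection and listing its clusters become a single seek followed by a sequential read, which is the access pattern the quoted paragraph says the storage layer has to serve well.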
Original title and link: Redis and HBase for Mozilla Grouperfish Storage (NoSQL databases © myNoSQL)