Last week Terrastore saw a new release (0.5.0), and the new version brings quite a few new features:
- Multi-cluster (aka Ensemble) support: Terrastore can now be configured to join multiple clusters together and have them work as a single cluster ensemble, with data automatically partitioned and accessed across clusters and their nodes. This greatly enhances scalability, because each cluster provides its own active master.
- Autowired conditions, functions, predicates and event listeners: users can now write their own conditions, functions and so on, annotate them, put them on the Terrastore classpath, and have them automatically detected and used without modifying Terrastore's configuration.
- Custom partitioning APIs: users can now easily write their own custom partitioning scheme to suit their data-locality needs.
- Conditional get/put operations, very useful for implementing atomic compare-and-set (CAS), optimistic locking and so on.
- Basic JMX monitoring of cluster status.
- Brand new Java client with fluent interfaces.
I had a chance to talk to Terrastore's lead developer, Sergio Bossa (@sbtourist), about some of these interesting features, and here is the result.
NoSQL: Could you please expand a bit on the multi-cluster support? An example of how that works (is data relocated, is there data deduplication happening, etc.) would be really useful.
Sergio Bossa: Terrastore's multi-cluster ensemble works by configuring server nodes belonging to multiple independent clusters to discover and communicate with each other, regardless of the cluster they originally belong to.
This way, Terrastore masters remain unaware of each other and don't need to communicate or share data, achieving a kind of shared-nothing architecture between clusters that lets each master scale independently, while Terrastore servers discover each other and partition data to spread data load and computational power.
Data is partitioned following a two-level scheme: first, each document (depending on its bucket/key and the partitioning strategy used, more on this later) is uniquely assigned to a cluster, and then assigned to a specific server node inside that cluster. Data is never relocated or repartitioned among different clusters; that is, each document is permanently assigned to one and only one owning cluster, and this piece of information is shared by all server nodes. As a consequence, if all server nodes of a given cluster die, documents owned by that cluster become unavailable, and the Terrastore servers belonging to other clusters will clearly communicate to clients that some data is unavailable due to unavailable clusters.
This design preserves consistency even in case of severe failures or cluster partitions: available clusters, as well as both sides of a partition between clusters, continue working and simply see part of their data as unavailable. In other words, this kind of setup keeps consistency and partition tolerance but sacrifices some data availability.
Please note that a whole cluster becomes unavailable only if all of its server nodes crash, which is unlikely if you have several of them, so the only other event that could make a cluster unavailable is a network partition.
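To make the two-level scheme above concrete, here is a small illustrative sketch. This is not Terrastore's actual code; the cluster and node names are made up, and plain hashing stands in for whatever strategy is configured. The key point it shows is that cluster assignment is permanent, so when no node of the owning cluster is live, the document is reported unavailable rather than re-homed elsewhere.

```java
import java.util.*;

// Sketch of two-level routing: a document is permanently pinned to a cluster
// by hashing its bucket/key, then routed to a live node inside that cluster.
public class EnsembleRouting {
    static final List<String> CLUSTERS = List.of("cluster-a", "cluster-b");
    static final Map<String, List<String>> NODES = Map.of(
            "cluster-a", List.of("a1", "a2"),
            "cluster-b", List.of("b1", "b2"));

    // First level: permanent cluster assignment, known to all server nodes.
    static String clusterFor(String bucket, String key) {
        int h = (bucket + "/" + key).hashCode();
        return CLUSTERS.get(Math.floorMod(h, CLUSTERS.size()));
    }

    // Second level: pick a node among the owning cluster's live nodes;
    // if none is live, the data is unavailable (never moved to another cluster).
    static String nodeFor(String cluster, String key, Set<String> liveNodes) {
        List<String> live = new ArrayList<>(NODES.get(cluster));
        live.retainAll(liveNodes);
        if (live.isEmpty())
            throw new IllegalStateException("cluster " + cluster + " unavailable");
        return live.get(Math.floorMod(key.hashCode(), live.size()));
    }
}
```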
NoSQL: Conditions, functions and predicates sound a bit like basic components of map/reduce. Am I correct? How do they work?
Sergio: Terrastore doesn’t yet support Map/Reduce, but it’s on the issue tracker: the link is http://code.google.com/p/terrastore/issues/detail?id=4 and I invite the community to take part in the discussion.
Conditions are instead used for range/predicate queries and conditional get/put operations, while functions are currently used to atomically update a document's value (things like atomic counters, generators and so on).
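The atomic-counter use case Sergio mentions can be sketched in a few lines. The names below are illustrative, not Terrastore's real interfaces: a `ConcurrentHashMap` stands in for a bucket of documents, and `compute()` stands in for the node that owns a document applying an update function atomically, so concurrent increments never lose updates.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.*;

// Conceptual sketch of a server-side update function (illustrative names only).
public class AtomicUpdate {
    // Stand-in for a bucket of JSON-like documents, keyed by document key.
    static final ConcurrentMap<String, Map<String, Object>> bucket = new ConcurrentHashMap<>();

    // An "update function" receives the current document and returns the new
    // one; compute() executes it atomically for the given key.
    static Map<String, Object> update(String key, UnaryOperator<Map<String, Object>> fn) {
        return bucket.compute(key, (k, doc) -> fn.apply(doc == null ? new HashMap<>() : doc));
    }
}
```

Because the owning node serializes updates per key, a counter document incremented from many clients stays exact without any client-side locking.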
NoSQL: Could you explain to our readers the importance of having conditional get/put operations? Are you aware of how other NoSQL projects handle these scenarios?
Sergio: Conditional get/put operations with custom conditions come in very useful for implementing a wide range of well-known techniques related to version control and conflict resolution, such as optimistic concurrency, as well as other use cases such as saving bandwidth by not fetching documents unless certain conditions are met.
As far as I know, only Riak, through HTTP headers, and Amazon SimpleDB support similar use cases.
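Here is a minimal sketch of the optimistic-locking pattern a conditional put enables. This is not the actual Terrastore client API: a `ConcurrentMap` stands in for the store, and a version field inside the document stands in for whatever condition the client would supply. A put succeeds only if the stored document still carries the version the writer read; a stale writer must re-read and retry instead of silently overwriting.

```java
import java.util.concurrent.*;

// Sketch of optimistic locking built on a conditional put (illustrative only).
public class OptimisticLock {
    record Doc(int version, String value) {}

    static final ConcurrentMap<String, Doc> store = new ConcurrentHashMap<>();

    // Conditional put: a compare-and-set on the document's version.
    static boolean putIfVersion(String key, int expectedVersion, String newValue) {
        Doc current = store.get(key);
        if (current == null)
            return expectedVersion == 0
                    && store.putIfAbsent(key, new Doc(1, newValue)) == null;
        return current.version() == expectedVersion
                && store.replace(key, current, new Doc(expectedVersion + 1, newValue));
    }
}
```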
NoSQL: As far as I know, Cassandra provides support for custom partitioning, and a similar concept is present in Twitter's Gizzard framework. Are there any default partitioning implementations provided? How did you come up with them?
Sergio: Well, Terrastore's custom partitioning APIs are probably way easier to configure: it just boils down to implementing and annotating two simple interfaces.
Anyway, the default partitioning scheme is based on consistent hashing, as in many other products: this has the clear advantage of shielding users from partitioning details they may not want to be aware of, as well as trying to balance data distribution evenly.
But some users may want full control over data partitioning. For example, given two kinds of buckets, say Customers and Orders, they may want to allocate the former on one server and the latter on another; or they may want to put customers from A to M, with their related orders, on one server and the rest on another. This may be for hardware reasons, perhaps to allocate more data on bigger machines, or for pure data-locality reasons, perhaps because a third-party order-processing application runs on a given server and they want to keep data and processing as close as possible (which, as we all remember, is an essential principle). So it's not a matter of creating a "better" partitioning scheme: it's a matter of creating a partitioning scheme that fits users' needs.
Please note that I was talking about allocating data to specific servers, but the same applies to clusters: Terrastore's APIs give users full control to decide which cluster, and which server within that cluster, a given document will be assigned to.
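The Customers/Orders example above might look something like this hypothetical partitioner. The class and node names are made up, not Terrastore's real partitioning API; the point is that because both buckets route by the same customer key, a customer and their orders always land on the same node, keeping related data co-located.

```java
// Hypothetical custom partitioning strategy: customers A-M and their
// orders go to one node, the rest to another (illustrative names only).
public class AlphabeticPartitioner {
    // Both buckets are assumed to use the customer id as the key, so
    // related documents route to the same node regardless of bucket.
    static String nodeFor(String bucket, String customerKey) {
        char first = Character.toUpperCase(customerKey.charAt(0));
        return first <= 'M' ? "node-1" : "node-2";
    }
}
```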
NoSQL: Anything else you’d like to add?
Sergio: Well, a growing developer community is another important aspect, and something I'm striving for more and more.
There’s a guy from Sweden, Sven Johansson, working on a new Java client implementation, and Greg Luck, of Ehcache fame, will probably start contributing soon.
But we need more, so I invite everyone reading this to take the opportunity to participate: “just” subscribing to the mailing list and providing feedback would be crucial and obviously warmly welcome.
NoSQL: Thank you Sergio!