terrastore: All content tagged as terrastore in NoSQL databases and polyglot persistence
So, I couldn’t stop asking myself is there a need for a (new) document query language?
To be able to answer this question, I thought I should firstly review what are the existing solutions/approaches.
- MongoDB allows dynamic and pretty complex queries, but it is using a custom query API.
- RavenDB, the latest addition to the document database space, has chosen the route of Linq for defining indexes.
- Terrastore supports predicate (XPath currently) and range queries offering a mapreduce-like solution. You can read more about these in the Terrastore 0.5.0 article
- Last, but not least, XML databases are using XPath for querying.
Simply put, it looks like each solution comes with its own approach. While it will probably make sense to create a unified query language for document databases, I see only two possible solutions:
- either make all document databases sign up to use this query language (note: this might be quite difficult)
- or provide it through a framework that works will all of the existing document stores (note: this might not be possible)
But do not create a new query language in a framework that works only with a single document store.
- Richard Boulton: ☞ Using Redis as a backend for Xapian. An interesting analysis of how a dedicated search engine would work with a Redis backend. Meanwhile others try to simply store the reverted index into Redis¶
- Paul Rosania: ☞ Point-and-Click install of MongoDB on OS X 10.5+. Not that it was difficult before, but nice to have! ¶
- Doug Judd: ☞ Why We Started Hypertable, Inc. … or welcome to the Hypertable Inc. blog. ¶
- Surya Surabarapu: ☞ Terrastore Scala Client. First Terrastore library in our NoSQL libraries list ¶
Last week, Terrastore has seen a new release (0.5.0) and the new version brings quite a few new features:
- Multi-cluster (aka Ensemble) support: Terrastore can now be configured to join multiple clusters together and have them working as a unique cluster ensemble, with data automatically partitioned and accessed among clusters and related nodes. This will greatly enhance scalability, because each cluster will provide its own active master.
- Autowired conditions, functions, predicates and event listeners: users are now able to write their own conditions, functions and so on, annotate and put them in Terrastore classpath, and have them automatically detected and used without modifying Terrastore configuration.
- Custom partitioning APIs: users can now easily write their own custom partitioning scheme, to suit their data-locality needs.
- Conditional get/put operations, very useful for implementing atomic CAS, optimistic locking and so on.
- Basic JMX monitoring of cluster status.
- Brand new java client with fluent interfaces.
I had a chance to talk to Terrastore’s lead developer Sergio Bossa (@sbtourist) about some of these interesting features and here is the result.
NoSQL: Could you please expand a bit on the multi-cluster support? An example of how that works (is data relocated, is there data deduplication happening, etc.) would be really useful.
Sergio Bossa: Terrastore multi-cluster ensemble works by simply configuring server nodes, belonging to multiple independent clusters, in order to discover and communicate with each other regardless the original cluster they belong to.
This way, Terrastore masters will be unaware of each other and will not need to communicate and share data, achieving so a kind of shared-nothing architecture between clusters and allowing independent scaling of different masters, while Terrastore servers will be able to discover each other and partition data for spreading data load and computational power.
Data is partitioned following a two-level scheme: first, each document (depending on its bucket/key and the partitioning strategy used, more on this later) is uniquely assigned to a cluster, and then moved to a specific server node inside the cluster. Data is never relocated/repartitioned among different clusters, that is, each document is permanently assigned to one and only one owning cluster, and this piece of information is shared by all server nodes. As a consequence, if all server nodes from a given cluster die, documents owned by that cluster will be unavailable, and other Terrastore servers (belonging to other clusters) will clearly communicate to the clients that some data will be unavailable due to unavailable clusters.
This happens in order to keep consistency even in case of severe failures or cluster partitions: available clusters, as well as both ends of two partitioned clusters, will continue working and just see part of their data as unavailable. In other words, this kind of setup will keep consistency and partition tolerance but sacrifice some part of data availability.
Please note that a whole cluster turns to be unavailable only if all of its server nodes crash, which may be unlikely if you have several server nodes, so the only other event that could make a cluster unavailable would be a partition.
NoSQL: Conditions, functions and predicates sound a bit like basic components of mapreduce. Am I correct? How are they working?
Sergio: Terrastore doesn’t still support Map/Reduce, but it’s on the issue tracker: the link is ☞ http://code.google.com/p/terrastore/issues/detail?id=4 and I invite the community to take part to the discussion.
Conditions are rather used for range/predicate queries and conditional get/put operations while functions are currently used to atomically update a document value (things like atomic counters, generators and so on).
NoSQL: Could you explain our readers the importance of having conditional get/put operations? Are you aware of how other NoSQL projects are handling these scenarios?
Sergio: Conditional get/put operations with custom conditions come very useful for implementing a wide range of well known techniques related to version control/conflict resolution, such as optimistic concurrency, as well as several other use cases such as bandwidth saving by avoiding getting documents if certain conditions aren’t met and so on.
As far as I know, only Riak, through HTTP headers, and Amazon SimpleDB support similar use cases.
NoSQL: Afaik Cassandra provides support for custom partitioning. A similar concept is present in Twitter Gizzard framework. Are there any default partitioning implementation provided? How did you come up with those?
Sergio: Well, Terrastore custom partitioning APIs are probably way easier to configure: it just boils down to implement and annotate two simple interfaces.
Anyways, default partitioning scheme is based on consistent hashing, as happens in many other products: this has a clear advantage of shielding the user from partitioning details he/she may don’t want to be aware of, as well as of trying to equally balance data distribution.
But, some users may want to have full control over data partitioning. For example, provided they have two kind of buckets, say Customers and Orders, they may want to allocate the former on a given server, and the latter on another one, or, they may want to allocate customers from A to M with related orders on a given server, and others on another one: this may happen for hardware reasons, maybe for allocating more data on bigger machines, or for pure data locality reasons, maybe because they have a third party order processing application on a given server and they just want to keep data and processing as near as possible (which is, we all remember, an essential principle). So, it’s not a matter of creating a “better” partitioning scheme: it’s a matter of creating a partitioning scheme able to fit users’ needs.
Please note that I was talking of allocating data to specific servers, but the same applies to clusters: Terrastore APIs allow users to have full control and decide which cluster and which server in that cluster a given document will be assigned to.
NoSQL: Anything else you’d like to add?
Sergio: Well, a growing developers community is another important aspect, and something I’m trying to strive for more and more.
There’s a guy from Sweden, Sven Johansson, working on a new Java Client implementation, and Greg Luck, of Ecache fame, will probably start contributing soon.
But we need more: so I invite everyone reading this to take the opportunity to participate: “just” subscribing to the mailing list and providing feedback would be crucial and obviously warmly welcome.
NoSQL: Thank you Sergio!
The more mature the NoSQL solutions grow the more they realize the importance of the protocols they are using. And more and more NoSQL projects try not to repeat the LDAP protocol history.
I’d say that the flagship NoSQL projects that understood the benefits of the protocol simplicity are CouchDB, the relaxed document database and SimpleDB, Amazon’s key-value store, both of them looking like being built on the web and for the web (note: as one of the MyNoSQL readers correctly pointed out, the SimpleDB HTTP use is quite incorrect though). But they are definitely not the only one.
Riak, the decentralized key-value store, is also using JSON over HTTP. Not only that but the Basho team, producers of Riak, have decided lately to completely drop their custom protocol ☞ Jiak.
Terrastore, the consistent, partitioned and elastic document database, being quite young, made its homework and debuted as HTTP/JSON friendly.
Neo4j, the graph database, has added recently a RESTful interface, which even if not available in the Neo4j 1.0 release is making it accessible for a new range of programming languages.
There are some NoSQL solutions that are still using custom protocols. Redis has defined its own protocol, but made sure to keep it “easy to parse by a computer and easy to parse by a human”. Redis also got some help from 3rd party tools/libraries to make it even more accessible through HTTP/JSON: RedBottle, a REST app for Redis and Sikwamic, a Redis over HTTP library.
GT.M, a NoSQL solution about which you can learn more from the Introduction to GT.M and M/DB or these two talks at FOSDEM: GT.M and OpenStreetMap and MDB and MDBX: Open Source SimpleDB Projects based on GTM, has also realized the importance of the protocol and is now introducing ☞ M/Wire, which was inspired by the simplicity of Redis protocol.
MongoDB is another example of a NoSQL storage that uses a custom wire protocol. While the MongoDB ecosystem already includes a lot of libraries, I’d really love to see Kristina’s ☞ Sleepy.Mongoose moving forward (nb: Krsitina, I’m also pretty sure that Sleepy.Mongoose can get much nicer RESTful URIs too ;-) ).
And the story can go on and on, but the lesson to be learned should be quite obvious: the simpler and the easier your protocol is the more accessible your data will be and the easier it will be for the community to come up with (innovative) projects and libraries. The NoSQL libraries page should give you a feeling of what NoSQL solutions are using simple protocols and which are not.
While this is not the ☞ original title, I’d say it would summarize pretty well Jilles Van Gurp’s post about the set of features he liked most in CouchDB:
Document oriented and schema less storage
MongoDB encourages a master-slave setup so it will not have to deal with conflict resolution. Terrastore doesn’t have to address this either as the value of a key lives on a specific node and updates will be applied in their chronological order. Riak uses vector clocks for conflict resolution.
Robust incremental replication
Replication is supported by both Riak and MongoDB. Terrastore, being built on top of Terracotta, uses a different strategy:
Terracotta replication is not full, nor geared toward all nodes, but only those actually requiring the replicated data. This is more and more optimized in Terrastore, where, thanks to consistent hashing and partitioning, data is not duplicated at all. Terrastore also guarantees that data will never be duplicated among nodes, unless new nodes are joining or older nodes are leaving, thus requiring data redistribution.
You can read more about Terrastore approach on Terrastore: a consistent, partitioned and elastic document database.
I leave it up to the readers to comment based on their experience how robust replication is in each of the MongoDB, Riak and Terrastore case.
The explanation in the original article is more about durability than fault tolerance. In that regards, as far as I know only MongoDB is not really durable.
On Riak case, each write operation can specify the number of virtual nodes involved in the operation and also the number of successful durably writes.
Terrastore is durably storing its data on master whose availability can be enhanced by putting it in active/passive mode (simply put there can be passive masters that would take on the tasks once the initial master has failed).
I should probably mention that in terms of the fault tolerance definition, both Riak and Terrastore are fault tolerant.
Both Riak and Terrastore are HTTP friendly.
I left at the end the three reasons that are either more specific to CouchDB implementation or are debatable.
- cleanup by replicating
This sounds more like something specific to CouchDB append only approach.
- incremental map reduce on read
I’m not sure I understand the benefits of this.
- it’s fast
As always, this sort of arguments are highly debatable and it always a good idea to use very well crafted benchmarks that fit your app scenarios.
In conclusion I’d say we ended up with 5 reasons you should like CouchDB and Riak and Terrastore. And I bet a change in the requirements (see as an example: On why I think these pro MongoDB arguments are not unique) would result in any other combinations of these four NoSQL solutions.
Special thanks to Sergio Bossa for clarifying some aspects related to Terrastore.
MongoDB 1.2.2 is mostly a bug fix release as can be seen from the ☞ announcement.
Project Voldemort 0.70
This new version of Project Voldemort is including one of the most awaited features: online rebalancing. While the ☞ official announcement makes it clear that rebalancing has been extensively tested, I couldn’t find a good description of the rebalancing algorithm.
There are other interesting features in the release that I’d like to mention:
- New failure detector merged into the main branch
- Beta mechanism for restoring all of node’s data from replicas on demand. This is an alternative to a more gradual mechanism provided by read-repair.
While version change sounded like a minor release, the latest version of Terrastore, the partitioned and elastic document database built on top of Terracotta, features quite a few interesting new things:
- New, configurable, server-to-master reconnection procedure and improved graceful shutdown procedure for server nodes.
- New socket-based internal communication layer, improving multi-node performances and lowering resource consumption.
- New transparent rerouting of client requests in case of failing nodes.
- Improved rebalancing in case of nodes leaving or failing.
I must confess that if I’d be managing Terrastore releases and knew that these features are well tested, I would definitely jumped a couple of versions!
Riak 0.7.1 seems to be mostly a bugfix release, with a pretty cryptic ☞ announcement (at least for someone not familiar with the Riak source code).
- Cassandra is getting closer to release the 0.5.0 version that will bring a ton of improvements. As far as I can tell it will be the first Cassandra release that we will have the chance to cover here on MyNoSQL.
- Neo4j is also getting closer to finally release the 1.0 version. After the neo4j 1.0-b11, the guys have released an RC, so I expect the final release to follow shortly.
- Project Voldemort is preparing to release the 0.70 version which will bring the much awaited rebalancing feature. The previous Project Voldemort release has been preparing the ground for this exciting feature.
- Terrastore have been upgraded to use a new version of Terracotta and also refactored the internal communication protocol this leading to a 5x speedup. I guess the upcoming release will look quite nice.
- Last, but not least, Redis has released the 1.2.0 version. The new version provides quite a few new features, so a more detailed article about Redis 1.2.0 is in work.
I guess these are the last releases for an eventful 2009 NoSQL year:
Mongo 1.2.1 is just a minor release featuring the following bug fixes:
- mongoimport now works on windows
- gcc 4.4 can be used to compile
- better map/reduce error handling
A day after our coverage of Terrastore, a consistent, partitioned and elastic document database, the 0.3 version was released featuring a much easier installation tool. You can read the announcement ☞ here. Sergio Bossa, Terrastore creator, has published a nice summary of what Terrastore is ☞ here.
And with this, I am looking forward to more exciting NoSQL releases in 2010.
Terrastore is a very young Apache licensed document store solution built on top of the Terracotta (an in-memory clustering technology) that released its 0.2 version a couple of days ago.
I had the opportunity to chat with Sergio Bossa (@sbtourist) and have him answer a couple of questions about Terrastore.
Alex: What is it that made you create Terrastore in the first place?
Sergio: I wanted a scalable document store with consistency features, because I think that’s an uncovered topic/space in current implementations, which are all geared toward BASE.
Being a document database, Terrastore belongs to the same category as CouchDB, MongoDB, and Riak. In some regards (f.e. partitioning), Terrastore is similar to Riak. You should also check  to find out more about Terrastore and the CAP theorem.
Terracotta replication is not full, nor geared toward all nodes, but only those actually requiring the replicated data. This is more and more optimized in Terrastore, where, thanks to consistent hashing and partitioning, data is not duplicated at all. Terrastore also guarantees that data will never be duplicated among nodes, unless new nodes are joining or older nodes are leaving, thus requiring data redistribution. A Terrastore client doesn’t need to know where the data is: it can contact whatever Terrastore node and requests will be routed to the proper node holding the value (note: this is similar to the way Dynamo, Project Voldemort, Cassandra and other distributed stores are working)
At this point, more people have joined the chat and so more interesting questions and answers were coming up.
Alex: Considering Terrastore is built on top of Terracotta, is it an in-memory storage making it somehow similar to Redis?
Sergio: Correct, it stores everything in memory, but it is persistent as well. It is not as fast as Redis mainly due to some overhead related to its distributed features.
Paulo Gaspar: Terrastore looks very much like a persistent, transactional Memcached service.
Sergio: Persistent, transactional, and partitioned/sharded. An interesting difference is that afaik Memcached partitioning is done client side, while Terrastore has builtin support for data partitioning, distribution and access routing.
Terrastore is already HTTP and JSON friendly  and the future might bring support for the memcached protocol too.
Please see the following resources to learn more about Terrastore: