ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

terrastore: All content tagged as terrastore in NoSQL databases and polyglot persistence

Document Database Query Language

Recently I have noticed that Doctrine[1], a PHP library focused on persistence services, has been working on ☞ defining a new query language for document databases.

So, I couldn’t stop asking myself is there a need for a (new) document query language?

To be able to answer this question, I thought I should firstly review what are the existing solutions/approaches.

  1. CouchDB doesn’t allow running dynamic queries against the store, but you can define views with the help of Javascript-based mapreduce functions.
  2. MongoDB allows dynamic and pretty complex queries, but it is using a custom query API.
  3. RavenDB, the latest addition to the document database space, has chosen the route of Linq[2] for defining indexes.
  4. Terrastore supports predicate (XPath currently) and range queries offering a mapreduce-like solution. You can read more about these in the Terrastore 0.5.0 article
  5. Last, but not least, XML databases are using XPath for querying.

Simply put, it looks like each solution comes with its own approach. While it will probably make sense to create a unified query language for document databases, I see only two possible solutions:

  • either make all document databases sign up to use this query language (note: this might be quite difficult)
  • or provide it through a framework that works will all of the existing document stores (note: this might not be possible)

But do not create a new query language in a framework that works only with a single document store.


  1. ☞ Doctrine project website  ()
  2. ☞ LINQ: a set of extensions to the .NET framework that encompass language-integrated query, set, and transform operations.  ()

NoSQL Ecosystem News & Links 2010-06-14


NoSQL Ecosystem News & Links 2010-06-01

  1. Richard Boulton: ☞ Using Redis as a backend for Xapian. An interesting analysis of how a dedicated search engine would work with a Redis backend. Meanwhile others try to simply store the reverted index into Redis
  2. Paul Rosania: ☞ Point-and-Click install of MongoDB on OS X 10.5+. Not that it was difficult before, but nice to have!
  3. Doug Judd: ☞ Why We Started Hypertable, Inc. … or welcome to the Hypertable Inc. blog.
  4. Surya Surabarapu: ☞ Terrastore Scala Client. First Terrastore library in our NoSQL libraries list

Terrastore 0.5.0: An Interview with Lead Developer Sergio Bossa

Last week, Terrastore has seen a new release (0.5.0) and the new version brings quite a few new features:

  • Multi-cluster (aka Ensemble) support: Terrastore can now be configured to join multiple clusters together and have them working as a unique cluster ensemble, with data automatically partitioned and accessed among clusters and related nodes. This will greatly enhance scalability, because each cluster will provide its own active master.
  • Autowired conditions, functions, predicates and event listeners: users are now able to write their own conditions, functions and so on, annotate and put them in Terrastore classpath, and have them automatically detected and used without modifying Terrastore configuration.
  • Custom partitioning APIs: users can now easily write their own custom partitioning scheme, to suit their data-locality needs.
  • Conditional get/put operations, very useful for implementing atomic CAS, optimistic locking and so on.
  • Basic JMX monitoring of cluster status.
  • Brand new java client with fluent interfaces.

I had a chance to talk to Terrastore’s lead developer Sergio Bossa (@sbtourist) about some of these interesting features and here is the result.

NoSQL: Could you please expand a bit on the multi-cluster support? An example of how that works (is data relocated, is there data deduplication happening, etc.) would be really useful.

Sergio Bossa: Terrastore multi-cluster ensemble works by simply configuring server nodes, belonging to multiple independent clusters, in order to discover and communicate with each other regardless the original cluster they belong to.

This way, Terrastore masters will be unaware of each other and will not need to communicate and share data, achieving so a kind of shared-nothing architecture between clusters and allowing independent scaling of different masters, while Terrastore servers will be able to discover each other and partition data for spreading data load and computational power.

Data is partitioned following a two-level scheme: first, each document (depending on its bucket/key and the partitioning strategy used, more on this later) is uniquely assigned to a cluster, and then moved to a specific server node inside the cluster. Data is never relocated/repartitioned among different clusters, that is, each document is permanently assigned to one and only one owning cluster, and this piece of information is shared by all server nodes. As a consequence, if all server nodes from a given cluster die, documents owned by that cluster will be unavailable, and other Terrastore servers (belonging to other clusters) will clearly communicate to the clients that some data will be unavailable due to unavailable clusters.

This happens in order to keep consistency even in case of severe failures or cluster partitions: available clusters, as well as both ends of two partitioned clusters, will continue working and just see part of their data as unavailable. In other words, this kind of setup will keep consistency and partition tolerance but sacrifice some part of data availability.

Please note that a whole cluster turns to be unavailable only if all of its server nodes crash, which may be unlikely if you have several server nodes, so the only other event that could make a cluster unavailable would be a partition.

NoSQL: Conditions, functions and predicates sound a bit like basic components of mapreduce. Am I correct? How are they working?

Sergio: Terrastore doesn’t still support Map/Reduce, but it’s on the issue tracker: the link is ☞ http://code.google.com/p/terrastore/issues/detail?id=4 and I invite the community to take part to the discussion.

Conditions are rather used for range/predicate queries and conditional get/put operations while functions are currently used to atomically update a document value (things like atomic counters, generators and so on).

NoSQL: Could you explain our readers the importance of having conditional get/put operations? Are you aware of how other NoSQL projects are handling these scenarios?

Sergio: Conditional get/put operations with custom conditions come very useful for implementing a wide range of well known techniques related to version control/conflict resolution, such as optimistic concurrency, as well as several other use cases such as bandwidth saving by avoiding getting documents if certain conditions aren’t met and so on.

As far as I know, only Riak, through HTTP headers, and Amazon SimpleDB support similar use cases.

NoSQL: Afaik Cassandra provides support for custom partitioning. A similar concept is present in Twitter Gizzard framework. Are there any default partitioning implementation provided? How did you come up with those?

Sergio: Well, Terrastore custom partitioning APIs are probably way easier to configure: it just boils down to implement and annotate two simple interfaces.

Anyways, default partitioning scheme is based on consistent hashing, as happens in many other products: this has a clear advantage of shielding the user from partitioning details he/she may don’t want to be aware of, as well as of trying to equally balance data distribution.

But, some users may want to have full control over data partitioning. For example, provided they have two kind of buckets, say Customers and Orders, they may want to allocate the former on a given server, and the latter on another one, or, they may want to allocate customers from A to M with related orders on a given server, and others on another one: this may happen for hardware reasons, maybe for allocating more data on bigger machines, or for pure data locality reasons, maybe because they have a third party order processing application on a given server and they just want to keep data and processing as near as possible (which is, we all remember, an essential principle). So, it’s not a matter of creating a “better” partitioning scheme: it’s a matter of creating a partitioning scheme able to fit users’ needs.

Please note that I was talking of allocating data to specific servers, but the same applies to clusters: Terrastore APIs allow users to have full control and decide which cluster and which server in that cluster a given document will be assigned to.

NoSQL: Anything else you’d like to add?

Sergio: Well, a growing developers community is another important aspect, and something I’m trying to strive for more and more.

There’s a guy from Sweden, Sven Johansson, working on a new Java Client implementation, and Greg Luck, of Ecache fame, will probably start contributing soon.

But we need more: so I invite everyone reading this to take the opportunity to participate: “just” subscribing to the mailing list and providing feedback would be crucial and obviously warmly welcome.

NoSQL: Thank you Sergio!

Just in case you didn’t do it already, head over the Terrastore ☞ project, get on the ☞ mailing list and give it a try!


NoSQL Protocols Are Important

The more mature the NoSQL solutions grow the more they realize the importance of the protocols they are using. And more and more NoSQL projects try not to repeat the LDAP protocol history.

I’d say that the flagship NoSQL projects that understood the benefits of the protocol simplicity are CouchDB, the relaxed document database and SimpleDB, Amazon’s key-value store, both of them looking like being built on the web and for the web (note: as one of the MyNoSQL readers correctly pointed out, the SimpleDB HTTP use is quite incorrect though). But they are definitely not the only one.

Riak, the decentralized key-value store, is also using JSON over HTTP. Not only that but the Basho team, producers of Riak, have decided lately to completely drop their custom protocol ☞ Jiak.

Terrastore, the consistent, partitioned and elastic document database, being quite young, made its homework and debuted as HTTP/JSON friendly.

Neo4j, the graph database, has added recently a RESTful interface, which even if not available in the Neo4j 1.0 release is making it accessible for a new range of programming languages.

There are some NoSQL solutions that are still using custom protocols. Redis has defined its own protocol, but made sure to keep it “easy to parse by a computer and easy to parse by a human”. Redis also got some help from 3rd party tools/libraries to make it even more accessible through HTTP/JSON: RedBottle, a REST app for Redis and Sikwamic, a Redis over HTTP library.

GT.M, a NoSQL solution about which you can learn more from the Introduction to GT.M and M/DB or these two talks at FOSDEM: GT.M and OpenStreetMap and MDB and MDBX: Open Source SimpleDB Projects based on GTM, has also realized the importance of the protocol and is now introducing ☞ M/Wire, which was inspired by the simplicity of Redis protocol.

MongoDB is another example of a NoSQL storage that uses a custom wire protocol. While the MongoDB ecosystem already includes a lot of libraries, I’d really love to see Kristina’s ☞ Sleepy.Mongoose moving forward (nb: Krsitina, I’m also pretty sure that Sleepy.Mongoose can get much nicer RESTful URIs too ;-) ).

And the story can go on and on, but the lesson to be learned should be quite obvious: the simpler and the easier your protocol is the more accessible your data will be and the easier it will be for the community to come up with (innovative) projects and libraries. The NoSQL libraries page should give you a feeling of what NoSQL solutions are using simple protocols and which are not.

Update: I received a hint from Mathias Meyer (@roidrage) that BSON, the binary JSON serialization used by MongoDB, has a new ☞ home


8 Reasons You Should Like CouchDB… and not only

While this is not the ☞ original title, I’d say it would summarize pretty well Jilles Van Gurp’s post about the set of features he liked most in CouchDB:

Document oriented and schema less storage

That’s definitely not unique. Here on MyNoSQL, we are aware of at least 4 document databases: CouchDB, MongoDB, Riak and Terrastore.

Conflict resolution

MongoDB encourages a master-slave setup so it will not have to deal with conflict resolution. Terrastore doesn’t have to address this either as the value of a key lives on a specific node and updates will be applied in their chronological order. Riak uses vector clocks for conflict resolution.

Robust incremental replication

Replication is supported by both Riak and MongoDB. Terrastore, being built on top of Terracotta, uses a different strategy:

Terracotta replication is not full, nor geared toward all nodes, but only those actually requiring the replicated data. This is more and more optimized in Terrastore, where, thanks to consistent hashing and partitioning, data is not duplicated at all. Terrastore also guarantees that data will never be duplicated among nodes, unless new nodes are joining or older nodes are leaving, thus requiring data redistribution.

You can read more about Terrastore approach on Terrastore: a consistent, partitioned and elastic document database.

I leave it up to the readers to comment based on their experience how robust replication is in each of the MongoDB, Riak and Terrastore case.

Fault tolerant

The explanation in the original article is more about durability than fault tolerance. In that regards, as far as I know only MongoDB is not really durable.

On Riak case, each write operation can specify the number of virtual nodes involved in the operation and also the number of successful durably writes.

Terrastore is durably storing its data on master whose availability can be enhanced by putting it in active/passive mode (simply put there can be passive masters that would take on the tasks once the initial master has failed).

I should probably mention that in terms of the fault tolerance definition, both Riak and Terrastore are fault tolerant.

RESTful

Both Riak and Terrastore are HTTP friendly.

I left at the end the three reasons that are either more specific to CouchDB implementation or are debatable.

  • cleanup by replicating

This sounds more like something specific to CouchDB append only approach.

  • incremental map reduce on read

I’m not sure I understand the benefits of this.

  • it’s fast

As always, this sort of arguments are highly debatable and it always a good idea to use very well crafted benchmarks that fit your app scenarios.

In conclusion I’d say we ended up with 5 reasons you should like CouchDB and Riak and Terrastore. And I bet a change in the requirements (see as an example: On why I think these pro MongoDB arguments are not unique) would result in any other combinations of these four NoSQL solutions.


Special thanks to Sergio Bossa for clarifying some aspects related to Terrastore.


Quick reference to latest MongoDB, Project Voldemort, Terrastore, and Riak

After mentioning all these NoSQL releases of the most active and exciting NoSQL week, I thought MyNoSQL should provide a quick reference to all of them.

MongoDB 1.2.2

MongoDB 1.2.2 is mostly a bug fix release as can be seen from the ☞ announcement.

Resources

Project Voldemort 0.70

This new version of Project Voldemort is including one of the most awaited features: online rebalancing. While the ☞ official announcement makes it clear that rebalancing has been extensively tested, I couldn’t find a good description of the rebalancing algorithm.

There are other interesting features in the release that I’d like to mention:

  • New failure detector merged into the main branch
  • Beta mechanism for restoring all of node’s data from replicas on demand. This is an alternative to a more gradual mechanism provided by read-repair.

I’d really love to publish an article about the Voldemort rebalancing implementation, so please do forward this kind request to anyone that can help. ( @strlen, @ijuma: I hope you are reading this).

Resources

Terrastore 0.4

While version change sounded like a minor release, the latest version of Terrastore, the partitioned and elastic document database built on top of Terracotta, features quite a few interesting new things:

  • New, configurable, server-to-master reconnection procedure and improved graceful shutdown procedure for server nodes.
  • New socket-based internal communication layer, improving multi-node performances and lowering resource consumption.
  • New transparent rerouting of client requests in case of failing nodes.
  • Improved rebalancing in case of nodes leaving or failing.

I must confess that if I’d be managing Terrastore releases and knew that these features are well tested, I would definitely jumped a couple of versions!

Resources

Riak 0.7.1

Riak 0.7.1 seems to be mostly a bugfix release, with a pretty cryptic ☞ announcement (at least for someone not familiar with the Riak source code).

Resources


Characterizing Enterprise Systems using the CAP theorem

When building your next distributed system, you will have to make sure that all subsystems are able to deliver the combination of consistency-availability-partition tolerance that you are looking for.

Taylor’s article is a great start for categorizing according to the CAP theorem some of the (enterprise) systems out there: Terracota, Oracle Coherence, GigaSpaces, but also RDBMS and a couple of NoSQL solutions like Amazon Dynamo, BigTable, Cassandra, CouchDB and Project Voldemort.

Another interesting aspect of the article is that it tries to identify how these systems are coping with the missing CAP dimension. Unfortunately, there are a couple of things in the RDBMS analysis that I do not agree with.

An RDBMS provides availability, but only when there is connectivity between the client accessing the RDBMS and the RDBMS itself.

[…] there are several well-known approaches that can be employed to compensate for the lack of Partition tolerance. One of these approaches is commonly referred to as master/slave replication.

RDBMS are not available by themselves. Leaving aside the connectivity issue, RDBMS can become busy performing complex operations or run out of resources and so they can be unavailable.

What the article identifies as a solution for dealing with partition tolerance, master/slave setups are meant in fact to provide some level of availability. But with master/slave consistency becomes only “eventual consistency”.

The other approach mentioned — sharding — is indeed a solution meant to provide some level of partition tolerance. But without replication it gives up to availability.

As side notes:

  • it was interesting to learn that GigaSpaces can behave as either an CA or AP system, depending on the configurable replication scheme (sync vs async).
  • I am wondering if there are any CP solutions out there. I’d speculate that financial services would probably be required to be CP (if distributed).

via: http://javathink.blogspot.com/2010/01/characterizing-enterprise-systems-using.html


Brief NoSQL News

  1. Cassandra is getting closer to release the 0.5.0 version that will bring a ton of improvements. As far as I can tell it will be the first Cassandra release that we will have the chance to cover here on MyNoSQL.
  2. Neo4j is also getting closer to finally release the 1.0 version. After the neo4j 1.0-b11, the guys have released an RC, so I expect the final release to follow shortly.
  3. Project Voldemort is preparing to release the 0.70 version which will bring the much awaited rebalancing feature. The previous Project Voldemort release has been preparing the ground for this exciting feature.
  4. Terrastore have been upgraded to use a new version of Terracotta and also refactored the internal communication protocol this leading to a 5x speedup. I guess the upcoming release will look quite nice.
  5. Last, but not least, Redis has released the 1.2.0 version. The new version provides quite a few new features, so a more detailed article about Redis 1.2.0 is in work.

2009 Last NoSQL Releases

I guess these are the last releases for an eventful 2009 NoSQL year:

Mongo 1.2.1

Mongo 1.2.1 is just a minor release featuring the following bug fixes:

  • mongoimport now works on windows
  • gcc 4.4 can be used to compile
  • better map/reduce error handling

You can read the announcement ☞ here, the complete changelog ☞ here and download Mongo 1.2.1 from ☞ here.

In case you are planning on using MongoDB, I’d encourage you to check these MongoDB screencasts and all MongoDB coverage on MyNoSQL.

Terrastore 0.3

A day after our coverage of Terrastore, a consistent, partitioned and elastic document database, the 0.3 version was released featuring a much easier installation tool. You can read the announcement ☞ here. Sergio Bossa, Terrastore creator, has published a nice summary of what Terrastore is ☞ here.

Neo4j 1.0-b11

Last, but not least I should also mention ☞ Neo4j latest RC before 1.0. The case for graph databases should give you a quick understanding of why and when Neo4j can be a better fit for your app.

And with this, I am looking forward to more exciting NoSQL releases in 2010.


Terrastore: A Consistent, Partitioned and Elastic Document Database

Terrastore is a very young Apache licensed document store solution built on top of the Terracotta (an in-memory clustering technology) that released its 0.2 version a couple of days ago.

I had the opportunity to chat with Sergio Bossa (@sbtourist) and have him answer a couple of questions about Terrastore.

Alex: What is it that made you create Terrastore in the first place?

Sergio: I wanted a scalable document store with consistency features, because I think that’s an uncovered topic/space in current implementations, which are all geared toward BASE.

Being a document database, Terrastore belongs to the same category as CouchDB, MongoDB, and Riak. In some regards (f.e. partitioning), Terrastore is similar to Riak. You should also check [1] to find out more about Terrastore and the CAP theorem.

Terracotta replication is not full, nor geared toward all nodes, but only those actually requiring the replicated data. This is more and more optimized in Terrastore, where, thanks to consistent hashing and partitioning, data is not duplicated at all. Terrastore also guarantees that data will never be duplicated among nodes, unless new nodes are joining or older nodes are leaving, thus requiring data redistribution. A Terrastore client doesn’t need to know where the data is: it can contact whatever Terrastore node and requests will be routed to the proper node holding the value (note: this is similar to the way Dynamo, Project Voldemort, Cassandra and other distributed stores are working)

At this point, more people have joined the chat and so more interesting questions and answers were coming up.

Alex: Considering Terrastore is built on top of Terracotta, is it an in-memory storage making it somehow similar to Redis?

Sergio: Correct, it stores everything in memory, but it is persistent as well. It is not as fast as Redis mainly due to some overhead related to its distributed features.

Paulo Gaspar: Terrastore looks very much like a persistent, transactional Memcached service.

Sergio: Persistent, transactional, and partitioned/sharded. An interesting difference is that afaik Memcached partitioning is done client side, while Terrastore has builtin support for data partitioning, distribution and access routing.

Terrastore is already HTTP and JSON friendly [2] and the future might bring support for the memcached protocol too.

Please see the following resources to learn more about Terrastore: