


rethinkdb: All content tagged as rethinkdb in NoSQL databases and polyglot persistence

RethinkDB 1.13: new protocol and push-pull APIs

Some interesting changes and new features in RethinkDB 1.13 announced yesterday. Namely:

  • replacing the protocol buffers-based protocol with a JSON protocol

    • how does the JSON protocol manage the non-JSON data types?
    • how fast is a text-based protocol?
  • notifications about document changes

    I’ve always said this was the coolest feature in CouchDB and that every database should support it.

  • a weird[1] new http command to pull JSON data from the web
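
On the first question above, one common answer for a JSON-based protocol is to wrap non-JSON values in tagged objects. As far as I recall, RethinkDB's drivers do something along these lines with a `$reql_type$` key, but treat the tag and field names below as assumptions. This is a minimal sketch of the idea, not the driver's actual code, and it always normalizes times to UTC:

```python
import base64
import json
from datetime import datetime, timezone

TAG = "$reql_type$"  # tag key; assumed here for illustration


def encode(value):
    """Recursively turn a Python value into something json.dumps accepts."""
    if isinstance(value, datetime):
        # Expects a timezone-aware datetime; stored as seconds since the epoch.
        return {TAG: "TIME", "epoch_time": value.timestamp(), "timezone": "+00:00"}
    if isinstance(value, bytes):
        return {TAG: "BINARY", "data": base64.b64encode(value).decode("ascii")}
    if isinstance(value, dict):
        return {k: encode(v) for k, v in value.items()}
    if isinstance(value, list):
        return [encode(v) for v in value]
    return value


def decode(value):
    """Reverse of encode: turn tagged objects back into native Python values."""
    if isinstance(value, dict):
        if value.get(TAG) == "TIME":
            return datetime.fromtimestamp(value["epoch_time"], tz=timezone.utc)
        if value.get(TAG) == "BINARY":
            return base64.b64decode(value["data"])
        return {k: decode(v) for k, v in value.items()}
    if isinstance(value, list):
        return [decode(v) for v in value]
    return value


doc = {
    "title": "RethinkDB 1.13",
    "published": datetime(2014, 6, 14, 12, 0, tzinfo=timezone.utc),
    "thumbnail": b"\x89PNG",
}
wire = json.dumps(encode(doc))      # plain JSON text on the wire
restored = decode(json.loads(wire))  # native values again
```

The cost of this approach is an extra pass over every document on both ends, which is part of why the second question (how fast is a text-based protocol?) is worth asking.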

I’ve checked the RethinkDB stability report again and I’m not sure it reads as “yep, RethinkDB is finally production ready”.

  1. Knowing the team there, I’m pretty sure this is coming from a use case I’m not seeing. 

Original title and link: RethinkDB 1.13: new protocol and push-pull APIs (NoSQL database©myNoSQL)

RethinkDB raises an $8M Series A

Today we’re delighted to announce our Series A! We’ve raised $8M to fund development, grow the RethinkDB community, and ultimately make database tools feel indistinguishable from magic.

A long way to go, but the second step was made. Wholeheartedly congrats!

The HN thread.

Original title and link: RethinkDB raises an $8M Series A (NoSQL database©myNoSQL)


How Would We Query Such a Database Without Wasting Time With Ugly SQL?

How would we query such a database without wasting time with ugly SQL? We would need an API that will let us define our table schema and then allow us to craft queries using simple abstractions like collection maps, filter, joins, etc. I don’t mean a heavyweight ORM solution either. If we are after simplicity, we’d better forgo dealing with object mappings and the complexity they bring. All we want is a hassle-free way to model our data and read and write it.

After reading this paragraph, I thought: “what a wonderful description of RethinkDB’s data querying language”. Then I switched back to reading the article, which is about SQLAlchemy, one of the most interesting and complete ORMs.

Original title and link: How Would We Query Such a Database Without Wasting Time With Ugly SQL? (NoSQL database©myNoSQL)


A Key-Value Cache for Flash Storage: Facebook's McDipper and What Preceded It

A post on Facebook Engineering’s blog:

The outgrowth of this was McDipper, a highly performant flash-based cache server that is Memcache protocol compatible. The main design goals of McDipper are to make efficient use of flash storage (i.e. to deliver performance as close to that of the underlying device as possible) and to be a drop-in replacement for Memcached. McDipper has been in active use in production at Facebook for nearly a year.

I know at least 3 companies that have attacked this problem with different approaches and different results:

  1. Couchbase (ex-Membase, ex-NorthScale) started as a persistent clustered Memcached implementation. It was not optimized for Flash storage though. Today’s Couchbase product is still based on the memcache protocol, but it is adding new features inspired by CouchDB.
  2. RethinkDB, a YC company and the company that I work for, built and released in 2011 a Memcache-compatible storage engine optimized for SSDs. Since then, RethinkDB has built and released an enhanced product: a distributed JSON store with advanced data manipulation support.
  3. Aerospike (ex Citrusleaf) sells a storage engine for flash drives. Its API is not Memcache compatible though.

People interested in this market segment have something to learn from this.

Original title and link: A Key-Value Cache for Flash Storage: Facebook’s McDipper and What Preceded It (NoSQL database©myNoSQL)


Using Hadoop Pig With MongoDB

In this post, we’ll see how to install MongoDB support for Pig and we’ll illustrate it with an example where we join 2 MongoDB collections with Pig and store the result in a new collection.

Color me very biased this time, but all these (especially the JOIN) can be done directly using RethinkDB.
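
To make concrete what joining two document collections means, here is a minimal in-memory sketch of an equi-join. It mirrors roughly what ReQL's `eq_join` followed by `zip` expresses, but it is plain Python over lists of dicts, not the RethinkDB driver, and the collection and field names are made up for illustration:

```python
from collections import defaultdict


def eq_join(left_docs, left_field, right_docs, right_key="id"):
    """Pair each left document with the right documents whose key matches."""
    index = defaultdict(list)
    for doc in right_docs:
        index[doc[right_key]].append(doc)  # build a lookup table once
    for left in left_docs:
        for right in index.get(left.get(left_field), []):
            yield {"left": left, "right": right}


def zip_pairs(pairs):
    """Merge each pair into one document; right-hand fields win on conflict."""
    return [{**pair["left"], **pair["right"]} for pair in pairs]


# Hypothetical collections.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
posts = [
    {"id": 10, "user_id": 1, "title": "On JOINs"},
    {"id": 11, "user_id": 1, "title": "On SSDs"},
    {"id": 12, "user_id": 3, "title": "Orphaned post"},
]

joined = zip_pairs(eq_join(posts, "user_id", users))
```

The post with `user_id` 3 has no matching user, so it drops out of the result, as it would in an inner join.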

Original title and link: Using Hadoop Pig With MongoDB (NoSQL database©myNoSQL)


NoSQL and JOINs: RavenDB and RethinkDB

Daniel Lang:

One of the main differences between relational databases and document databases is the lack of native joining capabilities, right? This is no longer true for RavenDB.

This wasn’t the case for RethinkDB[1], which launched with support for JOINs. But it’s great to see others doing it too.

  1. First and last time disclaimer here: I work for RethinkDB.  

Original title and link: NoSQL and JOINs: RavenDB and RethinkDB (NoSQL database©myNoSQL)


RethinkDB Launches 1.0 Version With Memcached Compatibility Only

Just as I speculated, RethinkDB has finally launched the 1.0 version with Memcached compatibility only. Jason Kincaid (TechCrunch) writes:

RethinkDB has just launched its 1.0 release to the public, and it’s offering a product geared toward NoSQL installations — and it will work on SSDs, traditional drives, and cloud-based services like AWS. The startup has also moved away from MySQL and now fully supports Memcached.

But RethinkDB is not the first product providing a Memcached compatible (disk) persistent storage engine. One year ago Membase was launched promising not only a persistent Memcached compatible solution, but also elastic scalability.

RethinkDB has also published a performance report (PDF) comparing RethinkDB’s speed to Membase and MySQL. But if I’m reading those numbers correctly, while RethinkDB leads the majority of query-per-second (QPS) benchmarks, MySQL consistently shows better latency numbers (which is kind of weird). For a strong durability scenario, the benchmark shows MySQL delivering 2x the QPS of RethinkDB.

Another interesting aspect of the RethinkDB 1.0 release is the licensing model, which I don’t fully get:

RethinkDB Basic is currently identical in feature-set to RethinkDB Premium and Enterprise. However, the paid versions of RethinkDB include phone and email support, access to all future updates, and volume licensing options.

Or spelled out in the TechCrunch post:

Akhmechet says that the free version will get security updates, but that it won’t necessarily receive new features in the future, whereas the premium version will.

Original title and link: RethinkDB Launches 1.0 Version With Memcached Compatibility Only (NoSQL databases © myNoSQL)

RethinkDB: On TRIM, NCQ, and Write Amplification

Closing the circle:

RethinkDB gets around these issues in the following way. We identified over a dozen parameters that affect the performance of any given drive (for example, block size, stride, timing, etc.) We have a benchmarking engine that treats the underlying storage system as a black box and brute forces through many hundreds of permutations of these parameters to find an ideal workload for the underlying drive.
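
The black-box approach described in the quote can be sketched as an exhaustive search over a parameter grid. This is only an illustration of the idea: the parameter names and values are hypothetical, and `fake_drive` is an invented stand-in for a real benchmark run against a device, not anything from RethinkDB:

```python
import itertools


def best_configuration(param_grid, measure):
    """Try every combination in param_grid and keep the highest-scoring one."""
    names = list(param_grid)
    best_config, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        config = dict(zip(names, values))
        score = measure(config)  # the drive is treated as a black box
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def fake_drive(config):
    """Invented scoring function standing in for a measured throughput."""
    penalty = (
        abs(config["block_size"] - 4096) / 4096
        + abs(config["stride"] - 8) / 8
        + config["delay_us"] / 100
    )
    return 1000.0 / (1.0 + penalty)


# Hypothetical grid: 3 * 3 * 2 = 18 permutations.
grid = {
    "block_size": [512, 4096, 65536],
    "stride": [1, 8, 64],
    "delay_us": [0, 50],
}

config, score = best_configuration(grid, fake_drive)
```

With a dozen parameters the number of permutations grows multiplicatively, which is why the quote talks about brute forcing through many hundreds of them.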

Original title and link: RethinkDB: On TRIM, NCQ, and Write Amplification (NoSQL databases © myNoSQL)


RethinkDB and SSD Write Performance

I didn’t know too much about RethinkDB until watching Tim Anglade’s interview with Slava Akhmechet and Mike Glukhovsky. There were mainly three things that caught my attention:

  1. RethinkDB is first building a persistent, memcached-compatible solution designed for SSDs. The reason for starting with a memcached-compatible system is that building it is much simpler than implementing a MySQL storage engine. On the other hand, I think that having a persistent memcached might bring RethinkDB some customers to validate the technology.

    Even though it was announced as 8-10 weeks away at the time of the interview, I don’t think this implementation has been launched yet. Update: according to Tim, RethinkDB technology has been available to private beta users for a while now. But I still couldn’t find any reference to it on either the website or blog.

  2. Next will come a MySQL engine optimized for SSD

  3. Replacing rotational disks with SSDs shows an immediate bump in performance. But within months, performance seriously degrades.

It is this last point that I hadn’t heard before. And I’d really be interested to understand:

  • if it applies to all scenarios or if it is related to databases in general
  • are there specific database scenarios (access patterns, read/write ratios) that lead to this behavior or will it manifest in general cases too

My current assumption is that this behavior occurs for write intensive databases only. But I’d really like to hear some better documented answers.

Update: First answer I got to the above questions comes from Travis Truman: The SSD Anthology: Understanding SSDs and New Drives from OCZ.

Update: The RethinkDB guys have published a follow-up: On TRIM, NCQ, and write amplification.
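
For readers unfamiliar with the term, write amplification is the ratio of bytes physically written to flash to bytes the host asked to write. Below is a small arithmetic sketch under a deliberately pessimistic assumption: every small write forces the drive to rewrite a whole erase block. The sizes are illustrative and not taken from any particular drive:

```python
def write_amplification(host_bytes, flash_bytes):
    """Ratio of bytes written to flash to bytes written by the host."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return flash_bytes / host_bytes


def worst_case_small_write(write_size, erase_block_size):
    """Assumes each small write causes a full erase-block rewrite."""
    return write_amplification(write_size, erase_block_size)


# A 4 KiB update that triggers a rewrite of a 512 KiB erase block.
factor = worst_case_small_write(4 * 1024, 512 * 1024)
```

Under this assumption a write-heavy workload of small random updates would be the hardest on the drive, which is consistent with my assumption above about write-intensive databases.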

Original title and link: RethinkDB and SSD Write Performance (NoSQL databases © myNoSQL)