CouchDB: All content tagged as CouchDB in NoSQL databases and polyglot persistence

Conflict Resolution Using Rev Trees and a Comparison With Vector Clocks

Damien Katz has posted on GitHub a design document for the data structures, called rev trees, used to support conflict management in Couchbase. The doc also refers to the way conflict resolution is done in CouchDB and compares rev trees with vector clocks.

When this happens [n.b. the edits are in conflict] Couchbase will store both edits, pick an interim winner (the same winner will be selected on all nodes) and “hide” the losing conflict(s) and mark the document as being in conflict so that it can be found, using views and other searches, by external agents who can potentially resolve the conflicts.
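The key property in the quote is that every node picks the same interim winner without coordinating. A minimal Python sketch of one way to get that property follows. It assumes CouchDB-style revision IDs of the form `<generation>-<hash>` and a CouchDB-like ordering (live leaves beat deleted ones, then the higher generation wins, then the greater hash); the `Leaf` type and `pick_winner` function are hypothetical names, and the actual rules in the Couchbase design document may differ.

```python
from typing import Iterable, NamedTuple


class Leaf(NamedTuple):
    """One conflicting leaf revision of a document (hypothetical type)."""
    rev: str               # assumed format: "<generation>-<hash>", e.g. "3-abc"
    deleted: bool = False  # True if this leaf is a deletion tombstone


def sort_key(leaf: Leaf):
    """Total order over leaves that depends only on the leaves themselves."""
    generation, _, digest = leaf.rev.partition("-")
    # Live revisions outrank deleted ones, then longer histories,
    # then the lexicographically greater hash breaks the tie.
    return (not leaf.deleted, int(generation), digest)


def pick_winner(leaves: Iterable[Leaf]) -> Leaf:
    """Return the interim winner; the other leaves stay stored as conflicts."""
    return max(leaves, key=sort_key)


leaves = [Leaf("2-zzz"), Leaf("3-abc"), Leaf("3-abd"), Leaf("4-fff", deleted=True)]
winner = pick_winner(leaves)
losers = [leaf for leaf in leaves if leaf != winner]
```

Because the ordering uses nothing but the revision data, two nodes that hold the same set of leaves agree on the winner regardless of the order in which the edits arrived.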

Original title and link: Conflict Resolution Using Rev Trees and a Comparison With Vector Clocks (NoSQL database©myNoSQL)


The State of CouchDB - Jan Lehnardt’s Comment

Jan Lehnardt posted a long reply to my comments on the State of CouchDB. I thought many would benefit from promoting it to a real post (with Jan’s permission). Before handing it over to Jan, I want to thank him for taking the time to clarify some of the things. I also want to be clear that I still stand by all my comments. Now, to Jan Lehnardt:

Hey Alex,

you are of course correct and I stand by my post. Let me explain the discrepancy.

The post is a summary of my notes for my opening talk of CouchDB Conf that I ran in Berlin in January. The target audience are the people in the actual audience. There are people who build CouchDB, people who help out with CouchDB, people who use CouchDB and a few people who want to know what’s up with CouchDB. By and large though, these are what I’d call “CouchDB People”.

You interpret the post as if it were for a general public audience and it is entirely my fault for not making that clearer in the opening of my post.

To your notes:

  • Confusion: spot on, our bad, mistakes were made, it’s gonna take time to get sorted.

  • passive aggressive style: sorry if that read this way, it was definitely not intended. It was to highlight that there are people who absolutely love what CouchDB does. It isn’t a statement about quantities, which your note about “your numbers” implies. I’m not interested in discussing numbers, but I understand that people have turned away for good reasons. — Consider being an enthusiast, and you go to a conference of like-minded people and the project lead gives a talk and says, “people like you are passionate”. “Fuck yeah I am passionate” you think, or say, and get a good vibe going at the conference. (For CouchDB fans, it was really good times).

  • list of features: good stuff on there, but none of that matters until it ships. This is for people on the inside to see what we are working towards and get them rallied up to help and contribute. Your assessment that the “real” features are in the gist is misguided, but I chalk that up to differing opinions, no harm done.

  • *ouch projects: hell yeah I am excited to finally fulfil the original promise of CouchDB from fucking six years ago. The thing to highlight here isn’t that “boo a bunch of things that rhyme with ouch”, but that we are starting to see a production-ready ecosystem of a true open source data sync solution that bloody works. — I agree with you that branding/communication is key here, and that there is a lot to be done.

  • facts matter / “came out on top: nope”: again spot on, but this is where I really wish you had given me a shout before posting this. Maybe next time, you should still have my number. Your assessment that we did not come out on top is completely correct, if you look at it from a general public point of view. It’d be odd to deny the facts. But again, this isn’t meant as a post that says “hey everyone, look how great CouchDB is doing”, because a) CouchDB isn’t and b) the intended audience is not everybody. What I did mean to point out is that the open source project is aware of the challenges it is facing and is doing its utmost to set everything up so things can be resolved. We spent twelve months in relative obscurity preparing many things that are starting to see the light of day now, but most of it in the future. Only when we have delivered on all of that can we look at the facts again and see how CouchDB is doing. I am confident that it’ll look good, but there is a lot to be done until then. The “coming out on top” is a comment on the fact that the core of the project and its community are strong, and that we are in a position to turn the boat back to former and further glory, and not that we are somehow deceived by our own filter bubble and believe that all is well when it isn’t.

Thanks for giving me the opportunity to explain things. I think your assessment of CouchDB in general over the past years has been spot on, and your current criticisms are also accurate and well received; it’s just that you got the intended audience for my post wrong, which I didn’t make very clear to the casual observer.


The State of CouchDB With Comments

With all the confusion surrounding the Couch[addsuffixhere] world and the lack of energy in it, my attention has slowly shifted away. Thus, it was only last night that I read Jan Lehnardt’s “State of CouchDB” post.

A couple of notes:

  • Jan Lehnardt is out of Couchbase (as an employee) and plans to focus again (for part of his time) on Apache CouchDB
  • he’s the first person directly related to CouchDB who finally acknowledges publicly the whole confusion created around CouchDB and the companies connected in some way or another to it. I was really, really tired of repeating this for the last 2 years.
  • “people who really get to know CouchDB are extremely passionate about it” — I read this as passive aggressive style. According to my data the people interested in CouchDB were fewer and fewer by the day.
  • there are plans for what’s coming next in CouchDB. The post gives a short list of 4 things, but the real ones are in the gist I’ve linked to earlier
    • BigCouch merging has been mentioned for so long that right now it feels like “waiting for the unicorn”
  • Jan Lehnardt mentions (and is excited about) the alphabet of _ouchDB projects. I’ll say it one more time: they’re probably cool, but long term they’ll perpetuate the confusion. Unfortunately there’s nothing much to be done now.
  • I’m glad to read a state of the union from a person who has been involved with CouchDB for so long. But in the world of open source, it’s only the facts that matter. Sometimes reviving a project or regaining users is more difficult than starting from scratch.
  • “We had a hard year, lost our traction, and we still came out on top.” Nope.



CouchDB Future Feature List

I’m saving the current list of CouchDB Future Features so I can check back on it in 6 months [1]:

  1. I am not saying this as a mild form of “filing for future claim chowder”. 


CouchDB In-Browser JavaScript Debugger

An interesting project which, if it works, could prove very useful, considering that almost everything in CouchDB is JavaScript, at least from a developer’s perspective:



Couchjs: Drop-In Replacement Javascript V8 Engine for Apache CouchDB

By the Iris Couch guys, who also provide Apache CouchDB cloud hosting:

couchjs is a command-line Node.js program. It is 100% compatible with Apache CouchDB’s built-in JavaScript system.

By using couchjs, you will get 100% CouchDB compatibility (the test suite completely passes) but your JavaScript environment is V8, or Node.js.



11 Interesting Releases From the First Weeks of January

The list of releases I wanted to post about has been growing fast these last couple of weeks, so instead of waiting, here it is (in no particular order [1]):

  1. (Jan.2nd) Cassandra 1.2 — announcement on DataStax’s blog. I’m currently learning and working on a post looking at what’s new in Cassandra 1.2.
  2. (Jan.10th) Apache Pig 0.10.1 — Hortonworks wrote about it
  3. (Jan.10th) DataStax Community Edition 1.2 and OpsCenter 2.1.3 — DataStax announcement
  4. (Jan.10th) CouchDB 1.0.4, 1.1.2, and 1.2.1 — releases fixing some security vulnerabilities
  5. (Jan.11th) MongoDB 2.3.2 unstable — announcement. This dev release includes support for full text indexing. For more details you can check:

    […] an open source project extending Hadoop and Hive with a collection of useful user-defined-functions. Its aim is to make the Hive Big Data developer more productive, and to enable scalable and robust dataflows.

  1. I’ve tried to order it chronologically, but most probably I’ve failed. 


CouchDB, TouchDB, PouchDB…

Calvin Metcalf writes about PouchDB, which is neither TouchDB nor CouchDB:

Before we discuss PouchDB we’re going to need to talk about CouchDB which Pouch is based on. […] So one of the issues with CouchDB is that Erlang…well let’s just say people have mixed feelings about it, which led pretty quickly to CouchDB-compatible databases: Big Couch from Cloudant which you can cluster, TouchDB is a version written in Objective-C targeting embedded apps, and then we have PouchDB.

Hurry as you may run out of names: AouchDB, BouchDB, DouchDB, EouchDB, FouchDB, GouchDB, HouchDB, IouchDB, JouchDB, KouchDB, LouchDB, MouchDB, NouchDB, QouchDB, RouchDB, SouchDB, UouchDB, VouchDB, WouchDB, XouchDB, YouchDB, ZouchDB. For special requests we could expand to using unicode and emoji.


The Three Ways to Remove a Document From CouchDB and Their Usages

Nathan Vander Wilt:

The choice depends (mostly) on how you’re syncing between databases:

  • With filtered replication, you might want to add _deleted:true alongside the original document data
  • For normal/plain/unfiltered replication, you can simply DELETE
  • If you are NOT replicating, _purge has its uses

I only knew about the straightforward DELETE approach, but I’m learning that it is just a special case of marking a document as deleted. While the post looks at these operations from the point of view of CouchDB’s masterless replication, their behavior can also be connected to the school of soft deletes or Pat Helland’s non-updatable data:

In large-scale systems, you don’t update data, you add new data or create a new version.
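To make the three options listed above concrete, here is a minimal Python sketch that builds (but does not send) the corresponding CouchDB HTTP requests. The helper names and the `(method, path, body)` tuple are hypothetical conveniences; the database name, document ID and revision are made-up values. The endpoints are the standard document `DELETE`, a document `PUT` carrying `_deleted: true`, and `POST /{db}/_purge`.

```python
import json
from urllib.parse import quote


def delete_request(db, doc_id, rev):
    """Plain DELETE: CouchDB stores a minimal tombstone that replicates."""
    return ("DELETE", f"/{db}/{quote(doc_id, safe='')}?rev={quote(rev)}", None)


def soft_delete_request(db, doc):
    """PUT the document with _deleted set, keeping its other fields so a
    replication filter can still inspect them. `doc` must carry the
    current _id and _rev."""
    body = dict(doc, _deleted=True)
    return ("PUT", f"/{db}/{quote(doc['_id'], safe='')}", json.dumps(body))


def purge_request(db, doc_id, revs):
    """_purge: removes the listed revisions entirely, leaving no tombstone,
    so the removal is not propagated to replicas. Sent as JSON."""
    return ("POST", f"/{db}/_purge", json.dumps({doc_id: list(revs)}))


doc = {"_id": "doc1", "_rev": "2-abc", "type": "ad"}
plain = delete_request("ads", "doc1", "2-abc")
soft = soft_delete_request("ads", doc)
purge = purge_request("ads", "doc1", ["2-abc"])
```

The first two variants both end up as a new, deleted revision of the document, which is why a plain DELETE can be seen as a special case of marking a document as deleted.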



From S3 to CouchDB and Redis and Then Half Way Back for Serving Ads

The story of going from S3 to CouchDB and Redis and then back to S3 and Redis for ad serving:

The solution to this situation has a touch of irony. With Redis in place, we replaced CouchDB for placement- and ad-data with S3. Since we weren’t using any CouchDB-specific features, we simply published all the documents to S3 buckets instead. We still did the Redis cache warming upfront and data updates in the background. So by decoupling the application from the persistence layer using Redis, we also removed the need for a super fast database backend. We didn’t care that S3 is slower than a local CouchDB, since we updated everything asynchronously.
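The decoupling described in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: plain dicts stand in for Redis (the fast cache) and for S3 (the slow backend), no real client library is called, and `WarmedCache` and the sample keys are hypothetical.

```python
class WarmedCache:
    """Serve reads only from a fast cache that is filled from a slow backend."""

    def __init__(self, backend):
        self._backend = backend  # slow store; a dict standing in for S3
        self._cache = {}         # fast store; a dict standing in for Redis

    def refresh(self):
        """Copy the backend contents into the cache. Called once at startup
        (cache warming) and then periodically from a background job."""
        self._cache = dict(self._backend)

    def get(self, key):
        """Request path: never touches the slow backend."""
        return self._cache.get(key)


backend = {"placement:1": "ad-a"}
cache = WarmedCache(backend)
cache.refresh()                    # warm the cache before serving traffic
first = cache.get("placement:1")

backend["placement:1"] = "ad-b"    # data changes in the slow store
stale = cache.get("placement:1")   # readers keep seeing the cached value
cache.refresh()                    # asynchronous update catches up
fresh = cache.get("placement:1")
```

Since requests never wait on the backend, its latency only affects how quickly updates become visible, which matches the quoted observation that a slower S3 no longer mattered.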

Besides the detailed blog post there’s also a slide deck:



NoSQL Releases and Announcements

Catching up after almost two weeks offline is no easy task, but I hope I’ll not miss any important events, releases, or posts. But if I do, please email me.

Cassandra 1.0.9: Maintenance Release

The complete change notes for Cassandra 1.0.9 are here:

  • improve index sampling performance (CASSANDRA-4023)
  • always compact away deleted hints immediately after handoff (CASSANDRA-3955)
  • delete hints from dropped ColumnFamilies on handoff instead of erroring out (CASSANDRA-3975)
  • add CompositeType ref to the CLI doc for create/update column family (CASSANDRA-3980)
  • Avoid NPE during repair when a keyspace has no CFs (CASSANDRA-3988)
  • Fix division-by-zero error on get_slice (CASSANDRA-4000)
  • don’t change manifest level for cleanup, scrub, and upgradesstables operations under LeveledCompactionStrategy (CASSANDRA-3989, 4112)
  • fix race leading to super columns assertion failure (CASSANDRA-3957)
  • ensure that directory is selected for compaction for user-defined tasks and upgradesstables (CASSANDRA-3985)
  • allow custom types in CLI’s assume command (CASSANDRA-4081)
  • fix totalBytes count for parallel compactions (CASSANDRA-3758)
  • fix intermittent NPE in get_slice (CASSANDRA-4095)
  • remove unnecessary asserts in native code interfaces (CASSANDRA-4096)
  • Fix EC2 snitch incorrectly reporting region (CASSANDRA-4026)
  • Shut down thrift during decommission (CASSANDRA-4086)
  • Merged from 0.8: Fix ConcurrentModificationException in gossiper (CASSANDRA-4019)

  • Pig

    • support Counter ColumnFamilies (CASSANDRA-3973)
    • Composite column support (CASSANDRA-3684)
  • CQL

    • fix NPE on invalid CQL delete command (CASSANDRA-3755)
    • Validate blank keys in CQL to avoid assertion errors (CASSANDRA-3612)

Apache Hadoop User Impersonation vulnerability

This vulnerability, discovered by Cloudera’s Aaron T. Myers, affects several Hadoop versions, including 1.0.0 to 1.0.1 and 0.23.0 to 0.23.1, where Kerberos is enabled. Complete details available here.

CouchDB 1.2.0

This is the first important release after the start-of-the-year CouchDB hubbub with Damien Katz and Couchbase. The new version is a major release that deserves its own post: CouchDB 1.2.0: Performance, Security, API, Core and Replication Improvements.

Riak 1.1.2: Stabilization release

Just a maintenance release in the Riak 1.1 series. Complete release notes here.


CouchDB 1.2.0: Performance, Security, API, Core and Replication Improvements

CouchDB 1.2.0 was released on April 6th. The linked post provides all the details of the new version, but here are some important improvements included with the new release:

  • Performance: added a native JSON parser
  • Performance: optional file compression for database and view index files
  • Performance: a new replicator implementation. More reliable, faster, configurable.
  • Security: the _users database and information in the _replicator database are no longer readable by everyone
  • Core: added support for automatic compaction. Automatic compaction is off by default, but can be enabled through Futon or the .ini file and configured to run based on multiple variables:
    • A fragmentation threshold, i.e. the share of file_size that is not taken up by live data_size (say 70%)
    • A time window specified in hours and minutes (e.g. 01:00-05:00)
    • Compaction can be cancelled if it exceeds the closing time.
    • Compaction for views and databases can be set to run in parallel
    • If there’s not enough space (2 × data_size) on the disk to complete a compaction, an error is logged and the compaction is not started.
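The compaction conditions listed above can be combined into a single decision, sketched here in Python. This is not CouchDB's implementation: the function names and default values are hypothetical, and fragmentation is assumed to be computed as `(file_size - data_size) / file_size * 100`.

```python
from datetime import time


def fragmentation(file_size, data_size):
    """Percentage of the file not occupied by live data (assumed formula)."""
    if file_size == 0:
        return 0.0
    return (file_size - data_size) / file_size * 100


def in_window(now, start, end):
    """True if `now` falls inside [start, end), allowing the window to
    cross midnight (e.g. 23:00-04:00)."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_compact(file_size, data_size, free_disk, now,
                   threshold=70, start=time(1, 0), end=time(5, 0)):
    """Start compaction only if all listed conditions hold."""
    if fragmentation(file_size, data_size) < threshold:
        return False   # not fragmented enough
    if not in_window(now, start, end):
        return False   # outside the allowed time window
    if free_disk < 2 * data_size:
        return False   # not enough disk space; the real daemon logs an error
    return True


# 80% fragmented, inside the window, enough free space
decision = should_compact(file_size=1000, data_size=200,
                          free_disk=500, now=time(2, 0))
```

Cancelling a compaction that runs past the closing time and running view and database compactions in parallel are left out of the sketch.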
