


CouchDB: All content tagged as CouchDB in NoSQL databases and polyglot persistence

The State of CouchDB With Comments

With all the confusion surrounding the Couch[addsuffixhere] world, and the lack of energy in it, my attention has slowly shifted away. Thus, it was only last night that I read Jan Lehnardt’s “State of CouchDB” post.

A couple of notes:

  • Jan Lehnardt is out of Couchbase (as an employee) and plans to focus again, for part of his time, on Apache CouchDB
  • he’s the first person directly related to CouchDB to publicly acknowledge the confusion created around CouchDB and the companies connected to it in one way or another. I was really, really tired of repeating this for the last 2 years.
  • “people who really get to know CouchDB are extremely passionate about it”: I read this as passive-aggressive. According to my data, fewer and fewer people were interested in CouchDB by the day.
  • there are plans for what’s coming next in CouchDB. The post gives a short list of 4 things, but the real ones are in the gist I’ve linked to earlier
    • BigCouch merging has been mentioned for so long that right now it feels like “waiting for the unicorn”
  • Jan Lehnardt mentions (and is excited about) the alphabet of _ouchDB projects. I’ll say it one more time: they’re probably cool, but long term they’ll perpetuate the confusion. Unfortunately, there’s not much to be done now.
  • I’m glad to read a state of the union from a person who has been involved with CouchDB for so long. But in the world of open source, it’s only the facts that matter. Sometimes reviving a project or regaining users is more difficult than starting from scratch.
  • “We had a hard year, lost our traction, and we still came out on top.” Nope.

Original title and link: The State of CouchDB With Comments (NoSQL database©myNoSQL)


CouchDB Future Feature List

I’m saving the current list of CouchDB Future Features so I can check back on it in 6 months [1]:

  1. I am not saying this as a mild form of “filing for future claim chowder”. 


CouchDB In-Browser JavaScript Debugger

An interesting project and, if it works, it could prove very useful considering that almost everything in CouchDB is JavaScript, at least from a developer’s perspective:



Couchjs: Drop-In Replacement Javascript V8 Engine for Apache CouchDB

By the Iris Couch guys, who also provide Apache CouchDB cloud hosting:

couchjs is a command-line Node.js program. It is 100% compatible with Apache CouchDB’s built-in JavaScript system.

By using couchjs, you will get 100% CouchDB compatibility (the test suite completely passes) but your JavaScript environment is V8, or Node.js.



11 Interesting Releases From the First Weeks of January

The list of releases I wanted to post about has been growing fast these last couple of weeks, so here it is (in no particular order [1]):

  1. (Jan.2nd) Cassandra 1.2 — announcement on DataStax’s blog. I’m currently learning and working on a post looking at what’s new in Cassandra 1.2.
  2. (Jan.10th) Apache Pig 0.10.1 — Hortonworks wrote about it
  3. (Jan.10th) DataStax Community Edition 1.2 and OpsCenter 2.1.3 — DataStax announcement
  4. (Jan.10th) CouchDB 1.0.4, 1.1.2, and 1.2.1 — releases fixing some security vulnerabilities
  5. (Jan.11th) MongoDB 2.3.2 unstable — announcement. This dev release includes support for full text indexing. For more details you can check:

    […] an open source project extending Hadoop and Hive with a collection of useful user-defined-functions. Its aim is to make the Hive Big Data developer more productive, and to enable scalable and robust dataflows.

  1. I’ve tried to order it chronologically, but most probably I’ve failed. 


CouchDB, TouchDB, PouchDB…

Calvin Metcalf writes about PouchDB, which is neither TouchDB nor CouchDB:

Before we discus PouchDB we’re going to need to talk about CouchDB which Pouch is based on. […] So one of the issues with CouchDB is that Erlang…well lets just say people have mixed feelings about it, which lead to pretty quickly, CouchDB compatible Databases, Big Couch from Cloudant which you can cluster, TouchDB is a version written in Objective-C targeting embedded apps, and then we have PouchDB.

Hurry as you may run out of names: AouchDB, BouchDB, DouchDB, EouchDB, FouchDB, GouchDB, HouchDB, IouchDB, JouchDB, KouchDB, LouchDB, MouchDB, NouchDB, QouchDB, RouchDB, SouchDB, UouchDB, VouchDB, WouchDB, XouchDB, YouchDB, ZouchDB. For special requests we could expand to using unicode and emoji.


The Three Ways to Remove a Document From CouchDB and Their Usages

Nathan Vander Wilt:

The choice depends (mostly) on how you’re syncing between databases:

  • With filtered replication, you might want to add _deleted:true alongside the original document data
  • For normal/plain/unfiltered replication, you can simply DELETE
  • If you are NOT replicating, _purge has its uses

I only knew about the straightforward DELETE approach, but I’m learning that it is just a special case of marking a document as deleted. While the post looks at these operations from the point of view of CouchDB’s masterless replication, their behavior can also be connected to the school of soft deletes or to Pat Helland’s non-updatable data:

In large-scale systems, you don’t update data, you add new data or create a new version.
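The three options listed above map onto three different HTTP requests. Here is a minimal sketch that only builds the requests and does not send them, assuming a CouchDB instance at `http://localhost:5984`. The helper names, database name, and document fields are hypothetical and chosen for illustration only.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE = "http://localhost:5984"  # assumption: a local CouchDB instance


def delete_request(db, doc_id, rev):
    """Plain DELETE of the current revision; CouchDB keeps only a stub."""
    url = f"{BASE}/{db}/{quote(doc_id, safe='')}?rev={quote(rev, safe='')}"
    return Request(url, method="DELETE")


def soft_delete_request(db, doc, rev):
    """PUT the document back with _deleted: true, keeping its other fields
    so that a replication filter can still inspect them."""
    body = dict(doc, _rev=rev, _deleted=True)
    url = f"{BASE}/{db}/{quote(doc['_id'], safe='')}"
    return Request(url, data=json.dumps(body).encode("utf-8"),
                   headers={"Content-Type": "application/json"},
                   method="PUT")


def purge_request(db, doc_id, revs):
    """POST to _purge: drops the listed revisions entirely, so there is
    nothing left to replicate."""
    body = {doc_id: list(revs)}
    return Request(f"{BASE}/{db}/_purge",
                   data=json.dumps(body).encode("utf-8"),
                   headers={"Content-Type": "application/json"},
                   method="POST")
```

Passing any of these objects to `urllib.request.urlopen` would send it. The difference that matters for syncing is what remains afterwards: a bare stub, a tombstone that still carries the original fields, or nothing at all.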



From S3 to CouchDB and Redis and Then Half Way Back for Serving Ads

The story of going from S3 to CouchDB and Redis and then back to S3 and Redis for ad serving:

The solution to this situation has a touch of irony. With Redis in place, we replaced CouchDB for placement- and ad-data with S3. Since we weren’t using any CouchDB-specific features, we simply published all the documents to S3 buckets instead. We still did the Redis cache warming upfront and data updates in the background. So by decoupling the application from the persistence layer using Redis, we also removed the need for a super fast database backend. We didn’t care that S3 is slower than a local CouchDB, since we updated everything asynchronously.

Besides the detailed blog post, there’s also a slide deck:



NoSQL Releases and Announcements

Catching up after almost two weeks offline is no easy task, but I hope I won’t miss any important events, releases, or posts. If I do, please email me.

Cassandra 1.0.9: Maintenance Release

The complete change notes for Cassandra 1.0.9 are here:

  • improve index sampling performance (CASSANDRA-4023)
  • always compact away deleted hints immediately after handoff (CASSANDRA-3955)
  • delete hints from dropped ColumnFamilies on handoff instead of erroring out (CASSANDRA-3975)
  • add CompositeType ref to the CLI doc for create/update column family (CASSANDRA-3980)
  • Avoid NPE during repair when a keyspace has no CFs (CASSANDRA-3988)
  • Fix division-by-zero error on get_slice (CASSANDRA-4000)
  • don’t change manifest level for cleanup, scrub, and upgradesstables operations under LeveledCompactionStrategy (CASSANDRA-3989, 4112)
  • fix race leading to super columns assertion failure (CASSANDRA-3957)
  • ensure that directory is selected for compaction for user-defined tasks and upgradesstables (CASSANDRA-3985)
  • allow custom types in CLI’s assume command (CASSANDRA-4081)
  • fix totalBytes count for parallel compactions (CASSANDRA-3758)
  • fix intermittent NPE in get_slice (CASSANDRA-4095)
  • remove unnecessary asserts in native code interfaces (CASSANDRA-4096)
  • Fix EC2 snitch incorrectly reporting region (CASSANDRA-4026)
  • Shut down thrift during decommission (CASSANDRA-4086)
  • Merged from 0.8: Fix ConcurrentModificationException in gossiper (CASSANDRA-4019)

  • Pig

    • support Counter ColumnFamilies (CASSANDRA-3973)
    • Composite column support (CASSANDRA-3684)
  • CQL

    • fix NPE on invalid CQL delete command (CASSANDRA-3755)
    • Validate blank keys in CQL to avoid assertion errors (CASSANDRA-3612)

Apache Hadoop User Impersonation vulnerability

This vulnerability, discovered by Cloudera’s Aaron T. Myers, affects several Hadoop versions, including 1.0.0 to 1.0.1 and 0.23.0 to 0.23.1, where Kerberos is enabled. Complete details available here.

CouchDB 1.2.0

This is the first important release after the start-of-the-year CouchDB hubbub with Damien Katz and Couchbase. The new version is a major release in itself, deserving its own post: CouchDB 1.2.0: Performance, Security, API, Core and Replication Improvements.

Riak 1.1.2: Stabilization release

Just a maintenance release in the Riak 1.1 series. Complete release notes here.


CouchDB 1.2.0: Performance, Security, API, Core and Replication Improvements

CouchDB 1.2.0 was released on April 6th. The linked post provides all the details of the new version, but here are some important improvements included with the new release:

  • Performance: added a native JSON parser
  • Performance: optional file compression for database and view index files
  • Performance: a new replicator implementation. More reliable, faster, configurable.
  • Security: the _users database and information in the _replicator database are no longer readable by everyone
  • Core: added support for automatic compaction. Automatic compaction is off by default, but can be enabled through Futon or the .ini file and configured to run based on multiple variables:
    • A threshold for the file_size to disk_size ratio (say 70%)
    • A time window specified in hours and minutes (e.g. 01:00-05:00)
    • Compaction can be cancelled if it exceeds the closing time.
    • Compaction for views and databases can be set to run in parallel
    • If there’s not enough space (2 × data_size) on the disk to complete a compaction, an error is logged and the compaction is not started.
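The compaction rules listed above can be read as a small decision procedure. The sketch below is not CouchDB's implementation. It only illustrates how the rules combine, under the assumption that the threshold is a fragmentation percentage computed as (file_size - data_size) / file_size. All function and parameter names are hypothetical.

```python
from datetime import time


def fragmentation_pct(file_size, data_size):
    """Share of the file not occupied by live data, as a percentage."""
    if file_size <= 0:
        return 0.0
    return (file_size - data_size) / file_size * 100


def in_window(now, start, end):
    """True if `now` is in [start, end); also handles windows that
    cross midnight (e.g. 23:00-02:00)."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_compact(file_size, data_size, free_disk, now,
                   start=time(1, 0), end=time(5, 0), threshold_pct=70):
    """Return (decision, reason) for one database or view index file."""
    if not in_window(now, start, end):
        return False, "outside time window"
    if fragmentation_pct(file_size, data_size) < threshold_pct:
        return False, "below fragmentation threshold"
    # Rule from the list above: require 2 x data_size of free space.
    if free_disk < 2 * data_size:
        return False, "not enough disk space"
    return True, "ok"
```

For example, a 1000-byte file holding 200 bytes of live data is 80% fragmented, so at 02:00 with enough free disk `should_compact(1000, 200, 5000, time(2, 0))` returns `(True, "ok")`. With only 100 bytes free, the same call reports the disk space error instead of starting.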



Here Is Why in Cassandra vs. HBase, Riak, CouchDB, MongoDB, It's Cassandra FTW

Brian ONeill:

Now, since choosing Cassandra, I can say there are a few other really important less tangible considerations. The first, is the code base. Cassandra has an extremely clean and well maintained code base. Jonathan and team do a fantastic job managing the community and the code. As we adopted NoSQL, the ability to extend the code-base and incorporate our own features has proven invaluable. (e.g. triggers, a REST interface, and server-side wide-row indexing)

Secondly, the community is phenomenal. That results in timely support, and solid releases on a regular schedule. They do a great job prioritizing features, accepting contributions, and cranking out features. (They are now releasing ~quarterly) We’ve all probably been part of other open source projects where the leadership is lacking, and features and releases are unpredictable, which makes your own release planning difficult. Kudos to the Cassandra team.

Everything sounds reasonable except for Riak being the “new kid on the block” and not finding support for it. Basho, where were you hiding?



NoSQL Databases Adoption in Numbers

The data comes from downloads of Jaspersoft’s NoSQL connectors. RedMonk published a graphic and an analysis, and Klint Finley followed up with job trends:

NoSQL databases adoption

A couple of things I don’t see mentioned in the RedMonk post:

  1. if and how the data has been normalized based on each connector’s availability

    According to the post, the data was collected between Jan. 2011 and Mar. 2012, and I think that not all connectors were available from the beginning of that period.

  2. if and how the marketing push for each connector has been factored in

    Announcing the Hadoop connector at an event with 2000 attendees or the MongoDB connector at an event with 800 attendees could definitely influence the results (nb: keep in mind that the largest number is less than 7000, thus 200-500 downloads triggered by such an event have a significant impact)

  3. Redis and VoltDB are mostly OLTP-only databases
