


BigCouch: All content tagged as BigCouch in NoSQL databases and polyglot persistence

Welcome BigCouch to CouchDB

Wait! BigCouch was actually merged into CouchDB:

What does this mean? Well, right now, the code is merged, but not released. So hold your clicks just a moment! Once the code has been tested, we will include it in one of our regular releases.

Original title and link: Welcome BigCouch to CouchDB (NoSQL database©myNoSQL)


Cloudant's BigCouch and Apache CouchDB... the merge that took a while

The two merged thousands of lines of Erlang to update Apache CouchDB with the modifications Cloudant has made to its core database software. These changes lay the groundwork for preparing the Apache community to improve CouchDB performance at large scale.

I don’t remember when I first heard about BigCouch being contributed to the Apache CouchDB project. I do remember, though, that at the time I actually believed it, as it made sense: Cloudant was still in its early days, seeking validation of its solution, and CouchDB was at its peak.

It’s been so long that I totally forgot about it. But now I’m starting to believe it again. Just as much as a GitHub branch.



Dealing With JVM Limitations in Apache Cassandra

Several of the most notable NoSQL databases targeting large scalable systems are written in Java, Cassandra and HBase among them. Then there’s also Hadoop, plus a series of caching and data grid solutions like Terracotta and GigaSpaces. They all face the same challenge: tuning the JVM garbage collector for predictable latency and throughput.

Jonathan Ellis’s slides from FOSDEM 2012 cover some of the problems with GC and the way Cassandra tackles them. While this is one of those presentations where the slides alone are not enough to get the full picture, going through them will still give you a couple of good hints.

For those saying that Java and the JVM are not the platform for writing large concurrent systems, here’s the quote Ellis ends his slides with:

Cliff Click: Many concurrent algorithms are very easy to write with a GC and totally hard (to down right impossible) using explicit free.


Trying out BigCouch with Chef-Solo and Vagrant

So the other day, I wanted to quickly check something in BigCouch and thanks to Vagrant, chef(-solo) and a couple cookbooks — courtesy of Cloudant — this was exceptionally easy.

I’ve asked myself many times what the easiest way is to experiment with all these NoSQL databases and their frequently changing versions. So far my “recipe” on Mac OS has been Homebrew. But this combination of automated virtual machines sounds quite compelling. Any other suggestions? Should I prefer Puppet to Chef?



Riak Backend Based on CouchDB B-Tree

CouchDB with a Riak backend didn’t get too far, and since then we’ve got BigCouch. But the opposite combination seems interesting too.

Kresten Krab Thorup released riak_btree_backend, a backend for Riak based on couch_btree.

Compared with LSM trees and fractal trees, B+trees do not offer the highest write performance. Recently the Acunu research team published a paper, “Stratified B-trees and versioning dictionaries”, about a new data structure, the “stratified B-tree”:

A classic versioned data structure in storage and computer science is the copy-on-write (CoW) B-tree — it underlies many of today’s file systems and databases, including WAFL, ZFS, Btrfs and more. Unfortunately, it doesn’t inherit the B-tree’s optimality properties; it has poor space utilization, cannot offer fast updates, and relies on random IO to scale. Yet, nothing better has been developed since. We describe the `stratified B-tree’, which beats all known semi-external memory versioned B-trees, including the CoW B-tree. In particular, it is the first versioned dictionary to achieve optimal tradeoffs between space, query and update performance.

With its pluggable storage backends (InnoDB, Bitcask, couch_btree, etc.), Riak might at some point provide a “stratified B-tree” implementation too.

Update: Here’s the Hacker News discussion about the “Stratified B-trees and versioning dictionaries” paper.


BigCouch Case Study: Research of Radiation in Seattle

Cloudant’s BigCouch database let the team keep up with a steady flow of data so it could process and analyze it, then share it with the various stakeholders in near-real-time. The team was changing the data about 20 times per day and writing complex workflows to process it, two tasks that fall into BigCouch’s wheelhouse. The database has a built-in MapReduce engine to enable writing and processing the workflows, and it allows for secondary indices, which users can populate with new data from their MapReduce jobs and query very quickly.

This is the first case study I’ve read about BigCouch. But keep in mind that the project initiator is also the founder of Cloudant, the company that created and open sourced BigCouch.



Cloudant about Couchbase Announcement

Alan Hoffman of Cloudant, the CouchDB hosting provider and creator of the BigCouch scalable CouchDB solution:

I do want to take issue with one thing said in the press release for Couchbase. They say: “Couchbase becomes the only document database capable of safely storing your data whether stored on a single server, or spread across hundreds.”

Some of our customers have billions of documents stored safely on dozens of nodes in datacenters around the world. It’s too soon to say what Couchbase will become, but if you need a safe, scalable, and easy-to-use document storage platform, our technology already provides that today.

I think I’ve heard something similar before.

As a side note, I don’t know if it’s only me, but I always think that PR announcements (nb: I’m referring to Couchbase’s PR formulation) do more harm than good. I don’t have an issue with a company stating that it wants to create the best product featuring X, Y, and Z. As a possible client, I couldn’t care less whether the product is the first, the last, or the only one. I only care about the features that make it useful for my problems.



Scaling Out CouchDB with BigCouch

While CouchOne is focused on getting CouchDB onto mobile devices (CouchDB is available on Android and probably coming to iOS), Cloudant, the other CouchDB-oriented company, is focused on CouchDB horizontal scalability, offering BigCouch both as open source and as a hosted service.

Recently Cloudant hosted a webinar on scaling out CouchDB with BigCouch.

In a future post I’ll cover more details about how BigCouch is scaling CouchDB.


Node.js + Redis + CouchApp + BigCouch + CouchDB + TurnkeyLinux = Nirvana?

On Hacker News:

The reason for doing things this way is to make continuous deployment easier, but also to never have to reboot or change the servers much. The node.js services will be well tested and unchanging.


This should reduce maintenance to the bare minimum, and I can push out a new set of appliances over time and slowly migrate traffic to them once or twice a year when I need to update core functionality.

Sounds like a cool combination, but I doubt that maintaining 4 different pieces will end up being as easy as he thinks it will.



BigCouch: Java Map-Reduce with CouchDB

It feels like a conspiracy to have a third Java-related post today, but the one from Cloudant is quite big:

Today we are releasing the Java Language Map-Reduce View Interface for Cloudant’s Hosted CouchDB service. This interface defines the protocol for writing Map-Reduce views in Java that can be run on our hosted CouchDB platform. […] The Java view server works differently than a standard CouchDB view server. The design document does not contain code. Instead, the design document specifies which class should be called for the Map and Reduce steps. The code (a jar) is attached to the design document in the form of a binary attachment. This jar contains both user defined classes and external libraries that are needed. This paradigm (libraries as binary attachments) is a non-standard extension of the CouchDB view server API.

Bringing the most popular VM and all the languages supported on it to CouchDB is definitely a very smart move.



CouchDB BigCouch: Cloudant Open Sourcing their CouchDB Scaling Project

I’ve covered the Cloudant solution for CouchDB horizontal scalability and mentioned that it’s probably the most interesting one for scaling CouchDB. Now Cloudant has open sourced it under the name BigCouch, and the code is available on GitHub.

What does it do? Think of BigCouch as a set of Erlang/OTP applications that allow you to create a cluster of CouchDBs that is distributed across many nodes/servers. Instead of one big honking CouchDB, the result is an elastic data store which is fully CouchDB API-compliant.


The clustering layer is most closely modeled after Amazon’s Dynamo, with consistent hashing, replication, and quorum for read/write operations. CouchDB view indexing occurs in parallel on each partition, and can achieve impressive speedups as compared to standalone serial indexing.
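To make the Dynamo-style ideas in the quote more concrete, here is a minimal Python sketch of consistent hashing with a replica preference list and a quorum overlap check. It is not BigCouch’s code: the names `Ring`, `preference_list`, and `quorums_overlap`, the choice of MD5, and the number of virtual nodes are all assumptions made for illustration.

```python
import hashlib
from bisect import bisect_right


def ring_position(key: str) -> int:
    """Map a string to a point on the hash ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hash ring; each node owns several virtual points."""

    def __init__(self, nodes, vnodes=64):
        self._points = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [position for position, _ in self._points]
        self._node_count = len(set(nodes))

    def preference_list(self, key: str, n: int):
        """Return the first n distinct nodes found walking clockwise from the key."""
        n = min(n, self._node_count)
        index = bisect_right(self._positions, ring_position(key))
        owners = []
        while len(owners) < n:
            node = self._points[index % len(self._points)][1]
            if node not in owners:
                owners.append(node)
            index += 1
        return owners


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """With n replicas, every read quorum meets every write quorum when r + w > n."""
    return r + w > n


ring = Ring(["node1", "node2", "node3", "node4"])
replicas = ring.preference_list("doc-123", n=3)
```

With three replicas per document, reading from two and writing to two satisfies the overlap condition, while reading from one and writing to one does not. Adding or removing a node only moves the keys adjacent to that node’s points on the ring, which is what makes this layout attractive for an elastic cluster.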

Let’s see if BigCouch is better than CouchDB with a Riak backend.



CouchDB: Horizontal Scalability from Cloudant

Even though CouchDB benefits from probably one of the most sophisticated and cool replication mechanisms, that doesn’t make it horizontally scalable. I’ve already covered the different solutions for scaling CouchDB, but what Cloudant promises seems to be the missing part:

All of these features — distributed, horizontally scalable, durable, consistent — happen with little or no change required in applications that have been written for CouchDB. A cluster looks just like a stand-alone CouchDB, and API compliance has been our goal from the beginning. Granted, there are a few extra options like overriding quorum constant defaults and there are a few vagaries, like views always performing rereduce due to the views being distributed. But on the whole, the extras in Cloudant are transparent to the application.
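The remark about views always performing rereduce is easier to see with a small example. The Python sketch below is an illustration under assumptions, not Cloudant’s implementation: each partition reduces its own mapped rows, and a coordinator then combines the partial results with a rereduce step. The function names and the document shape are hypothetical, and the `keys` argument is simplified compared with what CouchDB passes to a real reduce function.

```python
from collections import defaultdict


def map_doc(doc):
    """Emit one (key, value) row per document, here keyed by a 'type' field."""
    yield doc["type"], 1


def count_reduce(keys, values, rereduce):
    """Count rows; on rereduce the values are partial counts, so sum them."""
    if rereduce:
        return sum(values)
    return len(values)


def query_partition(docs):
    """Run map and a first-level reduce on one partition's documents."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {
        key: count_reduce([key] * len(values), values, False)
        for key, values in grouped.items()
    }


def query_cluster(partitions):
    """Merge per-partition results; this step always goes through rereduce."""
    partials = defaultdict(list)
    for docs in partitions:
        for key, partial in query_partition(docs).items():
            partials[key].append(partial)
    return {
        key: count_reduce(None, values, True)
        for key, values in partials.items()
    }


partitions = [
    [{"type": "a"}, {"type": "b"}],
    [{"type": "a"}],
    [{"type": "a"}, {"type": "a"}],
]
totals = query_cluster(partitions)
```

Here `totals` is `{"a": 4, "b": 1}`. A reduce function that ignored the `rereduce` flag and always returned `len(values)` would report 3 for `"a"`, because it would count partitions rather than rows. That is the kind of application-visible vagary the quote alludes to.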

Now I’m wondering how Cloudant’s CouchDB scaling compares with running CouchDB with a Riak backend, given that Riak also offers a Dynamo-like distributed system.
