


couchdb case study: All content tagged as couchdb case study in NoSQL databases and polyglot persistence

What real uses I could use CouchDB for? What can I use it for?

A must read:

So I’ve been obsessively reading about and researching CouchDB over the past couple weeks. I even wrote my own Java client since the ones on the market weren’t up to my standards :) I’ve probably read 200 articles on google explaining the downsides to CouchDB. I’ve read the Use Cases on I’ve read Jan’s book, 10 times. And still I have this one overwhelming question - what can I use it for???

I’m a java developer. I work in a large enterprise but I also do lots of home projects. At work we use Oracle with Hibernate for Java ORM. At home I use MySQL. For attachments I use a CDN. Can anyone explain what real uses I could use CouchDB for?

I’ve said a couple of times already that CouchDB’s messaging and positioning are confusing and that the current CouchDB case studies aren’t very enlightening either, so hopefully Andy (the passionate guy above) will get some better answers.

Original title and link: What real uses I could use CouchDB for? What can I use it for? (NoSQL databases © myNoSQL)


CouchDB Case Study: Ataxo - Social Media Tracker

A new CouchDB case study, from Ataxo’s Social Insider, a social network monitoring app:

Because CouchDB is fully based on HTTP, they were also able to scale the data store from the start. As Rails developers, they live and breathe HTTP, using ETags and Expires all the time. They knew how to scale HTTP stacks. So they knew they could easily split the server read load just by tying separated CouchDB instances with well understood tools like Nginx or Haproxy and starting replication between them. They could use a dedicated cache such as Squid or Varnish to alleviate the read load in a snap. Most importantly, with a system based on HTTP, every part of the stack is transparent, easy to monitor, enable, disable or multiply.

Actually, based on the article, my impression is that they were able to “scale from the start” because they figured out a smart way to split data per customer (update: check Karel’s comment for a clarification on this). Plus, running Nginx or Haproxy together with Squid or Varnish doesn’t sound a lot easier than using Memcached + MySQL, or even Membase alone.
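To make the ETag point concrete: CouchDB returns an ETag header with each document, so a client (or a cache such as Varnish) can revalidate with If-None-Match and get back a 304 instead of the full body. Below is a minimal sketch that only builds such a request and never sends it. The host, database name, document id and ETag value are made up for illustration and do not come from the Ataxo article.

```python
import urllib.request


def conditional_get(url, etag=None):
    """Build a GET request that lets the server skip the body if nothing changed."""
    req = urllib.request.Request(url, method="GET")
    if etag is not None:
        # With If-None-Match, an unchanged document can be answered with
        # 304 Not Modified, by CouchDB itself or by an HTTP cache in front of it.
        req.add_header("If-None-Match", etag)
    return req


# Hypothetical host, database, document id and ETag, for illustration only.
req = conditional_get(
    "http://couch.example.com:5984/tracker/doc-1", etag='"1-abc"'
)
print(req.get_method(), req.full_url, dict(req.headers))
```

Passing the request to urllib.request.urlopen would actually send it. Note that urllib reports a 304 response by raising HTTPError, so real code needs to handle that case.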



CouchDB Case Study: SkinnyBoard

From another CouchDB usage story:

One of the things that attracted the SkinnyBoard team to CouchDB is the document-based store with its inherent flexibility. It allows them to store entities and their relationships in a single document without the need for complex lookups and joins required by traditional relational databases. CouchDB’s support for map/reduce also means that they can construct complex queries and store them as design documents externally without the need for having this business logic in their main application.

While the document model sounds like the right choice for this scenario, I’m confused by the map/reduce comment, which makes views sound more like stored procedures than normal application queries.
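For readers who haven’t seen one, this is roughly what “storing a query as a design document” looks like. A view is a JavaScript map function (optionally with a reduce) saved inside a document whose id starts with _design/. The sketch below only builds the JSON for such a document. The board/card field names and the view name are my own invention, not SkinnyBoard’s actual schema.

```python
import json

# Hypothetical design document: one view that counts cards per board.
design_doc = {
    "_id": "_design/boards",
    "language": "javascript",
    "views": {
        "cards_per_board": {
            # The map function runs once per document, inside CouchDB.
            "map": (
                "function (doc) {"
                " if (doc.type === 'card') { emit(doc.board_id, 1); }"
                " }"
            ),
            # _count is one of CouchDB's built-in reduce functions.
            "reduce": "_count",
        }
    },
}

body = json.dumps(design_doc, indent=2)
print(body)
```

Once saved, a view like this would be queried with something like GET /db/_design/boards/_view/cards_per_board?group=true, which is where the stored-procedure feel comes from: the query logic lives in the database, not in the application.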



CouchDB Case Study: Dimagi - CouchDB Replication at Work

From the Dimagi CouchDB success story:

Dimagi became interested in CouchDB after learning about its replication technology. They needed a fully off-line system for each of their clinics, because the only network connection any of them had was an unreliable GPRS modem on the local cellular network. Network outages and latency could not be allowed to disrupt clinic operations.

By standing up a lightweight server at each clinic, backed by a CouchDB datastore, BHOMA was able to ensure constant uptime in the clinics – providing power was up. Each clinic replicates over the modem’s connection to their national CouchDB database. Because of CouchDB’s continuous replication and optimized synchronization, Dimagi didn’t have to worry about writing complicated sync protocols. Filtered replication allowed them to send only the appropriate data to each clinic, drastically reducing the bandwidth required to sync with the central server. The two-way replication also allowed for data collected on CHWs’ cell phones to propagate back to the clinic, for timely patient updates.
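As a rough illustration of what continuous, filtered replication involves, here is a sketch of the two pieces: a filter function stored in a design document, and the body of a POST /_replicate request that references it. The database names, the clinic_id field and the filter name are assumptions made for the example. The Dimagi story does not show its actual configuration.

```python
import json

# Hypothetical design document holding the filter function
# (JavaScript, which is what CouchDB expects for filters).
FILTER_DDOC = {
    "_id": "_design/sync",
    "filters": {
        "by_clinic": (
            "function (doc, req) {"
            " return doc.clinic_id === req.query.clinic_id;"
            " }"
        )
    },
}


def clinic_replication(source, target, clinic_id):
    """Body for POST /_replicate: continuous, limited to one clinic's documents."""
    return {
        "source": source,
        "target": target,
        "continuous": True,
        "filter": "sync/by_clinic",  # "<design doc name>/<filter name>"
        "query_params": {"clinic_id": clinic_id},
    }


# Hypothetical central server URL and local database name.
body = json.dumps(
    clinic_replication("http://central.example.org:5984/patients", "patients", "clinic-17"),
    indent=2,
)
print(body)
```

The reverse direction (clinic to central server) would be a second request with source and target swapped, typically without the filter.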



CouchDB Case Study: Poyomi's architecture where .NET meets NoSQL and AMQP

The story of how Poyomi, a photobook creation and printing service, uses CouchDB:

Photo storage duties are handled by CouchDB, a NoSQL Key-Value database with a HTTP REST interface. Everything is stored as a JSON document, with one very useful feature that we exploit fully: binary attachments. Each photo is stored as a single document with multiple image attachments – the original photo and various thumbnails. No more scattered files all across the filesystem, everything is kept together!

Plus you could serve them directly from CouchDB if needed.
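A minimal sketch of the “one document per photo, several attachments” layout, using CouchDB’s inline attachment format (base64 data under the _attachments key). The document id, the type field and the attachment names are invented for illustration and are not Poyomi’s actual schema.

```python
import base64


def photo_document(photo_id, images):
    """Build a photo document.

    `images` maps an attachment name to a (content_type, raw_bytes) pair.
    """
    return {
        "_id": photo_id,
        "type": "photo",
        "_attachments": {
            name: {
                "content_type": content_type,
                # Inline attachments are sent as base64 text inside the JSON body.
                "data": base64.b64encode(raw).decode("ascii"),
            }
            for name, (content_type, raw) in images.items()
        },
    }


# Placeholder bytes stand in for real image data.
doc = photo_document(
    "photo-0001",
    {
        "original.jpg": ("image/jpeg", b"\xff\xd8 original bytes"),
        "thumb_128.jpg": ("image/jpeg", b"\xff\xd8 thumbnail bytes"),
    },
)
print(sorted(doc["_attachments"]))
```

Each attachment then has its own URL of the form /db/photo-0001/thumb_128.jpg, which is what makes serving images straight from CouchDB possible.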

A diagram of their architecture:

[Figure: Poyomi CouchDB architecture diagram]

Added to Powered by NoSQL.



CouchDB Case Study: Building a Track and Trace Application

The application was migrated from PostgreSQL and DoDoStorage (an in-house document store) to CouchDB:

Currently we are running CouchDB with a subset of our tracking archives. This subset is about a quarter million documents, of which 20% or so have attachments, resulting in a database size of about 5 GB. No complaints so far.

This CouchDB case study provides details about both the data model and data access.



CouchDB Case Study: Thingler, Collaborative Todo Lists

The data-model of Thingler was really simple: there was only one type of object: the Room (essentially a passworded list mapped to a url). The document-store aspect of CouchDB worked well, the application could store all the information in the room, such as the items in the todo (and associated tags), the password, the name and url of the room — in a single document, that was one GET away. The nice thing about this was that the document _id was also the URL of the room. To generate the URLs, the application just used Couch’s _uuids API.
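To picture the single-document model described above, here is a small sketch of what such a room might look like. The field names are guesses, not Thingler’s real schema, and uuid.uuid4().hex is used as a local stand-in for an id fetched from CouchDB’s _uuids endpoint.

```python
import uuid


def new_room(name, password_hash):
    """Create a room document whose _id doubles as the room's URL path."""
    # Stand-in for an id obtained from GET /_uuids (also 32 hex characters).
    room_id = uuid.uuid4().hex
    return {
        "_id": room_id,
        "name": name,
        "password": password_hash,
        "items": [],
    }


def add_item(room, title, tags=()):
    """Append a todo item; everything stays inside the one room document."""
    room["items"].append({"title": title, "tags": list(tags), "done": False})
    return room


room = add_item(new_room("Groceries", "<hashed password>"), "Buy milk", tags=["shop"])
print("/" + room["_id"], room["items"])
```

Rendering a room is then a single GET of the document, at the cost of rewriting the whole document on every change to the list.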

Question to the experts: why use node.js and not a CouchApp?



CouchDB Users: The Large Hadron Collider

Couchio announced another user[1] of CouchDB: the European Organization for Nuclear Research (CERN) — responsible for the Large Hadron Collider — in the Compact Muon Solenoid experiment:

The DMWM team has issues that don’t fit well into standard relational databases or files in a filesystem. Being able to easily access and consolidate data from distributed locations with minimal latency is required routinely. Typically, external access to a site is limited, so incoming connections to a database aren’t possible. The team often doesn’t have clear requirements to address, which means metadata is either not collected or not effectively used. Generally, the team members must prototype tools quickly and be able to demonstrate that they are ready to go into production.

In case you missed it, back in June the same project announced that it was using MongoDB. So wouldn’t it be interesting to hear why MongoDB was replaced by CouchDB (if that is indeed the case and the two are not in fact used in parallel)?

Update: according to @LusciousPear the two projects are used side-by-side.

  1. Couchio people call these case studies, but I’d say they are only success stories or “who’s using” lists. A case study usually answers the questions what, why and how, with a bonus part on lessons learned. Examples: Netvibes using Tokyo Tyrant, Adobe using HBase, or Twitter looking into using Cassandra.



CouchDB Case Study: CouchDB at BBC presented by Enda Farrell

Presented at QCon London 2010:

Enda Farrell discusses how CouchDB is used by BBC for some of its websites, presenting the context it is deployed in, the operations performed against it, how replication and compacting works, some statistics, and how it is used at scale.

BBC is probably the most often mentioned CouchDB case study. You’ll learn a couple of very interesting tricks of running CouchDB at large scale.


CouchDB Case Study: CouchDB for Reporting and More

Considering the current limitations of the CouchDB mapreduce/views — i.e. no dynamic queries, views being updated on read access, awkward pagination — I am a bit confused:

Initially, Aptela’s developers were considering using CouchDB solely for reporting, with the schema-less design proving particularly useful when every day seemed to bring a new reporting requirement.

My thoughts:

  • schema-less storage is a good solution for free-format data
  • free-format data is difficult to query
  • CouchDB has no dynamic queries and updates its views on read, both of which would make reporting difficult
  • Hadoop (with Pig) or even SQL seems a better fit for the scenario of new reports every day (which one to use depends mostly on the size of the data)

What am I missing?
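To show what I mean by awkward pagination: the usual workaround is to ask a view for one row more than the page size, then use that extra row’s key and document id as the starting point of the next page. A minimal sketch of that pattern follows. The helper names are mine, and the rows are assumed to have the usual view-row shape of id, key and value.

```python
import json
from urllib.parse import urlencode


def page_query(page_size, start_key=None, start_docid=None):
    """Query string for one page of a view: fetch page_size + 1 rows."""
    params = {"limit": page_size + 1}
    if start_key is not None:
        # View keys are JSON values, so they are JSON-encoded in the URL.
        params["startkey"] = json.dumps(start_key)
        if start_docid is not None:
            # Disambiguates between rows that share the same key.
            params["startkey_docid"] = start_docid
    return urlencode(params)


def split_page(rows, page_size):
    """Split fetched rows into the visible page and the next page's start."""
    page, extra = rows[:page_size], rows[page_size:]
    next_start = (extra[0]["key"], extra[0]["id"]) if extra else None
    return page, next_start


# Hypothetical rows, as if returned for page_query(2).
rows = [
    {"id": "d1", "key": "2010-07-01", "value": 1},
    {"id": "d2", "key": "2010-07-01", "value": 1},
    {"id": "d3", "key": "2010-07-02", "value": 1},
]
page, next_start = split_page(rows, 2)
print(len(page), next_start, page_query(2, *next_start))
```

It works, but compared to LIMIT and OFFSET in SQL it is a fair amount of bookkeeping for every report screen.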



CouchDB: A Perfect Fit for Twitter Apps

In the past I’ve covered NoSQL-based Twitter apps extensively (since that post I’ve added some more: here, here and here), but interestingly enough not many of these were using CouchDB, the NoSQL database for the web that recently released its 1.0 version.

Things can definitely change after reading the article written by Mark Headd explaining what makes CouchDB a perfect fit for Twitter applications:

  • You interact with a CouchDB instance the same way that you interact with the Twitter API — by making HTTP calls. This can help keep the code for your application clean and simple, and provides lots of opportunities for code reuse within your application.
  • Documents in CouchDB are JSON, which is one of the formats returned from the Twitter API when searching for Tweets (or “status objects” in Twitter parlance).
  • Documents in a CouchDB database are assigned a globally unique ID — it’s how documents are distinguished from one another. Twitter also uses unique identifiers for status objects, so using the ID of a Twitter status object as the ID for a document in CouchDB makes life pretty easy for a Twitter app developer.
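The last point can be sketched in a few lines: take a status object as returned by the Twitter API and reuse its id as the CouchDB document id. This is my own illustration rather than code from Mark Headd’s article, and the sample status below is trimmed down and made up.

```python
def tweet_to_doc(status):
    """Turn a Twitter status object (a dict) into a CouchDB document."""
    doc = dict(status)  # shallow copy, so the caller's object is left untouched
    # id_str is the string form of the status id. Document ids are strings,
    # and this avoids precision issues with very large integers.
    doc["_id"] = status["id_str"]
    return doc


# Trimmed-down, made-up status object for illustration.
status = {
    "id": 21947795900469248,
    "id_str": "21947795900469248",
    "text": "Trying out CouchDB",
    "user": {"screen_name": "example_user"},
}
doc = tweet_to_doc(status)
print(doc["_id"], doc["text"])
```

Saving the same tweet twice then becomes a conflict on an existing document id instead of a silent duplicate.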

While not directly related, there’s another connection between CouchDB and Twitter: the Gizzard framework can be used for scaling CouchDB.


CouchDB Case Study: BBC

Part of the CouchDB case study series:

The BBC architects chose CouchDB to create a multi-master multi-datacenter failover configuration. This allows them to use 32 nodes split between two datacenters. Of the 16 nodes in each datacenter, 8 are primary nodes and the other 8 are backup nodes, but the nodes themselves are not aware of the fact that they are designated as a primary or backup node. This works well for the BBC because they can commission more nodes as their need for capacity rises.

I’ve seen Enda Farrell’s talk at QCon London and if my memory serves me right, there are a couple of additional details that are probably interesting for this CouchDB case study:

  • The BBC built a key-value API for accessing data stored in CouchDB. The main purpose of this API is to make sure there’s no access to views
  • BBC is using an internally developed replication mechanism (and not the default CouchDB replication). I stand corrected: according to J.Chris (Couchio) BBC is using CouchDB replication with a custom layer to manage it.
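I have not seen the BBC’s API, so the following is only a guess at the idea behind a key-value layer that keeps clients away from views: map every key to a plain document URL and refuse anything that could reach a reserved path such as _design/ or _all_docs. The function name and the database name are hypothetical.

```python
from urllib.parse import quote


def document_path(db, key):
    """Map a key to a document path, rejecting reserved CouchDB paths."""
    # Ids starting with an underscore are reserved (_design/..., _local/...),
    # and special endpoints such as _all_docs share that prefix.
    if not key or key.startswith("_"):
        raise ValueError("reserved or empty key: %r" % (key,))
    # Escape slashes too, so a key can never address a sub-resource.
    return "/{}/{}".format(quote(db, safe=""), quote(key, safe=""))


print(document_path("content", "story/42"))
```

A real service would sit in front of CouchDB and expose only GET, PUT and DELETE on paths built this way.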