


usecase: All content tagged as usecase in NoSQL databases and polyglot persistence

Exploring Neo4j, the NoSQL Graph Database

Rahul Sharma takes a look at Neo4j and some basic operations with graph databases:

Let us say we want to implement a use-case where there are persons and a person can be connected to other persons. In order to use Neo4J we must think about POJOs in terms of interfaces and corresponding implementations. This is so because the database is a key-value store at the back, so it asks us to store the properties of the POJO in terms of key-value pairs. Moreover, there are no foreign keys in Neo4J; objects in the db are connected with other objects using Relationships.
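
To make the quoted idea concrete, here is a minimal in-memory sketch of a property graph: each node keeps its state as key-value properties, and nodes are linked by typed relationships instead of foreign keys. It does not use the Neo4j API; the `Node` and `Relationship` classes and the `KNOWS` relationship type are hypothetical names chosen for illustration.

```python
class Relationship:
    """A typed, directed link between two nodes."""

    def __init__(self, start, end, rel_type):
        self.start = start
        self.end = end
        self.rel_type = rel_type


class Node:
    """A graph node whose state is a bag of key-value properties."""

    def __init__(self, **properties):
        self.properties = dict(properties)
        self.relationships = []

    def connect(self, other, rel_type):
        """Link this node to another node; both ends remember the link."""
        rel = Relationship(self, other, rel_type)
        self.relationships.append(rel)
        other.relationships.append(rel)
        return rel

    def neighbours(self, rel_type):
        """Return the nodes at the other end of links of the given type."""
        return [
            rel.end if rel.start is self else rel.start
            for rel in self.relationships
            if rel.rel_type == rel_type
        ]


alice = Node(name="Alice")
bob = Node(name="Bob")
alice.connect(bob, "KNOWS")
```

Looking up who a person knows is then a walk over that node's relationships (`alice.neighbours("KNOWS")`) rather than a join on a key column.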

Interestingly, he mentions getting some errors when trying to push 151K names. Sounds like he could use this Neo4j tip for handling long transactions.

Original title and link for this post: Exploring Neo4j, the NoSQL Graph Database (published on the NoSQL blog: myNoSQL)


InfiniteGraph Use Case: Modeling Stackoverflow

I didn’t hear much about InfiniteGraph after its 1.0 release, except this post that uses Stackoverflow data as input to demo some features of graph databases:

The vertices in the graph are represented as the Users, Questions and Answers above while the edges are represented as the interactions between them (i.e. a User “Posts” a Question, an Answer is “For” a Question, a User “Comments On” a Question or Answer). Simple enough, and like most other social graphs, users seem to be the focal points with the majority of connected edges. Now all I needed was a sample application that could construct the graph data model from the XML sources and run some queries.



CouchDB, Mobile Devices and The Distributed Web Data

Getting back from two crazy days, I’m finding that the big news (at least in the media) is that CouchDB has released a CouchDB SDK for Android. You can read more about how to get it ☞ here.

We already knew that, thanks to its friendly protocol and advanced replication features, CouchDB is a solid option for distributed web data. Palm webOS and its db8 use of CouchDB for replication is a very good example of this use case.


Graylog2: MongoDB-backed Syslog System

Manage your logs in the dark and have lasers going and make it look like you’re from space.

An open source syslog system based on a Java server for accepting messages and Ruby on Rails for visualization with MongoDB used as storage. You can get it ☞ here.

Presentation: Redis - Persistence Power or Redis Use Cases

Nick Quaranto’s slides are a great summary of a few Redis use cases.

Heroku Encourages Polyglot Persistence

Heroku published an article preaching polyglot persistence through a Database-as-a-Service approach:

Database-as-a-service is one of the coming decade’s most promising business models. […] DaaS also goes hand-in-glove with polyglot persistence. Thanks to database services, you won’t need to learn how to sysadmin/DBA for every datastore you use – you can instead outsource that job to a service provider specializing in each database.

While it definitely sounds exciting to be able to use all these NoSQL databases, we should always keep in mind the cost of complexity, even if DaaS helps alleviate some of the complexity of heterogeneous systems.

The article also includes some interesting use cases for a couple of NoSQL databases:

  • Frequently-written, rarely read statistical data (for example, a web hit counter) should use an in-memory key/value store like Redis, or an update-in-place document store like MongoDB.
  • Big Data (like weather stats or business analytics) will work best in a freeform, distributed db system like Hadoop.
  • Binary assets (such as MP3s and PDFs) find a good home in a datastore that can serve directly to the user’s browser, like Amazon S3.
  • Transient data (like web sessions, locks, or short-term stats) should be kept in a transient datastore like Memcache. (Traditionally we haven’t grouped memcached into the database family, but NoSQL has broadened our thinking on this subject.)
  • If you need to be able to replicate your data set to multiple locations (such as syncing a music database between a web app and a mobile device), you’ll want the replication features of CouchDB.
  • High availability apps, where minimizing downtime is critical, will find great utility in the automatically clustered, redundant setup of datastores like Cassandra and Riak.

These are good examples, but you can find many more in our coverage of NoSQL use cases and the per-product case studies: CouchDB case studies, MongoDB case studies, etc.



Redis Usecase: API Access Logger

Nice combination of Redis and MySQL:

Redis has to keep all stored objects in memory, so just putting all data in there and forgetting about it was out of the question. We decided to only keep a few days of data in Redis and archive the results to MySQL. Daily API usage stats would be served directly by Redis, archived results on date ranges would be fetched from MySQL.

Note also what correct Redis data modeling means: using Redis data structures combined with smart keys (n.b. smart in the sense of keys carrying additional meta-information).
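
The hot/archive split and the "smart key" idea can be sketched as follows. This is an illustration under stated assumptions, not the original author's code: a plain dictionary stands in for Redis, `sqlite3` stands in for MySQL, and the key layout `api:usage:<api_key>:<day>`, the `api_usage` table, the `AccessLogger` class and the three-day retention window are all hypothetical. It also assumes API keys contain no `:` character.

```python
import sqlite3
from collections import defaultdict
from datetime import date, timedelta

KEEP_DAYS = 3  # assumed retention window for the in-memory store


def usage_key(api_key, day):
    """Build a 'smart' key: the key itself encodes the client and the day."""
    return f"api:usage:{api_key}:{day.isoformat()}"


class AccessLogger:
    def __init__(self, archive):
        self.hot = defaultdict(int)  # stand-in for Redis counters
        self.archive = archive       # stand-in for the MySQL archive
        archive.execute(
            "CREATE TABLE IF NOT EXISTS api_usage "
            "(api_key TEXT, day TEXT, hits INTEGER)"
        )

    def record(self, api_key, day):
        """Count one API call for this client on this day."""
        self.hot[usage_key(api_key, day)] += 1

    def daily_hits(self, api_key, day):
        """Serve recent daily stats straight from the in-memory store."""
        return self.hot.get(usage_key(api_key, day), 0)

    def archive_old(self, today):
        """Move counters older than the retention window into the archive."""
        cutoff = today - timedelta(days=KEEP_DAYS)
        for key in list(self.hot):
            # The metadata is recovered from the key itself.
            _, _, api_key, day = key.split(":", 3)
            if date.fromisoformat(day) < cutoff:
                self.archive.execute(
                    "INSERT INTO api_usage VALUES (?, ?, ?)",
                    (api_key, day, self.hot.pop(key)),
                )
        self.archive.commit()

    def archived_hits(self, api_key, start, end):
        """Answer date-range questions from the relational archive."""
        row = self.archive.execute(
            "SELECT COALESCE(SUM(hits), 0) FROM api_usage "
            "WHERE api_key = ? AND day BETWEEN ? AND ?",
            (api_key, start.isoformat(), end.isoformat()),
        ).fetchone()
        return row[0]
```

Because the day is part of the key, the archiving job can decide what to move without reading any value first, which is the kind of meta-information a smart key carries.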


Building a MongoDB-based Queue

Matt Insler shows how to build a MongoDB-based queue using server-side JavaScript and the findAndModify command, replacing Amazon SQS in his application:

I have been using MongoDB for a while now and am enamored with what it can do. I know that it can store lots of schema-less data in 4MB chunks (a document is limited to 4MB) and can store larger files through the use of GridFS. I know that it’s lightning fast (almost memcached speed) for indexed lookups and can handle thousands of operations per second without spiking the CPU over 10% even. I know that I’m paying for the CPU and hard drive space on Amazon EC2 already and thoroughly enjoy minimizing my monthly, weekly, and even daily costs. Blah. Blah. Blah. I want to implement this in Mongo!

But make no mistake: this approach just replaced a reliable, highly scalable, hosted service (i.e. one involving no operational costs) with a solution that lacks all of these properties.
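
The property such a queue relies on is that finding the next unclaimed item and marking it as claimed happen as one atomic step, so two workers never receive the same item. Below is a minimal in-memory sketch of that claim pattern. It is not Matt Insler's code and does not talk to MongoDB: a lock plays the role of the atomic find-and-modify, and the `ClaimQueue` class and the `claimed_by` field are hypothetical names.

```python
import threading


class ClaimQueue:
    """In-memory stand-in for a collection used as a work queue."""

    def __init__(self):
        self._lock = threading.Lock()
        self._docs = []
        self._next_id = 0

    def push(self, payload):
        """Append a new unclaimed document and return its id."""
        with self._lock:
            self._next_id += 1
            self._docs.append(
                {"_id": self._next_id, "payload": payload, "claimed_by": None}
            )
            return self._next_id

    def claim(self, worker):
        """Atomically find the oldest unclaimed document and mark it claimed."""
        with self._lock:
            for doc in self._docs:
                if doc["claimed_by"] is None:
                    doc["claimed_by"] = worker
                    return dict(doc)  # hand back a copy of the claimed doc
            return None

    def complete(self, doc_id):
        """Remove a document once its work is done."""
        with self._lock:
            self._docs = [d for d in self._docs if d["_id"] != doc_id]
```

Note that this sketch says nothing about retries for workers that crash after claiming an item, which is part of what a hosted queue service handles for you.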


Updates on Cassandra Usage at Twitter

Just two days after my Cassandra status update, the Twitter engineering blog has published an article sharing more details about Cassandra usage at Twitter.

So, how is Twitter using Cassandra today?

  • Cassandra as database of places of interest used by the geo team[1]
  • Cassandra as storage for the data mining research team
  • Cassandra as an upcoming storage solution for real time analytics

In case you wonder what changed: Twitter will not migrate tweet storage to Cassandra and will continue to save and serve tweets from the existing MySQL cluster:

We believe that this isn’t the time to make large scale migration to a new technology. We will focus our Cassandra work on new projects that we wouldn’t be able to ship without a large-scale data store.

  1. Probably this is similar to how SimpleGeo is using Cassandra.


Redis-based Configuration Management at GitHub

Instead of config files and if-s, use Redis to store your flags:
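
As a rough illustration of the idea, and not GitHub's actual implementation, the sketch below keeps feature flags in a key-value backend and reads them at runtime. A plain dictionary stands in for Redis here; the `FlagStore` class, the `flag:` key prefix and the `"1"`/`"0"` encoding are assumptions made for the example.

```python
class FlagStore:
    """Feature flags kept in a key-value backend instead of config files."""

    def __init__(self, backend=None):
        # Any mapping with get and item assignment works; a dict stands in
        # for a Redis connection in this sketch.
        self.backend = backend if backend is not None else {}

    @staticmethod
    def _key(name):
        return f"flag:{name}"

    def enable(self, name):
        self.backend[self._key(name)] = "1"

    def disable(self, name):
        self.backend[self._key(name)] = "0"

    def enabled(self, name, default=False):
        """Return the stored flag, or the default when it was never set."""
        value = self.backend.get(self._key(name))
        if value is None:
            return default
        return value == "1"


flags = FlagStore()
flags.enable("new_dashboard")
```

Because the flag lives in a shared store rather than in a deployed file, flipping it takes effect for every process that reads it, without a redeploy.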


Tekpub: Using both MongoDB and MySQL

You shouldn’t be afraid to use both NoSQL and RDBMS in your projects if they help you address real problems:

We split out the duties of our persistence story cleanly into two camps: reports of things we needed to know to make decisions for the business and data users needed to use our site. Ironically these two different ways of storing data have guided us to do what’s natural: put the application data into a high-read, high-availability environment (MongoDb) - put the historical, reporting data into a system that is built to answer questions: a relational data store.

The high-read stuff (account info, productions and episode info) is perfect for a “right now” kind of thing like MongoDb. The “what happened yesterday” stuff is perfect for a relational system.

We don’t want to run reports on our live server. You don’t know how long they will take - nor what indexing they will work over (limiting the site’s perf). Bad. Bad-bad.

Much better case study than this one!

This post is part of the MongoDB Case Studies series.


CouchDB Case Study: Web Based IRC

Another CouchDB case study, this time from Anologue:

Initial goal: enable any number of people to view a web page that would serve as a sort of chat-room. Generate a link, share it with whomever you’d like to participate in the dialogue, type your name and text to add to the conversation.

I’d speculate that CouchDB was chosen because it could simplify the architecture of the web app and because of its document-based data model. Definitely not for some “fake” or just plain wrong reasons.

Adding it to the list of CouchDB case studies.