


NoSQL libraries: All content tagged as NoSQL libraries in NoSQL databases and polyglot persistence

ZkFarmer: Tools for Managing Distributed Server Farms Using Apache ZooKeeper


With ZkFarmer, each server registers itself in one or several farms. Through this registration, hosts can expose arbitrary information about their status.

On the other end, ZkFarmer helps consumers of the farm to maintain a configuration file in sync with the list of hosts registered in the farm with their respective configuration.

In the middle, ZkFarmer helps monitoring and administrative services easily read and change the configuration of each host.

Currently ZkFarmer provides the following functionality:

  1. Registering ZooKeeper services to one or several farms (joining a farm)
  2. Listing farms and hosts
  3. Reading and writing farm content
  4. Farm monitoring
  5. Syncing farm configuration

Original title and link: ZkFarmer: Tools for Managing Distributed Server Farms Using Apache ZooKeeper (NoSQL database©myNoSQL)

Automatic Async and Sync Pipelining of Redis Commands

The nuts and bolts of implementing synchronous and asynchronous Redis clients supporting pipelining:

In this post I describe different approaches for client-libraries to implement Redis protocol pipelining. I will cover synchronous as well as asynchronous (event-driven) techniques and discuss their respective pros and cons: Synchronous client APIs require the library user to explicitly pipeline commands, potentially yielding optimal protocol performance, but at the cost of additional bookkeeping when handling replies. Asynchronous client libraries, on the other hand, allow automatic pipelining, while being less efficient in their pipelining behavior.
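The explicit, synchronous style of pipelining described in the quote can be sketched without a server. The following is only an illustration, not the code from the linked post: it encodes several commands in the Redis wire format, joins them into one buffer that a client would send in a single write, and then parses the replies in request order. The function names and the canned reply bytes are assumptions made for this example.

```python
def encode_command(*args):
    """Encode one command as a Redis protocol array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def build_pipeline(commands):
    """Join several encoded commands so they can go out in one write."""
    return b"".join(encode_command(*cmd) for cmd in commands)


def _parse_one(buffer, pos):
    """Parse a single reply starting at `pos`; return (value, next_pos)."""
    end = buffer.index(b"\r\n", pos)
    kind, payload = buffer[pos:pos + 1], buffer[pos + 1:end]
    pos = end + 2
    if kind == b"+":                      # simple string
        return payload.decode("utf-8"), pos
    if kind == b"-":                      # error reply, returned as a value
        return RuntimeError(payload.decode("utf-8")), pos
    if kind == b":":                      # integer
        return int(payload), pos
    if kind == b"$":                      # bulk string, -1 means nil
        length = int(payload)
        if length == -1:
            return None, pos
        return buffer[pos:pos + length], pos + length + 2
    if kind == b"*":                      # array of nested replies
        count = int(payload)
        if count == -1:
            return None, pos
        items = []
        for _ in range(count):
            item, pos = _parse_one(buffer, pos)
            items.append(item)
        return items, pos
    raise ValueError("unknown reply type: %r" % kind)


def parse_replies(buffer, expected):
    """Parse `expected` replies; the caller tracks how many were sent."""
    replies, pos = [], 0
    for _ in range(expected):
        reply, pos = _parse_one(buffer, pos)
        replies.append(reply)
    return replies


# The bookkeeping the quote mentions: the caller must remember which
# reply belongs to which command, because replies carry no identifier.
commands = [("SET", "counter", 1), ("INCR", "counter"), ("GET", "counter")]
request = build_pipeline(commands)
canned_response = b"+OK\r\n:2\r\n$1\r\n2\r\n"   # hypothetical server output
replies = parse_replies(canned_response, len(commands))
```

Pairing `commands` with `replies` by position is the extra bookkeeping a synchronous API leaves to the library user. An event-driven client can do the same pairing internally with a queue of pending callbacks.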

Original title and link: Automatic Async and Sync Pipelining of Redis Commands (NoSQL database©myNoSQL)


Asyncdynamo: Amazon DynamoDB Async Python Library by Bitly

Bitly’s new asynchronous Amazon DynamoDB Python client:

Asyncdynamo requires Boto and Tornado to be installed, and must be run with Python 2.7. It replaces Boto’s synchronous calls to Dynamo and to Amazon STS (to retrieve session tokens) with non-blocking Tornado calls. For the end user its interface seeks to mimic that of Boto Layer1, with each method now requiring an additional callback parameter.

Available on GitHub.

Original title and link: Asyncdynamo: Amazon DynamoDB Async Python Library by Bitly (NoSQL database©myNoSQL)


Neo4j and JRuby: Expressive Graph Traversals With Jogger

Jogger gives you named traversals and is a little bit like named scopes. Jogger groups multiple pacer traversals together and gives them a name. Pacer traversals are like pipes. What are pipes? Pipes are great!!

The most important conceptual difference is that the order in which named traversals are called matters, while it usually doesn’t matter in which order you call named scopes.

Knowing how Gremlin and Cypher compare, the question is: how does Jogger compare to Cypher?

Original title and link: Neo4j and JRuby: Expressive Graph Traversals With Jogger (NoSQL database©myNoSQL)

Automating Cassandra Operations and Management With Netflix's Priam Tool

Priam is a new open source tool from Netflix, used to simplify and automate the operations and management of a Cassandra cluster (back in November, Netflix released Curator, a ZooKeeper library):

Priam is a co-process that runs alongside Cassandra on every node to provide the following functionality:

  • Backup and recovery
    • snapshot and incremental backups
    • compression and multipart off-site uploading
    • data recovery and data testing
  • Bootstrapping and automated token assignment

    Priam automates the assignment of tokens to Cassandra nodes as they are added, removed or replaced in the ring. Priam relies on centralized external storage (SimpleDB/Cassandra) for storing token and membership information, which is used to bootstrap nodes into the cluster. It allows us to automate replacing nodes without any manual intervention, since we assume failure of nodes, and create failures using Chaos Monkey. The external Priam storage also provides us valuable information for the backup and recovery process.

  • Centralized configuration management: All our clusters are centrally configured via properties stored in SimpleDB, which includes setup of critical JVM settings and Cassandra YAML properties.

  • RESTful monitoring and metrics: provides hooks that support external monitoring and automation scripts. They provide the ability to back up and restore a set of nodes manually and provide insights into Cassandra’s ring information. They also expose key Cassandra JMX commands such as repair and refresh.
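To make the automated token assignment from the list above more concrete, here is a minimal sketch. It is not Priam's actual algorithm: it assumes Cassandra's RandomPartitioner (tokens in the range 0 to 2^127 - 1), spaces tokens evenly, and uses a plain dictionary as a stand-in for the external membership store. All names are hypothetical.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner tokens fall in [0, 2**127)


def token_for_slot(slot, node_count):
    """Evenly spaced token for ring position `slot` out of `node_count`."""
    if not 0 <= slot < node_count:
        raise ValueError("slot out of range")
    return slot * RING_SIZE // node_count


def claim_slot(membership, instance_id, live_instances, node_count):
    """Give a new instance the first free slot, or the slot of a dead node.

    `membership` maps slot -> instance id and stands in for the external
    store; `live_instances` is the set of instance ids known to be running.
    """
    for slot in range(node_count):
        owner = membership.get(slot)
        if owner is None or owner not in live_instances:
            membership[slot] = instance_id
            return slot, token_for_slot(slot, node_count)
    raise RuntimeError("every slot is owned by a live instance")


# A replacement node inherits the token of the node that died.
membership = {0: "node-a", 1: "node-b", 2: "node-c"}
slot, token = claim_slot(membership, "node-d", {"node-a", "node-c"}, 3)
```

Keeping slot ownership outside the cluster is what lets a replacement node take over a dead node's token without manual intervention, which matches the motivation given in the quote.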

Original title and link: Automating Cassandra Operations and Management With Netflix’s Priam Tool (NoSQL database©myNoSQL)


Quick Guide to MongoDB and Python With PyMongo

A tutorial on PyMongo from Rick Copeland covering:

  • configuration options for MongoDB
  • documents structure, inserts and batch inserts
  • querying and indexing
  • deleting
  • updating

One thing that’s nice about the pymongo connection is that it’s automatically pooled. What this means is that pymongo maintains a pool of connections to the mongodb server that it reuses over the lifetime of your application. This is good for performance since it means pymongo doesn’t need to go through the overhead of establishing a connection each time it does an operation. Mostly, this happens automatically. You do, however, need to be aware of the connection pooling, since you need to manually notify pymongo that you’re “done” with a connection in the pool so it can be reused.
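The pooling behavior described in the quote can be illustrated with a generic sketch. This is not PyMongo's implementation or API. The `ConnectionPool` class and its method names are made up for the example, and `factory` stands in for whatever opens a real connection.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Tiny illustrative pool: reuse idle connections, open new ones on demand."""

    def __init__(self, factory, max_idle=4):
        self._factory = factory                      # callable that opens a connection
        self._idle = queue.LifoQueue(maxsize=max_idle)
        self.created = 0                             # how many connections were opened

    def acquire(self):
        """Hand out an idle connection if there is one, else open a new one."""
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            self.created += 1
            return self._factory()

    def release(self, conn):
        """Tell the pool the caller is done so the connection can be reused."""
        try:
            self._idle.put_nowait(conn)
        except queue.Full:
            pass  # pool already holds enough idle connections; drop this one

    @contextmanager
    def connection(self):
        """Acquire a connection and always release it, even on errors."""
        conn = self.acquire()
        try:
            yield conn
        finally:
            self.release(conn)


pool = ConnectionPool(factory=object, max_idle=2)
with pool.connection() as first:
    pass                       # work with the connection here
with pool.connection() as second:
    pass                       # same underlying object as `first`
```

If the caller never calls `release`, every `acquire` opens a fresh connection and the benefit of pooling is lost. That is why the "done" notification in the quote matters.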

Original title and link: Quick Guide to MongoDB and Python With PyMongo (NoSQL database©myNoSQL)


An Introduction to Scalding, the Scala and Cascading MapReduce Framework From Twitter

A fantastic guide to Twitter’s Scala and Cascading MapReduce framework Scalding from Edwin Chen [1]:

In 140: instead of forcing you to write raw map and reduce functions, Scalding allows you to write natural code like

    // Create a histogram of tweet lengths.
    tweets.map('tweet -> 'length) { tweet : String => tweet.size }.groupBy('length) { _.size }

Looking at the code samples, this looks a lot like Apache Pig. But the Scalding documentation compares it to Scrunch/Scoobi and points to the answers in this Quora thread:

The main difference between Scalding (and Cascading) and Scrunch/Scoobi is that Cascading has a record model where each element in your distributed list/table is a table with some named fields. This is nice because most common cases are to have a few primitive columns (ints, strings, etc…).
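For readers who do not read Scala, the tweet-length histogram quoted earlier corresponds roughly to the following plain Python. It is a local, in-memory analogue only, with a hypothetical function name. Scalding runs the same map and group-by steps as distributed Hadoop jobs over named fields.

```python
from collections import Counter


def tweet_length_histogram(tweets):
    """Map each tweet to its length, then count tweets per length."""
    return Counter(len(tweet) for tweet in tweets)


histogram = tweet_length_histogram(["hi", "yo", "hey"])
```

Here `len(tweet)` plays the role of the `'tweet -> 'length` mapping, and `Counter` plays the role of `groupBy('length) { _.size }`.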

  1. Edwin Chen is a data scientist at Twitter

Original title and link: An Introduction to Scalding, the Scala and Cascading MapReduce Framework From Twitter (NoSQL database©myNoSQL)


Storing Django Sessions in DynamoDB with django-dynamodb-sessions



Pros:

  • reduces read/write access to your main database
  • all DynamoDB benefits:
    • fully managed solution
    • scalable
    • fast and predictable performance

Cons (or more of when not to use it):

  • if your application is not running in the AWS cloud
  • if the size of the sessions is bigger than 64KB
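The size limit in the list above is easy to check ahead of time. The sketch below is an assumption-laden estimate and not part of django-dynamodb-sessions: it serializes the session as compact JSON, whereas the real backend may encode sessions differently and the stored item also includes attribute names. The 64KB figure is taken from the list above.

```python
import json

MAX_ITEM_BYTES = 64 * 1024  # limit quoted in the post


def session_fits(session_data, limit=MAX_ITEM_BYTES):
    """Rough check: does the JSON-encoded session stay within `limit` bytes?"""
    encoded = json.dumps(session_data, separators=(",", ":")).encode("utf-8")
    return len(encoded) <= limit


small_ok = session_fits({"user_id": 42, "cart": [1, 2, 3]})
large_ok = session_fits({"blob": "x" * 70000})
```

A check like this could run in a test suite to flag views that put large objects in the session before the session store is switched.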

Original title and link: Storing Django Sessions in DynamoDB with django-dynamodb-sessions (NoSQL database©myNoSQL)


Connection Management in MongoDB and CongoMongo

Are connections pooled or not? Konrad Garus digs to find the answer:

Easy. Too easy and comfortable. Coming from the old good and heavy JDBC/SQL I felt uneasy with the connection management. How does it work? Does it just open a connection and leave it dangling in the air the whole time? Might be good for a quick spike in REPL, but not for a real application which needs concurrency, is supposed to be running for days and weeks, and so on. How do you maintain it properly?

Original title and link: Connection Management in MongoDB and CongoMongo (NoSQL database©myNoSQL)


Getting Started With Ruby and Neo4j Using Neography

Getting started with Ruby and Neo4j is very easy. Follow these steps and you’ll be up and running in no time. First we install the neography […]

The traversal API looks really nice and comes in two flavors: the Neo4j REST API and a Ruby-esque one.

Original title and link: Getting Started With Ruby and Neo4j Using Neography (NoSQL database©myNoSQL)