


python: All content tagged as python in NoSQL databases and polyglot persistence

Friendlier CLI frontend for HBase

Bypassing HBase’s console program and its Thrift interface by using Jython as a CLI frontend:

HBase, the well known non-relational distributed database, comes with a console program to perform various operations on a HBase cluster. I’ve personally found this tool to be a bit limited and I’ve toyed around the idea of writing my own. Since HBase only comes with a Java driver for direct access and the various RPC interfaces such as Thrift don’t offer the full set of functions over HBase, I decided to go for Jython and to directly use the Java API. This article will show a mock-up of such a tool.
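The post builds its tool in Jython on top of the HBase Java API. To show the general shape of such a frontend in plain CPython, here is a minimal sketch built on the standard `cmd` module. `HBaseShell` and `InMemoryTable` are hypothetical names, and the in-memory table is only a stand-in for a real HBase client; under Jython, its methods would call into the Java client API instead.

```python
import cmd
import shlex


class InMemoryTable:
    """Hypothetical stand-in for an HBase table: row key -> {column: value}.

    A real backend (e.g. under Jython) would wrap the HBase Java client instead.
    """

    def __init__(self):
        self.rows = {}

    def put(self, row, column, value):
        self.rows.setdefault(row, {})[column] = value

    def get(self, row):
        return self.rows.get(row, {})


class HBaseShell(cmd.Cmd):
    """A tiny command loop that delegates every command to a pluggable backend."""

    prompt = "hbase> "

    def __init__(self, table, stdout=None):
        super().__init__(stdout=stdout)  # None falls back to sys.stdout
        self.table = table

    def do_put(self, arg):
        """put ROW COLUMN VALUE: store a single cell (quotes allowed)."""
        parts = shlex.split(arg)
        if len(parts) != 3:
            self.stdout.write("usage: put ROW COLUMN VALUE\n")
            return
        self.table.put(*parts)

    def do_get(self, arg):
        """get ROW: print every column of a row, sorted by column name."""
        for column, value in sorted(self.table.get(arg.strip()).items()):
            self.stdout.write(f"{column}={value}\n")

    def do_quit(self, arg):
        """quit: leave the shell (returning True stops cmdloop)."""
        return True


if __name__ == "__main__":
    HBaseShell(InMemoryTable()).cmdloop()
```

Keeping the command loop separate from the storage backend means the same shell could be pointed at a Java-based or Thrift-based client without touching the command parsing.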


Getting started with Redis, Python and YQL

A quick intro to Redis by Khashayar showing why he loves Redis, how to install it and perform basic operations against it, and how to build an RSS-to-Twitter tool with Python, YQL and Redis:

In this code we first use YQL to get the RSS. Then we parse the RSS to get our desired field […]. After that we save these values to our database […]
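A minimal sketch of the parse-and-save steps, assuming the RSS XML has already been fetched (the YQL call is left out) and that `r` is a redis-py client. Using a Redis set of already-seen links, so that each item is posted only once, is an assumption for illustration, not necessarily how Khashayar's tool works; `parse_items`, `new_items` and the `rss:seen` key are hypothetical.

```python
import xml.etree.ElementTree as ET

SEEN_KEY = "rss:seen"  # hypothetical key holding links that were already processed


def parse_items(rss_xml):
    """Return (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


def new_items(r, rss_xml):
    """Keep only items whose link was not stored yet.

    `r` is assumed to be a redis-py client: SADD returns 1 when the member
    was added and 0 when it was already in the set, so the check and the
    save happen in a single command.
    """
    return [
        (title, link)
        for title, link in parse_items(rss_xml)
        if r.sadd(SEEN_KEY, link) == 1
    ]
```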


NoSQL News & Links 2010-04-16

  1. Tarek Ziadé: ☞ A Firefox plugin experiment. XUL, Bottle and Redis
  2. Andreas Jung: ☞ Looking beyond one’s own nose - looking at RabbitMQ and MongoDB

    Unsorted remarks on RabbitMQ and MongoDB plus some benchmarks with mass data

  3. Franck Cuny: ☞ presque, a Redis / Tatsumaki based message queue. Perl and Redis baby!
  4. Mark Atwood: ☞ Reacting to “Memcached is not a store”. IMO, it is as much a store as any dict/hash you’ve been using. Well, a bit more.
  5. okram: ☞ pipes. A lot of activity around graph databases lately:

    Pipes is a graph-based data flow framework written in Java 1.6+. A process graph (also known as a Kahn process network) is composed of a set of process vertices connected to one another by a set of communication edges. Each process can run independent of the others and as such, concurrency is a natural consequence as data is transformed in a pipelined fashion from input to output.
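Pipes itself is a Java library. As a rough Python analogy of the pipelined style described in the quote, chained generators let each stage consume the previous stage's output lazily. Generators run in a single thread, so this shows the pipelining but not the independent, concurrent processes of a real Kahn process network; the stage functions are hypothetical.

```python
def read_edges(lines):
    """Stage 1: turn 'a -> b' lines into (source, target) pairs."""
    for line in lines:
        src, dst = line.split("->")
        yield src.strip(), dst.strip()


def targets_of(edges, vertex):
    """Stage 2: keep only the targets of edges leaving `vertex`."""
    for src, dst in edges:
        if src == vertex:
            yield dst


def distinct(items):
    """Stage 3: drop duplicates while preserving arrival order."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


lines = ["a -> b", "a -> c", "b -> c", "a -> b"]
# Nothing runs until the last stage is consumed; lines are then pulled
# through the stages on demand, one at a time.
result = list(distinct(targets_of(read_edges(lines), "a")))
```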

Redis and Twitter filters in Python or Ruby

Mirko Froehlich has a ☞ long post explaining the problem and the rationale behind the chosen architecture. He then goes on to present the various pieces used to build the solution:

Code is available on ☞ GitHub.

Bulkan Evcimen took this sample application and built it on a Python stack:

So now you have yet another “good” reason[1] to play with Redis and Twitter.
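The linked posts describe their own architectures; as a hypothetical illustration of the general idea of filtering a tweet stream into Redis, the sketch below keeps the most recent matching tweets in one capped Redis list per keyword, using LPUSH followed by LTRIM. The key names, the cap and the single-word keyword matching are all assumptions, and `r` is assumed to be a redis-py client.

```python
import json
import re

MAX_PER_KEYWORD = 100  # hypothetical cap on stored tweets per keyword


def matching_keywords(text, keywords):
    """Return the tracked single-word keywords found in the tweet (case-insensitive)."""
    words = set(re.findall(r"\w+", text.lower()))
    return [kw for kw in keywords if kw.lower() in words]


def store_tweet(r, tweet, keywords):
    """Push a tweet onto one capped Redis list per matching keyword.

    LPUSH puts the newest tweet first, and LTRIM keeps only the first
    MAX_PER_KEYWORD entries, so each list never grows unbounded.
    """
    payload = json.dumps(tweet)
    matched = matching_keywords(tweet["text"], keywords)
    for kw in matched:
        key = f"filter:{kw.lower()}"
        r.lpush(key, payload)
        r.ltrim(key, 0, MAX_PER_KEYWORD - 1)
    return matched
```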


Redis and Python

Just a few days after posting Redis and Ruby, we are now featuring a two-article series on Redis and Python.

The ☞ first part will walk you through getting Redis installed, obtaining a Python library for Redis and working with the Redis data types: string, integer, lists, sets, ordered sets. As a side note, recently Redis has also added support for hashes.

The ☞ second article introduces you to a possible solution to handling relationships inside Redis:

We have to cheat in Redis’s flat name space to make relations in our data. Redis isn’t going to be aware of these relations and unlike RDBMS (like MySQL), Redis does nothing to help us out. No index’s, no nifty SQL syntax with WHERE or JOIN to do the work for us. We have to handle all of the relational logic in application code, which in turn means you (the developers) have to do extra documentation explaining just how everything fits together in redis or you are going to lose your data.
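A minimal sketch of what “handling the relational logic in application code” can look like, assuming `r` is a redis-py client. The key naming scheme (`user:<id>:name`, `user:<id>:followers`) and the helper functions are hypothetical, not taken from the article.

```python
def user_key(user_id, field):
    """Build a namespaced key such as 'user:42:name'."""
    return f"user:{user_id}:{field}"


def create_user(r, name):
    """Allocate an id with INCR and store the name under a namespaced key."""
    user_id = r.incr("next_user_id")
    r.set(user_key(user_id, "name"), name)
    return user_id


def follow(r, follower_id, followed_id):
    """Record the relation in both directions; Redis will not do it for us."""
    r.sadd(user_key(follower_id, "following"), followed_id)
    r.sadd(user_key(followed_id, "followers"), follower_id)


def followers_names(r, user_id):
    """Resolve a relation by hand: read the id set, then fetch each name.

    With redis-py, values come back as bytes unless the client was created
    with decode_responses=True.
    """
    ids = sorted(int(i) for i in r.smembers(user_key(user_id, "followers")))
    return [r.get(user_key(i, "name")) for i in ids]
```

Because Redis does not know that the two sets in `follow` belong together, an application would typically wrap both SADD calls in a MULTI/EXEC transaction (in redis-py, `r.pipeline()` is transactional by default) so that a relation cannot be left half written.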

Redis Queues: An Emerging Usecase

We’ve been covering tons of Redis usecases, not to mention this amazing list of ideas. Lately, it looks like there is a new emerging usecase that Redis can be proud of: queues.

Now if that already sounds interesting then I guess you could just take a look at QR, a Python ☞ GitHub hosted project that makes it easy to create queues, stacks and deques on top of Redis. For some help on using it you could check Ted Nyman’s posts on ☞ queues and ☞ deques and stacks. Another option would be to head to Resque, a Ruby ☞ GitHub hosted library for creating and processing jobs using Redis queues.
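Lists fit this usecase because pushing and popping at either end maps directly onto queues, stacks and deques. The classes below are a hypothetical sketch of that mapping, not QR's actual API; `r` is assumed to be a redis-py client.

```python
class RedisDeque:
    """Double-ended queue on a single Redis list (left = front, right = back)."""

    def __init__(self, r, key):
        self.r = r        # assumed to be a redis-py client
        self.key = key

    def push_front(self, item):
        self.r.lpush(self.key, item)

    def push_back(self, item):
        self.r.rpush(self.key, item)

    def pop_front(self):
        return self.r.lpop(self.key)  # None when the list is empty

    def pop_back(self):
        return self.r.rpop(self.key)  # None when the list is empty


class RedisQueue(RedisDeque):
    """FIFO: enqueue at the back, dequeue from the front."""

    def push(self, item):
        self.push_back(item)

    def pop(self):
        return self.pop_front()


class RedisStack(RedisDeque):
    """LIFO: push and pop at the same (front) end."""

    def push(self, item):
        self.push_front(item)

    def pop(self):
        return self.pop_front()
```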

Anyway, if you don’t yet have an idea of how this can be useful, then I hope the following posts will whet your appetite. David Czarnecki’s ☞ article covers a very simple Redis-based queue scenario: inter-application communication (basically, the two apps get an easy way to pass any kind of message to each other). If this is still not enough, Paull Gross’s ☞ post introduces a web proxy built using node.js and Redis queues for high availability.
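For the inter-application scenario, a blocking pop avoids polling: the consumer waits on the list until a message arrives or a timeout expires. A minimal sketch, assuming both apps talk to the same Redis server through redis-py clients and agree on a hypothetical list key and a JSON message format:

```python
import json

CHANNEL = "app:messages"  # hypothetical list shared by the two applications


def send(r, message):
    """Producer side: serialize a message and push it onto the list."""
    r.lpush(CHANNEL, json.dumps(message))


def receive(r, timeout=5):
    """Consumer side: block up to `timeout` seconds for the oldest message.

    redis-py's brpop returns a (key, value) pair, or None on timeout.
    LPUSH on one end plus BRPOP on the other gives FIFO order.
    """
    result = r.brpop(CHANNEL, timeout=timeout)
    if result is None:
        return None
    _key, raw = result
    return json.loads(raw)
```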

Last, but not least, I should emphasize that what sets Redis apart as a good tool for this sort of thing is not the fact that it is an extremely fast, persistent key-value store, but rather its native support for ☞ data structures like lists, sets and ordered sets, and a set of specific ☞ commands to deal with them.

Presentation: Persistent graphs in Python with Neo4j

These are the slides and video of Tobias Ivarsson (@thobe) presenting at PyCon on Neo4j with a Python flavor.

I really liked this slide in particular:

Python code starts at slide 23. A couple of my comments:

  • I am not really sure I understand how the Python scripts are accessing the Neo4j storage when using CPython (Neo4j is supposed to run in a JVM)
  • traversals in graph databases are somewhat synonymous with queries
  • having the traversal implemented like classes extending neo4j.Traversal doesn’t really look Pythonic
  • Django and Neo4j can work together
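To make the “traversals are queries” point concrete, here is a “who does Alice know, up to two hops away” query written as a breadth-first traversal over a plain Python dict. This is not the neo4j.py API; the graph representation and the `traverse` function are hypothetical.

```python
from collections import deque


def traverse(graph, start, rel_type, max_depth):
    """Breadth-first traversal that follows only `rel_type` relationships.

    `graph` maps node -> list of (relationship_type, neighbour) pairs.
    Returns {node: depth} for every node reached within `max_depth` hops.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # stop expanding: this is the traversal's depth filter
        for rel, neighbour in graph.get(node, []):
            if rel == rel_type and neighbour not in depths:
                depths[neighbour] = depths[node] + 1
                queue.append(neighbour)
    del depths[start]  # the start node is not part of the result
    return depths


graph = {
    "alice": [("KNOWS", "bob"), ("WORKS_WITH", "dave")],
    "bob": [("KNOWS", "carol")],
    "carol": [("KNOWS", "erin")],
}
result = traverse(graph, "alice", "KNOWS", 2)
```

The relationship type and the depth limit play the role that a WHERE clause would play in SQL: they decide which part of the graph the “query” returns.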

Note taking apps a la NoSQL

Sometimes the best way to learn about a new technology or tool is to find a project that interests you, start playing with it and maybe end up customizing and extending it to fit your needs.

While these days you can find tons of note-taking applications for mobile, desktop or “in the cloud”, I think this usecase is extremely easy to understand, so it lets you focus on the underlying technologies instead of complicated application logic.

Snip = Node.js + Redis

This is a basic application that would allow you to store code snippets and have some syntax coloring when displaying them. Source code is available on ☞ Bitbucket.

= MongoDB + Python

Another basic application that allows you to store notes and bookmarks. A lot of functionality you’d expect from such an application is missing and that could be a good excuse for you to play with its source code available on ☞ GitHub and add exactly what you’d like.
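If you want to experiment along these lines, here is a hypothetical sketch of how notes and bookmarks could be modeled as MongoDB documents from Python. It is not taken from the app's source; the field names are assumptions, and the pymongo calls appear only in comments so the helpers can be tried without a server.

```python
from datetime import datetime, timezone


def make_note(text, tags=(), url=None):
    """Build a note (or a bookmark, when `url` is given) as a MongoDB document."""
    doc = {
        "text": text,
        "tags": sorted({t.lower() for t in tags}),  # normalized, de-duplicated
        "created_at": datetime.now(timezone.utc),
    }
    if url is not None:
        doc["url"] = url
    return doc


def by_tag(tag):
    """Query filter: MongoDB matches a scalar against any element of an array field."""
    return {"tags": tag.lower()}


def bookmarks_only():
    """Query filter: documents that carry a `url` field."""
    return {"url": {"$exists": True}}


# Usage with pymongo, assuming a local MongoDB server:
#   from pymongo import MongoClient
#   notes = MongoClient().notes_app.notes
#   notes.insert_one(make_note("Read the Redis docs", tags=["Redis"]))
#   for note in notes.find(by_tag("redis")).sort("created_at", -1):
#       print(note["text"])
```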

I am pretty sure I have missed a lot of similar apps, so please do forward yours to be added to the list. Building an extensive list, like we did for NoSQL Twitter apps and NoSQL-based blog engines, will be both fun and useful.

Python, Django and MongoDB

Interested in Python, Django and MongoDB? Then I hope you’ll find these posts interesting:

And then there is this fresh screencast from Kevin Fricovsky on Django and MongoDB integration. You can read about it ☞ here, but as a quick summary, the screencast introduces mongoengine and then, using Django-Mumblr, a NoSQL-based blog engine, dives deeper into the details of Django and MongoDB integration.

Update: I just found a couple more MongoDB Django tricks that you may find interesting.

The first one is a solution that provides access to MongoDB documents’ _id values from Django templates, which refuse to look up variables or attributes starting with an underscore. The ☞ solution is based on a custom Django filter, used as {{ object|mongo_id }}
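A sketch of what such a filter might look like (not the linked solution's code). It handles plain pymongo documents, which are dicts, and objects with an `_id` attribute; returning the id as a string is an assumption that suits template output.

```python
def mongo_id(value):
    """Return a document's _id as a string, or None if it has none yet.

    Works for pymongo documents (dicts) and for objects with an `_id`
    attribute; str() turns an ObjectId into its hex representation.
    """
    if isinstance(value, dict):
        raw = value.get("_id")
    else:
        raw = getattr(value, "_id", None)
    return None if raw is None else str(raw)


# In a Django app's templatetags module you would register it with:
#   from django import template
#   register = template.Library()
#   register.filter("mongo_id", mongo_id)
```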

I find the solution pretty odd, not to mention that using a filter to access such an important piece of document information seems convoluted. I’d much prefer to have the _id accessible directly on an object through either a field or at least a special property. Behavior for an unsaved document might be as simple as returning None or raising an exception.
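The alternative I’d prefer could look roughly like this hypothetical wrapper: it exposes the key through a property without a leading underscore, so templates can use {{ doc.id }}, and it returns None for a document that has not been saved yet.

```python
class Document:
    """Hypothetical thin wrapper around a pymongo document (a dict)."""

    def __init__(self, data=None):
        self._data = dict(data or {})  # copy so the caller's dict is not shared

    @property
    def id(self):
        """Template-friendly access to `_id`; None until the document is saved."""
        raw = self._data.get("_id")
        return None if raw is None else str(raw)
```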

The second trick fixes a problem with using Django’s FileWrapper while working with MongoDB’s GridFS. I’d be tempted to call this a bug; until it gets fixed, you can read the details ☞ here.

More NoSQL-based Twitter apps

If you thought we were running out of NoSQL Twitter apps, you were definitely wrong, because I’ve just found a few more.


A Clojure-based solution to write the Twitter stream to Hadoop (by @ieure)


A simple Twitter clone in Python using MongoDB, by Michael Dirolf (@mdirolf). Michael has been featured on MyNoSQL a couple of times already:

Swordfish Twitter Clone

Swordfish — a key-value store built on top of Tokyo Cabinet and offering a RESTful HTTP interface — comes with a Twitter clone based on Django.

Another Tokyo Cabinet based Twitter app. There don’t seem to be many details about the project though. (via Matthew Ford)

Last, but not least, don’t forget to check the first series of NoSQL Twitter apps.

Presentation: MongoDB for Python or Ruby. Your Choice

I am not sure if it is only my impression, but MongoDB seems to be getting a lot of presentation and video coverage. And the champion seems to be Mike Dirolf (@mdirolf) from 10gen. Below is the most recent presentation he gave on MongoDB at the monthly ZPUGDC meeting (a Python user group) just a couple of days ago:

My notes:

  • the part about MongoDB replication and auto-sharding is really interesting

    As a side note (and a question for MongoDB people), I have been hearing for quite a while that auto-sharding is still in alpha, so I am wondering what the real reason is. Has the current implementation hit a dead end? Aren’t there enough users asking for it? Or is it something else? Mike?

  • Difference between OODB and document database:

    • they use fairly similar concepts
    • in OODB you are saving instances, in document databases you are saving data

    That basically translates to the fact that data and objects are decoupled and this comes with both pros and cons.

  • Concurrency:

    • no concurrency in the stable 1.2
    • much better concurrency in 1.3
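To illustrate the “saving instances vs. saving data” note above with plain Python (no database involved; both classes are hypothetical):

```python
from dataclasses import asdict, dataclass


@dataclass
class BlogPost:
    title: str
    tags: list


@dataclass
class PostSummary:
    title: str


post = BlogPost("Redis queues", ["redis", "queues"])

# A document database stores the data, not the instance: a plain dict
# (here produced by asdict) is what would be sent to e.g. MongoDB.
document = asdict(post)

# Because only data was stored, it can be read back into a different class,
# or by code that has never seen BlogPost. An OODB would instead persist
# and hand back the BlogPost instance itself.
summary = PostSummary(title=document["title"])
```

That decoupling is the pro (any code can read the data) and the con (nothing ties the stored data to the class's invariants) at the same time.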

I will continue to add my notes as I watch the presentation (the video reports 1h20’+).

Update: Here is another confirmation of the thought I expressed at the beginning of the post. Below you can find the presentation Kyle Banker (@Hwaet) gave the other day to ChicagoRuby.

Usecase: RestMQ - A Redis-based Message Queue

I found the attached presentation introducing RestMQ, an HTTP/REST/JSON message queue, quite interesting for the tools it uses:

  • Redis as the storage of the queue messages
    • Redis LIST, SET and Ordered SET data types make it easy to be used for this usecase
    • my note: I am not sure whether the project also uses some of the blocking operations available on lists to implement the queues[1]
  • async client with support for connection pooling and client-side sharding[2]
  • Cyclone[3]: a Twisted based Tornado[4] clone
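RestMQ’s client is asynchronous and Twisted-based; as a simpler, synchronous illustration of client-side sharding, the sketch below hashes each key to pick one of several Redis servers. `shard_index` and `ShardedClient` are hypothetical names, and `clients` is assumed to be a list of redis-py clients, each of which can carry its own connection pool.

```python
import zlib


def shard_index(key, shard_count):
    """Map a key to one of `shard_count` servers with a stable hash.

    zlib.crc32 is deterministic across processes, unlike Python's built-in
    hash() for strings, so every client picks the same shard for a key.
    """
    return zlib.crc32(key.encode("utf-8")) % shard_count


class ShardedClient:
    """Route each command to the server that owns its key.

    Each entry in `clients` is assumed to be a redis-py client, e.g.
    redis.Redis(connection_pool=redis.ConnectionPool(host=..., port=...)).
    """

    def __init__(self, clients):
        self.clients = list(clients)

    def for_key(self, key):
        return self.clients[shard_index(key, len(self.clients))]

    def lpush(self, key, *values):
        return self.for_key(key).lpush(key, *values)

    def rpop(self, key):
        return self.for_key(key).rpop(key)
```

Plain modulo hashing remaps most keys whenever a server is added or removed; consistent hashing is the usual way to limit that reshuffling.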

RestMQ source code is available on ☞ GitHub