


Python: All content tagged as Python in NoSQL databases and polyglot persistence

Hadoop and Elastic MapReduce at Yelp

A story of using Hadoop at Yelp and migrating it to Amazon Elastic MapReduce:

We used to do what a lot of companies do, which is run a Hadoop cluster. We had a dozen or so machines that we otherwise would have gotten rid of, and whenever we pushed our code to our webservers, we’d push it to the Hadoop machines.

It was also not so cool. You couldn’t really tell if a job was going to work at all until you pushed it to production. But the worst part was, most of the time our cluster would sit idle, and then every once in a while, a really beefy job would come along and tie up all of our nodes, and all the other jobs would have to wait.

Yelp has released their Python library for running MapReduce jobs on Hadoop or Amazon Elastic MapReduce on ☞ GitHub.

Original title and link: Hadoop and Elastic MapReduce at Yelp (NoSQL databases © myNoSQL)


MongoDB: Designing Trees using mongodm

The first reason that brought me away from the great mongoengine API is that there’s no way to easily manage recursive trees.

Sounds like someone agrees with me.



MongoDB: Best Python Mapper

I have found on ☞ Quora a detailed comparison of two of the most popular MongoDB Python ODMs: ☞ MongoKit and ☞ MongoEngine (nb the post is using the term ORM, but I guess that’s just out of habit):

I prefer the manner of declaring field types in MongoEngine to MongoKit, but that’s just me. If you’re coming from something like the Django ORM, MongoEngine is very similar.

If you need to make on-the-fly modifications to document schemas at runtime, MongoKit is the way to go. MongoKit also allows you to bypass validation.

Finally, compare the MongoKit and MongoEngine documentation. I find the MongoEngine documentation a much more useful reference (it’s easier to navigate and read — very much my opinion though):

While I’m not a very experienced Pythonista, nor have I used either of these libraries, I must confess that I find both too heavily inspired by ORMs. There is structure in document databases, but enforcing all the rules and strictness of a relational model seems a bit too restrictive. Plus, it is unclear how you can actually take advantage of the schemaless nature of document databases when using these libraries.


Redis: Implementing Auto Complete or How to build Trie on Redis

At a time when the news is all about instant search and autocomplete, Salvatore Sanfilippo (@antirez) shows how to use Redis sorted sets and the corresponding commands (ZRANGE, ZRANK) to implement autocompletion:

The initial code, written in Ruby, has already been ported to Python, ☞ Java, and ☞ PHP.

As Ilya Grigorik (@igrigorik) commented, this is building a ☞ Trie with Redis.
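
To make the idea concrete, here is a minimal Python sketch of the sorted-set approach. It is not antirez's code: the LexSortedSet class is a hypothetical in-memory stand-in for a Redis sorted set in which every member has the same score (so members are ordered lexicographically), and index_word and complete are names I made up for illustration. The "*" suffix marking a complete word is also an assumption of this sketch.

```python
import bisect


class LexSortedSet:
    """Hypothetical in-memory stand-in for a Redis sorted set whose
    members all share one score, so they sort lexicographically."""

    def __init__(self):
        self._members = []

    def zadd(self, member):
        # Insert in sorted position, ignoring duplicates (set semantics).
        i = bisect.bisect_left(self._members, member)
        if i == len(self._members) or self._members[i] != member:
            self._members.insert(i, member)

    def zrank(self, member):
        # Position of the member, or None when it is absent.
        i = bisect.bisect_left(self._members, member)
        if i < len(self._members) and self._members[i] == member:
            return i
        return None

    def zrange(self, start, stop):
        # Like Redis ZRANGE, the stop index is inclusive.
        return self._members[start:stop + 1]


def index_word(zset, word):
    """Store every prefix of the word, plus 'word*' as an end marker."""
    for end in range(1, len(word) + 1):
        zset.zadd(word[:end])
    zset.zadd(word + "*")


def complete(zset, prefix, count=10, batch=50):
    """Return up to `count` indexed words that start with `prefix`."""
    results = []
    start = zset.zrank(prefix)
    if start is None:
        return results
    while len(results) < count:
        chunk = zset.zrange(start, start + batch - 1)
        if not chunk:
            break
        start += len(chunk)
        for entry in chunk:
            if not entry.startswith(prefix):
                return results  # walked past the prefix's subtree
            if entry.endswith("*"):
                results.append(entry[:-1])
                if len(results) == count:
                    return results
    return results
```

Because all prefixes of a word sit next to each other in the ordering, one rank lookup followed by a range scan visits exactly the entries sharing the prefix, which is what makes the structure behave like a trie.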



Tornado Sees Some NoSQL Activity

Tornado, the non-blocking web server and tools open sourced by FriendFeed before their acquisition, seems to be seeing some NoSQL activity. While Django is leading the way in the Python world, judging by the NoSQL projects happening around Node.js, one could say that Tornado, with its non-blocking architecture, may be an interesting alternative.

Thomas Pelletier has ☞ a blog post about a simple websocket + Tornado + Redis Pub/Sub protocol integration:

The principle is very simple: when your user loads the page, she is automatically added to a list of “listeners”. An independent thread is running: it listens for messages from Redis with the subscribe command, and sends a message through Websocket to every registered “listener”. In this example, the user can send a message to herself with a simple AJAX-powered form, which calls a view with a payload (the message), and the view publishes it via the publish command of Redis.

This is basically a web chat! If you want to have fun, you can then add a roster, with a presence system, authentication etc…
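
The pattern described in the quote can be sketched in a few lines of Python. This is not Thomas Pelletier's code, and it uses neither Tornado nor Redis: a queue.Queue stands in for the Redis channel, plain callables stand in for websocket connections, and the Broadcaster class and publish function are hypothetical names chosen for illustration.

```python
import queue
import threading


class Broadcaster:
    """Forwards every message from a channel to all registered listeners."""

    def __init__(self, channel):
        self._channel = channel      # stand-in for a Redis subscription
        self._listeners = []         # stand-ins for open websockets
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def register(self, send):
        """Add a listener; `send` is called with each published message."""
        with self._lock:
            self._listeners.append(send)

    def start(self):
        self._thread.start()

    def join(self, timeout=None):
        self._thread.join(timeout)

    def _run(self):
        # The independent thread: block on the channel, then fan out.
        while True:
            message = self._channel.get()
            if message is None:      # sentinel used here to stop the thread
                break
            with self._lock:
                listeners = list(self._listeners)
            for send in listeners:
                send(message)


def publish(channel, message):
    """What the view does with the payload it receives from the form."""
    channel.put(message)
```

In a real deployment the channel would be a Redis subscription and each listener a websocket write, but the shape stays the same: one thread blocks on incoming messages and fans them out to whoever is registered.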

There’s also a ☞ GitHub project called Trombi:

Trombi is an asynchronous CouchDB client for Tornado.

And I’m pretty sure there are other projects I’ve missed (but you can leave a comment to add them to the list).


Presentation: RestMQ - HTTP/Redis based Message Queue

Gleicon Moraes’ slide deck about RestMQ, an HTTP/Redis based message queue. More about RestMQ can be found ☞ here and the source code is available on ☞ GitHub.

Keep in mind that Redis-backed queues are one of the most often cited use cases for Redis.


MapReduce with MongoDB and Python

Complete example of using MongoDB MapReduce with PyMongo:

In this post, I’ll present a demonstration of a map-reduce example with MongoDB and server side JavaScript. Based on the fact that I’ve been working with this technology recently, I thought it would be useful to present here a simple example of how it works and how to integrate with Python.
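
For readers new to the model, here is a rough pure-Python sketch of what a map-reduce pass computes. It does not use MongoDB or PyMongo: in MongoDB the map and reduce functions are JavaScript run on the server, whereas here map_reduce, map_tags, and reduce_count are hypothetical Python helpers, and the "tags" field is an assumed document shape.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Group the values emitted per key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):   # map: emit (key, value) pairs
            grouped[key].append(value)
    # Simplification: reduce runs exactly once per key here.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def map_tags(doc):
    """Emit (tag, 1) for every tag on the document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_count(key, values):
    """Sum the counts emitted for one tag."""
    return sum(values)
```

One design point worth keeping when moving to the real thing: a reduce function may be applied to its own earlier output, so it should return the same kind of value it receives. Summing counts, as above, satisfies that.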



MongoDB with Python

Firstly, Diarmuid Bourke’s presentation at PyCon Ireland 2010:

Mike Dirolf ☞ covers the details of working with PyMongo and replica sets, one of the most interesting features in the MongoDB 1.6 releases:

PyMongo makes working with replica sets easy. Here we’ll launch a new replica set and show how to handle both initialization and normal connections with PyMongo.

And in case you’d like to learn some more you can also check:


Django and NoSQL Databases Revisited

Django decided a long time ago that Ruby on Rails cannot be the only framework where people can have fun integrating with all NoSQL databases. During this year’s DjangoCon Europe there were several sessions dedicated to Django and NoSQL databases:

What NoSQL support in the Django ORM looks like, and how do we get there

Alex Gaynor speaks about what needs to change in Django ORM to make it more NoSQL friendly:

Reinout van Rees has a summary of the talk ☞ here.

Using MongoDB in your app

Peter Bengtsson talks about his experience of moving to MongoDB after using ZODB for the last 10 years.

Some notes from the talk are available ☞ here.

Relax your project with CouchDB

Benoît Chesneau talks about what makes CouchDB appealing to Python developers. He also covers the CouchDBkit Python framework.

Django and Neo4j: Domain Modeling that Kicks Ass

Not coming from DjangoCon, but still about Django and Neo4j, is Tobias Ivarsson’s presentation: “Django and Neo4j - Domain modeling that kicks ass”:

Derek Stainer summarizes the slide deck ☞ here.

Django and NoSQL Panel

A fantastic panel on the future of Django and NoSQL databases that you can watch over ☞ here. Reinout van Rees published a transcript of the panel ☞ here.

All in all a lot of NoSQL excitement in the Django world! Or should it be the opposite?

Update: Here is the latest Django and NoSQL Databases status update


Gephi: Visualization Library for Graph Databases

You probably know by now that I love visualization tools:

Get the version of Gephi app that can read neo4j databases bzr branch

Gephi and Neo4j


Quick Dive into Hypertable Thrift API

I like the parallels with notions from the MySQL world:

[…] let’s take a look at high performance reading using Scanner. To those who are familiar with MySQL, the concept of using scanner is quite similar to the SSCursor. Instead of reading all the records into client side memory, there is a server-side cursor that’s “streaming” the result set to client side.
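
The difference between buffering a whole result set and streaming it can be shown with a small Python generator. This is only an illustration of the idea, not the Hypertable Thrift API or MySQLdb: scan is a hypothetical helper, and row_source is any iterable standing in for the server side of the cursor.

```python
from itertools import islice


def scan(row_source, batch_size=1000):
    """Yield rows one by one, pulling at most `batch_size` rows from the
    source at a time instead of loading the whole result set."""
    rows = iter(row_source)
    while True:
        batch = list(islice(rows, batch_size))  # one "round trip"
        if not batch:
            return
        yield from batch
```

A caller that stops early only pays for the batches it touched, so client memory use stays bounded by batch_size rather than by the size of the result set.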


Miniredis: Python-based Redis Clone

Benjamin Pollack:

A very tiny clone of Redis, mostly for Windows support