Python: All content tagged as Python in NoSQL databases and polyglot persistence
Friday, 22 February 2013
Creating a Simple Bloom Filter in Python
Max Burstein:
Bloom filters are super efficient data structures that allow us to tell if an object is most likely in a data set or not by checking a few bits. Bloom filters return some false positives but no false negatives. Luckily we can control the amount of false positives we receive with a trade off of time and memory.
Explanations and code included.
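For readers who want the idea before following the link, here is a minimal sketch of a Bloom filter. It is not the code from the linked post: the bit-array size, the number of hashes, and the trick of salting SHA-256 with the hash index are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter backed by a bytearray used as a bit array."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function
        # with the hash index (an illustrative choice, not the only one).
        data = str(item).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(b"%d:" % i + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # Every bit must be set; a single zero bit proves absence.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bf = BloomFilter()
for word in ("redis", "hbase", "mongodb"):
    bf.add(word)

print("redis" in bf)      # True: added items are always reported present
print("cassandra" in bf)  # usually False, but a false positive is possible
```

More bits lower the false positive rate at the cost of memory, and more hashes cost time per lookup, which is the trade-off the quote mentions.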
Original title and link: Creating a Simple Bloom Filter in Python (©myNoSQL)
via: http://maxburstein.com/blog/creating-a-simple-bloom-filter/
Monday, 11 February 2013
Flatten Entire HBase Column Families With Pig and Python UDFs
Chase Seibert:
Most Pig tutorials you will find assume that you are working with data where you know all the column names ahead of time, and that the column names themselves are just labels, versus being composites of labels and data. For example, when working with HBase, it’s actually not uncommon for both of those assumptions to be false. Being a columnar database, it’s very common to be working with rows that have thousands of columns. Under that circumstance, it’s also common for the column names themselves to encode two dimensions, such as date and counter type.
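To make the flattening concrete, here is a rough sketch in plain Python of what such a UDF body might do. The `date:counter` column-name layout, the `flatten_columns` name, and the sample row are assumptions for illustration; registering the function with Pig and declaring its output schema are left out.

```python
def flatten_columns(row_key, columns, separator=":"):
    """Turn one wide row into a list of (row_key, date, counter, value) tuples.

    `columns` maps composite column names such as "20130210:clicks"
    to values. The "date:counter" layout is an assumed example.
    """
    flat = []
    for name, value in sorted(columns.items()):
        # Split only once so counter names may contain the separator.
        date, counter = name.split(separator, 1)
        flat.append((row_key, date, counter, value))
    return flat


row = {"20130210:clicks": 12, "20130210:views": 340, "20130211:clicks": 7}
for record in flatten_columns("user42", row):
    print(record)
```

Each wide row becomes several narrow records, so the data carried in the column names turns into ordinary fields that can be grouped and filtered.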
via: http://chase-seibert.github.com/blog/2013/02/10/pig-hbase-flatten-column-family.html
Tuesday, 5 February 2013
A Guide to Python Frameworks for Hadoop
Uri Laserson dives into the world of Python frameworks for Hadoop:
So my first order of business was to investigate some of the options that exist for working with Hadoop from Python.
In this post, I will provide an unscientific, ad hoc review of my experiences with some of the Python frameworks that exist for working with Hadoop, including:
- Hadoop Streaming
- mrjob
- dumbo
- hadoopy
- pydoop
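As a taste of the first option on that list, here is a minimal word-count sketch in the Hadoop Streaming style, where the mapper and reducer exchange tab-separated text lines. In a real job each function would read `sys.stdin` and print its output; here they take and return iterables so the pipeline can be simulated locally. The function names and sample input are illustrative assumptions.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated (word, 1) pair per word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word.lower()


def reducer(sorted_lines):
    """Sum counts per word; input must arrive sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


# Local simulation of the map -> sort -> reduce pipeline.
mapped = sorted(mapper(["Hadoop and Python", "python and Pig"]))
for line in reducer(mapped):
    print(line)
```

The `sorted` call stands in for the shuffle-and-sort step that the framework performs between the two phases.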
via: http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/
Wednesday, 30 January 2013
Social Network Analysis of Apache CloudStack
Nice data experiment run by Sebastien Goasguen against the CloudStack mailing list:
To get the graphs I grabbed the emails archive from Apache. I used Python to load the mbox files into single Mongo collections. I cleaned the data to avoid replications of senders as well as remove JIRA and Review Board entries. Then with a little bit of PyMongo I made the queries and built the graph with NetworkX. Finished up with the graph visualization and calculations using Gephi. Since there are thousands of emails and threads, there is still some work to pre-process the data, avoid duplicates and match individuals to multiple email addresses.
Three questions:
- would using a graph database have made this experiment easier?
- would Linkurious be able to generate these graphics?
- is the code available anywhere so someone else could try to use a graph database and maybe run other types of visualizations?
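Until the original code turns up, here is a rough sketch of the core step, turning mailing-list messages into weighted edges. It uses only the standard library rather than MongoDB, PyMongo, or NetworkX, and it assumes an edge means "replied to", derived from the `In-Reply-To` header; both choices are mine, not necessarily the author's. For real archives, `mailbox.mbox` could supply the messages.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr


def reply_edges(messages):
    """Count (replier, original sender) pairs using In-Reply-To headers."""
    sender_by_id = {}
    for msg in messages:
        # Lowercase addresses so one sender is not counted twice.
        sender_by_id[msg["Message-ID"]] = parseaddr(msg["From"])[1].lower()

    edges = Counter()
    for msg in messages:
        parent = sender_by_id.get(msg.get("In-Reply-To"))
        sender = sender_by_id[msg["Message-ID"]]
        if parent and parent != sender:
            edges[(sender, parent)] += 1
    return edges


raw = [
    "From: Ann <ann@example.org>\nMessage-ID: <1>\nSubject: hi\n\nbody",
    "From: Bob <bob@example.org>\nMessage-ID: <2>\nIn-Reply-To: <1>\n\nbody",
    "From: Ann <ANN@example.org>\nMessage-ID: <3>\nIn-Reply-To: <2>\n\nbody",
]
messages = [message_from_string(text) for text in raw]
print(reply_edges(messages))
```

The resulting counter is an edge list with weights, which is the shape a graph library or a graph database would take as input.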
via: http://sebgoa.blogspot.ch/2013/01/social-network-analysis-of-apache.html
Friday, 11 January 2013
MongoMem: Memory Usage by Collection in MongoDB
MongoMem, a Python tool by the Wish tech team:
Today, we’re releasing the first of these tools, MongoMem. MongoMem solves the age-old problem of figuring out how much memory each collection is using. In MongoDB, keeping your working set in memory is pretty important for most apps. The problem is, there’s not really a way to get visibility into the working set or what’s in memory beyond looking at resident set size or page faults rate.
via: http://eng.wish.com/mongomem-memory-usage-by-collection-in-mongodb/
Tuesday, 30 October 2012
Recommending Friends With MapReduce and Python
Marcel Caraciolo describes a MapReduce-based friend recommendation engine:
That’s a simple algorithm used at Atépassar for recommending friends using some basic graph analysis concepts.
Considering the network only has 140k users, the first question that came to my mind was why MapReduce and not a graph database?
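For context, the basic graph idea can be sketched as counting mutual friends in a map phase and a reduce phase. This is a plain-Python approximation under my own assumptions (function names, sample network, a symmetric friendship list), not the code from the linked post.

```python
from collections import defaultdict
from itertools import combinations


def map_user(user, friends):
    """Emit candidate pairs that share `user` as a mutual friend."""
    for friend in friends:
        # 0 marks an existing friendship, which must not be recommended.
        yield (user, friend), 0
    for a, b in combinations(sorted(friends), 2):
        yield (a, b), 1
        yield (b, a), 1


def reduce_pairs(emitted):
    """Sum mutual-friend counts, dropping pairs that are already friends."""
    counts = defaultdict(int)
    already_friends = set()
    for pair, value in emitted:
        if value == 0:
            already_friends.add(pair)
        else:
            counts[pair] += value
    recommendations = defaultdict(list)
    for (user, candidate), count in counts.items():
        if (user, candidate) not in already_friends:
            recommendations[user].append((candidate, count))
    for user in recommendations:
        # Most mutual friends first, ties broken by name.
        recommendations[user].sort(key=lambda item: (-item[1], item[0]))
    return dict(recommendations)


network = {
    "ana": ["bia", "caio"],
    "bia": ["ana", "caio", "davi"],
    "caio": ["ana", "bia", "davi"],
    "davi": ["bia", "caio"],
}
emitted = [kv for user, friends in network.items() for kv in map_user(user, friends)]
print(reduce_pairs(emitted))
```

In this sample network only ana and davi are not yet friends, and they share two mutual friends, so each is recommended to the other.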
via: http://aimotion.blogspot.ca/2012/10/atepassar-recommendations-recommending.html
Sunday, 5 August 2012
Demoing the Python-Based Map-Reduce R3 Against GitHub Data
A nice demo of r3, the recently announced Python-based MapReduce engine that uses Redis as a backend, run against commit histories from GitHub:
It is pretty simple to get r3 to do some cool calculations for us. I got the whole sample in a very short amount of time. It took me more time to write this post than to make r3 calculate the committer percentages.
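The committer-percentage calculation itself is easy to picture as a map step and a reduce step. The sketch below is plain Python and does not use r3's API; the commit records and function names are made up for illustration.

```python
from collections import Counter


def map_commits(commits):
    """Map phase: emit (author, 1) for every commit record."""
    for commit in commits:
        yield commit["author"], 1


def reduce_percentages(pairs):
    """Reduce phase: total commits per author, then convert to percentages."""
    totals = Counter()
    for author, count in pairs:
        totals[author] += count
    grand_total = sum(totals.values())
    if not grand_total:
        return {}  # no commits, nothing to report
    return {author: 100.0 * n / grand_total for author, n in totals.items()}


commits = [
    {"author": "alice"},
    {"author": "bob"},
    {"author": "alice"},
    {"author": "alice"},
]
print(reduce_percentages(map_commits(commits)))
```

A distributed engine would run many mappers in parallel and merge their partial counts before the final division.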
via: http://blog.heynemann.com.br/2012/08/04/r3-a-quick-demo-of-usage/
Thursday, 12 July 2012
How I Asynchronized MongoDB Python Synchronous Library
A. Jesse Jiryu Davis:
PyMongo is three and a half years old. The core module is 3000 source lines of code. There are hundreds of improvements and bugfixes, and 7000 lines of unittests. Anyone who tries to make a non-blocking version of it has a lot of work cut out, and will inevitably fall behind development of the official PyMongo. With Motor’s technique, I can wrap and reuse PyMongo whole, and when we fix a bug or add a feature to PyMongo, Motor will come along for the ride, for free.
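The general "wrap and reuse" idea can be illustrated with a small sketch. To be clear, this is not Motor's implementation: it swaps in a different technique, handing blocking calls to a thread pool through `asyncio`, and `BlockingClient` is a hypothetical stand-in rather than PyMongo.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


class BlockingClient:
    """Stand-in for a synchronous driver (hypothetical, not PyMongo)."""

    def find_one(self, key):
        time.sleep(0.01)  # simulate blocking network I/O
        return {"_id": key}


class AsyncClient:
    """Expose the blocking client's methods as awaitables."""

    def __init__(self, delegate, max_workers=4):
        self._delegate = delegate
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def __getattr__(self, name):
        # Called only for attributes not found on AsyncClient itself.
        method = getattr(self._delegate, name)

        async def wrapper(*args):
            loop = asyncio.get_running_loop()
            return await loop.run_in_executor(self._pool, method, *args)

        return wrapper


async def main():
    client = AsyncClient(BlockingClient())
    return await asyncio.gather(client.find_one(1), client.find_one(2))


print(asyncio.run(main()))
```

Because the wrapper forwards every method by name, a fix or a new method in the wrapped class shows up in the async facade without extra work, which is the benefit the quote describes.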
via: http://emptysquare.net/blog/motor-internals-how-i-asynchronized-a-synchronous-library/
Tuesday, 8 May 2012
Peg Solitaire With Python and MongoDB
David Taylor:
Anyway to cut a long story short my attempt eventually failed because my mathematical naivety hid the fact that a brute force attack would result in far too many hours of computation and a database that was simply too vast. I gave up after running it for three hours and it was showing that it had computed 24 million board states, it still had 18 million un-computed child boards to investigate and had a 23Gig database. I think it is still possible to do this almost completely with brute force if I remove symmetrical board states (apparently if done right there are only 23 million possible board states when symmetry is considered) but that is way beyond just investigating the technology and object orientation.
Peg Solitaire sounds like a good excuse to look into Python and MongoDB.
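The symmetry reduction mentioned in the quote could look roughly like this: map every board to one canonical representative among its rotations and reflections, and store only that. The sketch assumes the common cross-shaped board laid out in a 7x7 grid and represents a position as a set of `(row, col)` pegs; neither detail comes from the original post.

```python
SIZE = 7  # assumed layout: cross-shaped board inside a 7x7 grid


def symmetries(pegs):
    """Yield the 8 rotations/reflections of a set of (row, col) pegs."""
    current = frozenset(pegs)
    for _ in range(4):
        yield current
        yield frozenset((r, SIZE - 1 - c) for r, c in current)  # mirror
        current = frozenset((c, SIZE - 1 - r) for r, c in current)  # rotate 90


def canonical(pegs):
    """Pick one representative so symmetric boards share a database key."""
    return min(tuple(sorted(board)) for board in symmetries(pegs))


a = {(0, 2), (1, 2)}
b = {(2, 6), (2, 5)}  # the same position rotated by 90 degrees
print(canonical(a) == canonical(b))  # True
```

Using the canonical tuple as the document key means a brute-force search visits each equivalence class once instead of up to eight times.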
via: http://davidandrewtaylor.blogspot.co.uk/2012/05/python-and-nosql-after-listening-to.html
Friday, 6 April 2012
DynamoDB Libraries, Mappers, and Mock Implementations
A list of DynamoDB libraries covering quite a few popular languages and frameworks:

A couple of things I’ve noticed (and that could be helpful to other NoSQL database companies):
- Amazon provides official libraries for several major programming languages (Java, .NET, PHP, Ruby)
- Amazon is not shy about promoting libraries that are not official but have established themselves as good libraries (e.g. Python’s Boto)
- The list doesn’t seem to include anything for C and Objective-C (Objective-C is the language of iOS and Mac apps)
Monday, 2 April 2012
Another Redis-Based Queue for Python: Introducing RQ
Vincent Driessen creates RQ as an alternative to Celery inspired by Resque:
I wanted a solution that was lightweight, easy to adopt, and easy to grasp. So I devised a simple queueing library for Python, and dubbed it RQ.
Welcome to the world of a thousand Redis-based queues.
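The shape of such a queue is simple enough to sketch in a few lines. The example below is an in-memory stand-in, with a `deque` playing the role of the Redis list; it is neither RQ's API nor its implementation, and all names are hypothetical.

```python
from collections import deque


class SimpleQueue:
    """In-memory stand-in for a Redis list used as a job queue."""

    def __init__(self):
        self._jobs = deque()

    def enqueue(self, func, *args, **kwargs):
        # A real queue would serialize the call; here we keep references.
        self._jobs.append((func, args, kwargs))

    def work(self):
        """Run queued jobs in FIFO order and return their results."""
        results = []
        while self._jobs:
            func, args, kwargs = self._jobs.popleft()
            results.append(func(*args, **kwargs))
        return results


def count_words(text):
    return len(text.split())


queue = SimpleQueue()
queue.enqueue(count_words, "a thousand Redis-based queues")
queue.enqueue(pow, 2, 10)
print(queue.work())  # [4, 1024]
```

Backing the queue with Redis instead lets producers and workers live in separate processes, which is where a library earns its keep.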
Wednesday, 21 March 2012
In-Memory Key-Value Store in C, Go and Python
Graham King:
On paternity leave for my second child, I found myself writing an in-memory hashmap (a poor-man’s memcached), in Go, Python and C. I was wondering how hard it would be to replace memcached, if we wanted to do something unusual with our key-value store. I also wanted to compare the languages, and, well, I get bored easily!
Actually it’s very easy and doesn’t require any coding at all. Plus you’ll get a bit more than what you’d expect.
via: http://www.darkcoding.net/software/in-memory-key-value-store-in-c-go-and-python/
