NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



Social Network Analysis of Apache CloudStack

Nice data experiment run by Sebastien Goasguen against the CloudStack mailing list:

To get the graphs I grabbed the emails archive from Apache. I used Python to load the mbox files into single Mongo collections. I cleaned the data to avoid replications of senders as well as remove JIRA and Review Board entries. Then with a little bit of PyMongo I made the queries and build the graph with NetworkX. Finished up with the graph visualization and calculations using Gephi. Since there are thousands of emails and threads, there is still some work to pre-process the data, avoid duplicates and match individuals to multiple email addresses.


Three questions:

  1. would using a graph database made this experiment easier?
  2. would Linkurious be able to generate these graphics?
  3. is the code available anywhere so someone else could try to use a graph database and maybe run other types of visualizations?

Original title and link: Social Network Analysis of Apache CloudStack (NoSQL database©myNoSQL)