Nice data experiment run by Sebastien Goasguen against the CloudStack mailing list:
To get the graphs I grabbed the emails archive from Apache. I used
Python to load the mbox files into single Mongo collections. I
cleaned the data to avoid replications of senders as well as remove
JIRA and Review Board entries. Then with a little bit of PyMongo I
made the queries and build the graph with NetworkX. Finished up with
the graph visualization and calculations using Gephi. Since there
are thousands of emails and threads, there is still some work to
pre-process the data, avoid duplicates and match individuals to
multiple email addresses.
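The steps described in the quote (load the mail, normalize senders, drop automated entries, build a graph) can be sketched roughly as follows. This is not Sebastien's code, which is not shown in the post. It is a standard-library-only approximation that swaps plain Python dictionaries in for MongoDB and NetworkX, and it assumes that a "who replied to whom" edge can be derived from the `Message-ID` and `In-Reply-To` headers. The names `AUTOMATED_LOCAL_PARTS`, `normalize_sender`, `is_automated`, and `reply_edges` are hypothetical, as are the example addresses.

```python
import email
from collections import Counter
from email.utils import parseaddr

# Hypothetical filter list: the actual JIRA / Review Board sender
# addresses are not given in the post.
AUTOMATED_LOCAL_PARTS = {"jira", "reviewboard"}


def normalize_sender(from_header):
    """Return the lower-cased address so case variants count as one sender."""
    _name, addr = parseaddr(from_header or "")
    return addr.lower()


def is_automated(sender):
    """True if the address looks like one of the assumed automated senders."""
    return sender.split("@")[0] in AUTOMATED_LOCAL_PARTS


def reply_edges(messages):
    """Count (replier, original author) pairs across a set of messages."""
    messages = list(messages)  # iterated twice below

    # First pass: remember who wrote each message.
    author_by_id = {}
    for msg in messages:
        sender = normalize_sender(msg.get("From"))
        msg_id = (msg.get("Message-ID") or "").strip()
        if sender and msg_id and not is_automated(sender):
            author_by_id[msg_id] = sender

    # Second pass: link each reply to the author of its parent message.
    edges = Counter()
    for msg in messages:
        sender = normalize_sender(msg.get("From"))
        parent_id = (msg.get("In-Reply-To") or "").strip()
        parent = author_by_id.get(parent_id)
        if sender and parent and sender != parent and not is_automated(sender):
            edges[(sender, parent)] += 1
    return edges


# Tiny in-memory sample instead of a real mbox archive.
RAW = [
    "From: Alice <alice@example.org>\nMessage-ID: <1@example.org>\n\nbody",
    "From: Bob <BOB@example.org>\nMessage-ID: <2@example.org>\n"
    "In-Reply-To: <1@example.org>\n\nbody",
    "From: jira@example.org\nMessage-ID: <3@example.org>\n"
    "In-Reply-To: <1@example.org>\n\nbody",
]
sample = [email.message_from_string(raw) for raw in RAW]
edges = reply_edges(sample)
for (replier, author), count in edges.items():
    print(replier, "->", author, count)
```

For a real archive, the same function should accept a `mailbox.mbox(path)` object from the standard library in place of `sample`. The weighted pairs could then be fed into a NetworkX `DiGraph` with `add_edge(replier, author, weight=count)` and exported for Gephi. Matching one person to several email addresses, which the quote flags as unfinished work, is not attempted here.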
- would using a graph database have made this experiment easier?
- would Linkurious be able to generate these graphics?
- is the code available anywhere so someone else could try to use a graph database and maybe run other types of visualizations?
Original title and link: Social Network Analysis of Apache CloudStack (©myNoSQL)