If you find a good way to put together two things that excel at what they are doing, you’ll most probably get a gold nugget. That’s what I feel when thinking about integrating R and Hadoop. Jeffrey Breen’s slides seem to agree:
Marko Rodriguez is experimenting with R on Hadoop, and one of his exercises is calculating a graph’s degree distribution. I confess I had to turn to Wikipedia to remind myself of the definition of a node’s degree:
- The degree of a node in a network (sometimes referred to incorrectly as the connectivity) is the number of connections or edges the node has to other nodes. The degree distribution P(k) of a network is then defined to be the fraction of nodes in the network with degree k.
- The degree distribution is very important in studying both real networks, such as the Internet and social networks, and theoretical networks.
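To make the definition concrete, here is a minimal sketch in plain Python. It is not Marko’s R/Hadoop code; it only mimics the two phases a MapReduce job would likely use (count edges per node, then count nodes per degree). The function name `degree_distribution` and the sample edge list are made up for illustration, and the sketch assumes an undirected graph given as a list of edges, so isolated nodes with degree 0 never show up.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: fraction of nodes with degree k} for an undirected edge list."""
    # Phase 1 (map-like): every edge adds one to the degree of both endpoints.
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Phase 2 (reduce-like): count how many nodes share each degree,
    # then divide by the number of nodes to get P(k).
    nodes_per_degree = Counter(degree.values())
    node_count = len(degree)
    return {k: count / node_count for k, count in sorted(nodes_per_degree.items())}

# Hypothetical toy graph: a-b, a-c, a-d, b-c
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
print(degree_distribution(edges))  # {1: 0.25, 2: 0.5, 3: 0.25}
```

In the toy graph, node a has degree 3, b and c have degree 2, and d has degree 1, so half of the nodes have degree 2 and a quarter each have degree 1 and 3.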
As a thought experiment, imagine a graph database that actively maintains an internal degree distribution and uses it to suggest a partitioning of the graph, or to partition it outright. Would that work?
Original title and link: Calculating a Graph’s Degree Distribution Using R MapReduce over Hadoop (©myNoSQL)