R: All content tagged as R in NoSQL databases and polyglot persistence
Wednesday, 9 April 2014
7 quick facts about R
Based on a slide deck by David Smith:
 R is the highest paid IT skill (Dice.com survey, January 2014)
 R is the most-used data science language after SQL (O’Reilly survey, January 2014)
 R is used by 70% of data miners (Rexer survey, October 2013)
 R is #15 of all programming languages (RedMonk language rankings, January 2014)
 R is growing faster than any other data science language (KDNuggets survey, August 2013)
 R is the #1 Google Search for Advanced Analytics software (Google Trends, March 2014)
 R has more than 2 million users worldwide (Oracle estimate, February 2012)
I can see a couple of actionable items based on this list:
 if you’re interested in data science, you should consider R
 if you are already using R, ask for a raise
Original title and link: 7 quick facts about R ( ©myNoSQL)
Monday, 3 February 2014
Visualizing RunKeeper data in R
In Academic torrents: Almost 1.7TB of research data available, I complained about the lack of interesting open data. Dan Goldin’s Visualizing RunKeeper data in R is a good example of what I mean. While learning R, he used his own data about his running results. That made it both interesting and fun.
What better way to celebrate running 1000 miles in 2013 than dumping the data into R and generating some visualizations? It’s also a step in my quest to replace Excel with R.
I hope no one will argue that this is a more exciting experience than learning a new technology while using the Enron email archive.
Original title and link: Visualizing RunKeeper data in R ( ©myNoSQL)
Friday, 27 December 2013
Data Science Wars: Python vs. R
Daniel Gutierrez posted a pretty good summary of the recent discussions about the preferred or most productive or most used data processing environments (R or Python):
While R has traditionally been the programming language of choice for data scientists, some believe it is ceding ground to Python. Here is a short list of some the arguments I’ve heard of late, along with my personal assessment of each…
The summary of a summary is that this conversation can be reduced to familiarity vs. highly specialized algorithms [1].

[1] While Python can get many of the specialized tools available in R, R has a lot more work to do to become a familiar environment for devs.
Original title and link: Data Science Wars: Python vs. R ( ©myNoSQL)
via: http://insidebigdata.com/2013/12/09/datasciencewarspythonvsr/
Tuesday, 10 December 2013
Integrating R with Cloudera Impala for Real-Time queries on Hadoop
A very long tutorial by Istvan Szegedi on how to integrate R with Cloudera Impala, through the ODBC driver:
Cloudera Impala is an exciting new technology to provide real-time, interactive queries in Hadoop environment. It supports ODBC connectors and this makes it possible to integrate it with many popular BI tools and statistical software such as R. Together R and Impala provide an excellent combination for data analyst to process massive data sets efficiently and they can also support graphical representation of the result sets.
Original title and link: Integrating R with Cloudera Impala for Real-Time queries on Hadoop ( ©myNoSQL)
Sunday, 2 September 2012
Running R on Hadoop: Why MapReduce? Why R?
If you find a good way to put together two things that excel at what they are doing, you’ll most probably get a gold nugget. That’s what I feel when thinking about integrating R and Hadoop. Jeffrey Breen’s slides seem to agree:
Wednesday, 6 June 2012
R Flavored Markdown
I couldn’t resist:
R Flavored Markdown is a plain-text formatting syntax for creating documents that can be rendered to HTML. In fact it’s like HTML, but simpler. R Flavored Markdown is a variant of original Markdown with a few additional features:
 Github Flavored Markdown (GFM) which supports source code blocks,
 Sundown Markdown which implements GFM but contains additional extensions like support for tables and automatic substitution for typographical characters, and
 Embedded Math Equations with MathJax (think latex).
Original title and link: R Flavored Markdown ( ©myNoSQL)
via: http://jeffreyhorner.tumblr.com/post/24404112057/announcingthermarkdownpackage
Monday, 28 May 2012
13 R Online Resources for Big Data and Parallel Computing
A list of articles, papers, and tutorials for R put together by Yanchang Zhao.
Original title and link: 13 R Online Resources for Big Data and Parallel Computing ( ©myNoSQL)
Thursday, 24 May 2012
Using R With Cassandra Through JDBC or Hive
A short post by Jake Luciani listing two R packages, RJDBC and RCassandra, that enable using R with Cassandra through either the JDBC or Hive drivers.
This is a good example of what I meant by designing products with openness and integration in mind.
Original title and link: Using R With Cassandra Through JDBC or Hive ( ©myNoSQL)
via: http://www.datastax.com/dev/blog/biganalyticswithrcassandraandhive
Thursday, 23 February 2012
Data Scientist’s Anthem
Data Scientist’s anthem: We R Who We R
Original title and link: Data Scientist’s Anthem ( ©myNoSQL)
Wednesday, 8 February 2012
Hadoop, HBase and R: Will Open Source Software Challenge BI & Analytics Software Vendors?
Harish Kotadia:
Predictive Analytics has been billed as the next big thing for almost fifteen years, but hasn’t gained mass acceptance so far the way ERP and CRM solutions have. One of the main reason for this is the high upfront investment required in Software, Hardware and Talent for implementing a Predictive Analytics solution.
Well, this is about to change – […] Using R, HBase and Hadoop, it is possible to build cost-effective and scalable Big Data Analytics solutions that match or even exceed the functionality offered by costly proprietary solutions from leading BI/Analytics software vendors at a fraction of the cost.
Vendors will argue that software licensing represents just a small fraction of the costs of implementing BI or data analytics. What they’ll leave out are the costs of acquiring know-how and, more important, the costs of maintaining and modernizing their solutions.
Original title and link: Hadoop, HBase and R: Will Open Source Software Challenge BI & Analytics Software Vendors? ( ©myNoSQL)
Monday, 6 February 2012
Calculating a Graph's Degree Distribution Using R MapReduce over Hadoop
Marko Rodriguez is experimenting with R on Hadoop, and one of his exercises is calculating a graph’s degree distribution. I confess I had to check Wikipedia to remind myself of the definition of a node’s degree:
 The degree of a node in a network (sometimes referred to incorrectly as the connectivity) is the number of connections or edges the node has to other nodes. The degree distribution P(k) of a network is then defined to be the fraction of nodes in the network with degree k.
 The degree distribution is very important in studying both real networks, such as the Internet and social networks, and theoretical networks.
As a thought exercise, think of a graph database that actively maintains an internal degree distribution and uses it to suggest or partition the graph. Would that work?
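To make the quoted definition concrete, here is a minimal sketch in Python (not Marko's R MapReduce code, which is not reproduced in this post). It assumes an undirected graph given as a list of edge pairs; the function name `degree_distribution` and the sample edges are made up for illustration. The two counting passes loosely mirror two map/reduce steps: first count edges per node, then count nodes per degree.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: P(k)} for an undirected graph given as (u, v) edge pairs.

    Assumption: only nodes that appear in at least one edge are known,
    so isolated nodes (degree 0) are not part of the result.
    """
    # Pass 1: emit each endpoint of every edge and sum per node -> node degree.
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Pass 2: count how many nodes share each degree k.
    nodes_with_degree = Counter(degree.values())

    # P(k) is the fraction of nodes in the network with degree k.
    n = len(degree)
    return {k: count / n for k, count in sorted(nodes_with_degree.items())}


# Hypothetical sample graph: a-b, a-c, a-d, b-c
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
print(degree_distribution(edges))  # {1: 0.25, 2: 0.5, 3: 0.25}
```

In the sample graph, node a has degree 3, b and c have degree 2, and d has degree 1, so half of the nodes have degree 2 and the fractions sum to 1.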
Original title and link: Calculating a Graph’s Degree Distribution Using R MapReduce over Hadoop ( ©myNoSQL)
via: http://groups.google.com/group/gremlinusers/browse_thread/thread/db50a72f92a26e06
Tuesday, 20 December 2011
Call to Arms: Renjin, R Implementation on JVM Needs Contributions
Until yesterday I didn’t know there was an attempt to implement the R language on the JVM. But there is one: Renjin. And it sounds like it needs some helping hands to reach its goal of a 1.0 release in 2012.
In case you wonder why R on the JVM (the same question has been asked many times about JRuby, Jython, etc.), just think of:
 it would allow access to the tons of Java libraries
 it would integrate seamlessly with tools like Hadoop
If you are ready to start contributing, head over to Renjin’s plan of attack for 2012 page to learn where your help is needed.
Original title and link: Call to Arms: Renjin, R Implementation on JVM Needs Contributions ( ©myNoSQL)