r: All content tagged as r in NoSQL databases and polyglot persistence

R: What Is and How Can It Help?

From Loraine Lawson's interview with Jeff Erhardt[1]:

What is R?

R is an open source statistical programming language. The easiest way to think about it is the largest commercial competitor in the states is a company called SAS, and while it’s not a perfect analogy, one way to think about R is as an open source version of SAS. It’s not perfectly correct, but for people who have not heard of R, that’s one way to explain it.

Where can R help?

  • analyzing and gaining meaning from collected data
  • developing models and extracting the insight from data
  • implementing these analytics within an enterprise and disseminating the knowledge across the enterprise

Now, are you ready to bet on what the data processing platform of tomorrow will be?


  1. Jeff Erhardt: COO of Revolution Analytics, the company offering products and services for R  

Original title and link: R: What Is and How Can It Help? (NoSQL databases © myNoSQL)

via: http://www.itbusinessedge.com/cm/community/features/interviews/blog/the-power-of-r-more-companies-using-language-for-business-analytics/?cs=46325


The Data Processing Platform for Tomorrow

In the blue corner we have IBM, with Netezza as its analytic database, Cognos for BI, and SPSS for predictive analytics. In the green corner we have EMC, with Greenplum and its partnership with SAS[1]. And in the open source corner we have Hadoop and R.

Update: there is also another corner, which I don’t know how to color, where Teradata and its recently acquired Aster Data partner with SAS.

Who is ready to bet on which of these platforms will be processing the most data in the coming years?


  1. GigaOm has a good article on this subject.


RStudio: The Free and Open R IDE

For data scientists and everyone else: RStudio is an R IDE that runs on all major platforms, or alongside R on a server, where it is accessible through a browser. Free and open.


R and the web in 2011

The last couple of posts were about Big Data, and Jeffrey Horner’s presentation is in line with this topic:

If there is ever a time to learn R and web application development, it is now…in the age of Big Data. The upcoming release of R 2.13 will provide basic functionality for developing R web applications on the desktop via the internal HTTP server, but the interface is incompatible with rApache. Jeffrey will talk about Rack, a web server interface and package for R, and how you can start creating your own Big Data stories from the comfort of your own desktop.

Note: The video is missing the beginning and it is not a generic talk about R, so it will be interesting mostly to those using R and planning to develop web applications directly from R.


Names You Need to Know in 2011: R Data Analysis Software

Steve McNally (Forbes):

Simply put by one of its staunchest advocates, “R is the most powerful statistical computing language on the planet; there is no statistical equation that cannot be calculated in R.”

If you say data scientists or Big Data, then you are saying Hadoop and R.

via: http://blogs.forbes.com/smcnally/2010/11/10/names-you-need-to-know-in-2011-r-data-analysis-software/


MongoDB and R using Java

It would be nice if there were an R package, along the lines of RMySQL, for MongoDB. For now there is not; so, how best to get data from a MongoDB database into R?

I have told you before about R and “data addicts”.

via: http://nsaunders.wordpress.com/2010/09/24/connecting-to-a-mongodb-database-from-r-using-java/


CouchDB and R

Here are some quick crib notes on getting R talking to CouchDB using Couch’s ReSTful HTTP API. We’ll do it in two different ways. First, we’ll construct HTTP calls with RCurl, then move on to the R4CouchDB package for a higher level interface.

R is the favorite tool for data addicts.
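
The first approach in the quoted notes, building the HTTP calls by hand, is easy to picture in any language. Here is a minimal sketch in Python rather than R, using only the standard library. It assumes a CouchDB server at the default local address `http://localhost:5984`; the helper names (`create_database`, `put_document`, `get_document`, `send`) are hypothetical and exist only for this illustration. The sketch builds the requests without sending them, so it can be read and run without a server.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: a local CouchDB listening on its default port.
BASE_URL = "http://localhost:5984"


def create_database(name: str) -> Request:
    """Build the request that creates a database: PUT /{db}."""
    return Request(f"{BASE_URL}/{quote(name, safe='')}", method="PUT")


def put_document(db: str, doc_id: str, doc: dict) -> Request:
    """Build the request that stores a JSON document: PUT /{db}/{docid}."""
    body = json.dumps(doc).encode("utf-8")
    url = f"{BASE_URL}/{quote(db, safe='')}/{quote(doc_id, safe='')}"
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_document(db: str, doc_id: str) -> Request:
    """Build the request that reads a document back: GET /{db}/{docid}."""
    url = f"{BASE_URL}/{quote(db, safe='')}/{quote(doc_id, safe='')}"
    return Request(url, method="GET")


def send(request: Request) -> dict:
    """Send a request to a running server and decode the JSON reply."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `send(create_database("demo"))` followed by `send(put_document("demo", "doc1", {"name": "iris"}))` would create a database and store one document in it. A real installation may also require credentials, which this sketch leaves out.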

via: http://digitheadslabnotebook.blogspot.com/2010/10/couchdb-and-r.html


RevoScaleR: R for BigData

Dave Rosenberg:

the new package will allow users to process, visualize, and model terabyte-class data sets in a matter of seconds, and it leverages many popular data processors and storage mechanisms, including the popular Apache Hadoop framework and countless NoSQL databases, for complex statistical analysis.

The RevoScaleR package introduces a number of new features, including:

  • a new binary ‘Big Data’ file format—XDF—with an interface to the R language that provides high-speed access to arbitrary rows, blocks, and columns of data
  • a collection of the most common statistical algorithms optimized for big data, including high-performance implementations of summary statistics, linear regression, binomial logistic regression, and crosstabs
  • data reading and transformation tools to prepare large data sets for analysis
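
To give a feel for what block-wise summary statistics look like, here is a minimal sketch in Python. It is not RevoScaleR's implementation and does not read the XDF format; it only illustrates the general idea of computing a mean and variance one block at a time, so the full data set never has to sit in memory. The class and function names (`RunningStats`, `summarize`) are hypothetical, and the blocks are plain lists standing in for chunks read from disk.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class RunningStats:
    """Mean and variance accumulated one block at a time."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def update_block(self, block: Sequence[float]) -> None:
        """Merge the statistics of one block into the running totals."""
        n_b = len(block)
        if n_b == 0:
            return
        mean_b = sum(block) / n_b
        m2_b = sum((x - mean_b) ** 2 for x in block)

        # Pairwise merge of two (count, mean, m2) summaries.
        n = self.count + n_b
        delta = mean_b - self.mean
        self.m2 += m2_b + delta * delta * self.count * n_b / n
        self.mean += delta * n_b / n
        self.count = n

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 below two values."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def summarize(blocks: Iterable[Sequence[float]]) -> RunningStats:
    """Fold an iterable of blocks into a single summary."""
    stats = RunningStats()
    for block in blocks:
        stats.update_block(block)
    return stats
```

Calling `summarize` on a generator that yields blocks from a file keeps memory use proportional to the block size rather than to the size of the data set.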

Possibly another useful tool to keep in the toolbox alongside Pig and Cascalog.

via: http://news.cnet.com/8301-13846_3-20012446-62.html