R: All content tagged as R in NoSQL databases and polyglot persistence

Running R on Hadoop: Why MapReduce? Why R?

If you find a good way to put together two things that excel at what they are doing, you’ll most probably get a gold nugget. That’s what I feel when thinking about integrating R and Hadoop. Jeffrey Breen’s slides seem to agree:


R Flavored Markdown

I couldn’t resist:

R Flavored Markdown is a plain-text formatting syntax for creating documents that can be rendered to HTML. In fact it’s like HTML, but simpler. R Flavored Markdown is a variant of original Markdown with a few additional features:

  • Github Flavored Markdown (GFM) which supports source code blocks,
  • Sundown Markdown which implements GFM but contains additional extensions like support for tables and automatic substitution for typographical characters, and
  • Embedded Math Equations with MathJax (think latex).

Example input and output.

Original title and link: R Flavored Markdown (NoSQL database©myNoSQL)

via: http://jeffreyhorner.tumblr.com/post/24404112057/announcing-the-r-markdown-package


13 R Online Resources for Big Data and Parallel Computing

A list of articles, papers, and tutorials for R put together by Yanchang Zhao.

Original title and link: 13 R Online Resources for Big Data and Parallel Computing (NoSQL database©myNoSQL)

via: http://rdatamining.wordpress.com/2012/05/06/online-resources-for-handling-big-data-and-parallel-computing-in-r/


Using R With Cassandra Through JDBC or Hive

A short post by Jake Luciani listing two R packages, RJDBC and RCassandra, that enable using R with Cassandra through either the JDBC or Hive drivers.

This is a good example of what I meant by designing products with openness and integration in mind.

Original title and link: Using R With Cassandra Through JDBC or Hive (NoSQL database©myNoSQL)

via: http://www.datastax.com/dev/blog/big-analytics-with-r-cassandra-and-hive


Data Scientist’s Anthem

Shamir Karkal:

Data Scientist’s anthem - We R Who We R

Andrei Savu

Original title and link: Data Scientist’s Anthem (NoSQL database©myNoSQL)


Hadoop, HBase and R: Will Open Source Software Challenge BI & Analytics Software Vendors?

Harish Kotadia:

Predictive Analytics has been billed as the next big thing for almost fifteen years, but hasn’t gained mass acceptance so far the way ERP and CRM solutions have. One of the main reason for this is the high upfront investment required in Software, Hardware and Talent for implementing a Predictive Analytics solution.

Well, this is about to change – […] Using R, HBase and Hadoop, it is possible to build cost-effective and scalable Big Data Analytics solutions that match or even exceed the functionality offered by costly proprietary solutions from leading BI/Analytics software vendors at a fraction of the cost.

Vendors will argue that software licensing represents just a small fraction of the cost of implementing BI or data analytics. What they'll leave out are the costs of acquiring know-how and, more importantly, the costs of maintaining and modernizing their solutions.

Original title and link: Hadoop, HBase and R: Will Open Source Software Challenge BI & Analytics Software Vendors? (NoSQL database©myNoSQL)

via: http://smartdatacollective.com/hkotadia1/45540/big-data-will-open-source-software-challenge-bi-analytics-software-vendors


Calculating a Graph's Degree Distribution Using R MapReduce over Hadoop

Marko Rodriguez is experimenting with R on Hadoop, and one of his exercises is calculating a graph’s degree distribution. I confess I had to turn to Wikipedia to remind myself of the definition of node degree:

  1. The degree of a node in a network (sometimes referred to incorrectly as the connectivity) is the number of connections or edges the node has to other nodes. The degree distribution P(k) of a network is then defined to be the fraction of nodes in the network with degree k.
  2. The degree distribution is very important in studying both real networks, such as the Internet and social networks, and theoretical networks.
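
To make the definition above concrete, here is a minimal sketch in plain Python. It is not Marko Rodriguez's R code and it does not run on Hadoop; it only mirrors the two counting passes such a MapReduce job would need. The function name `degree_distribution` and the sample edge list are made up for illustration, and the sketch assumes an undirected graph given as an edge list, so isolated nodes (degree 0) are not represented.

```python
from collections import Counter
from itertools import chain


def degree_distribution(edges):
    """Return {k: fraction of nodes with degree k} for an undirected edge list."""
    # First pass ("map" each edge to its two endpoints, "reduce" by node):
    # the number of times a node appears as an endpoint is its degree.
    degree = Counter(chain.from_iterable(edges))
    # Second pass ("map" each node to its degree, "reduce" by degree value):
    # count how many nodes share each degree k.
    nodes_per_degree = Counter(degree.values())
    total_nodes = len(degree)
    # P(k) is the fraction of nodes whose degree equals k.
    return {k: n / total_nodes for k, n in sorted(nodes_per_degree.items())}


# Hypothetical sample graph: a-b, a-c, a-d, b-c
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
print(degree_distribution(edges))  # {1: 0.25, 2: 0.5, 3: 0.25}
```

In the sample graph, node a has degree 3, b and c have degree 2, and d has degree 1, so half of the nodes have degree 2 and a quarter each have degree 1 and 3.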

As a thought exercise, think of a graph database that actively maintains an internal degree distribution and uses it to suggest partitions or to partition the graph itself. Would that work?

Original title and link: Calculating a Graph’s Degree Distribution Using R MapReduce over Hadoop (NoSQL database©myNoSQL)

via: http://groups.google.com/group/gremlin-users/browse_thread/thread/db50a72f92a26e06


Call to Arms: Renjin, R Implementation on JVM Needs Contributions

Until yesterday I didn’t know there was an attempt to implement the R language on the JVM. But there is one: Renjin. And it sounds like it needs some helping hands to accomplish its goal of reaching a 1.0 release in 2012.

In case you wonder why R on the JVM (the same question has been asked many times about JRuby, Jython, etc.), just think of:

  • it would allow access to the tons of Java libraries
  • it would integrate seamlessly with tools like Hadoop

If you are ready to start contributing, head over to Renjin’s plan of attack for 2012 page to learn where your help is needed.

Original title and link: Call to Arms: Renjin, R Implementation on JVM Needs Contributions (NoSQL database©myNoSQL)


Revolution R Enterprise 5.0 Released

Revolution Analytics, the commercial provider of the leading statistics language for advanced analytics (as also shown by this data analysis tools survey among data scientists), has released Revolution R Enterprise 5.0, featuring:

  • Distributed/Parallel Computing: Automatically distribute statistical analyses from a desktop across nodes of a cluster through Windows HPC server and distribute R function calls across nodes.
  • Scalable Data Management: Increase flexibility in data analysis with new data import and cleaning/manipulation tools.
  • Integration with Hadoop: Support MapReduce programming in R and integration with HDFS and HBase with Cloudera Certified Technology.
  • Expanded Scalable Analytics Functionality: Apply new big data statistics algorithms including principal components analysis, factor analysis, contingency table analysis and more.
  • Enhanced R Productivity Environment: Create and build R packages with expanded support features.
  • Enhanced RevoDeployR server: Add multiple compute nodes to support more users, batch execution of large analysis jobs, and LDAP enterprise security support.
  • Upgraded Open Source R: Revolution R 5.0 includes the fully-patched R 2.13.2, which features a new byte-compiler to improve performance of user-written functions and packages.

If you are not familiar with R, check this brief description of what R is and how it can help.

Original title and link: Revolution R Enterprise 5.0 Released (NoSQL database©myNoSQL)


R: the Leading Statistics Language and Key Weapon in Advanced Analytics Today

David Smith (Revolution Analytics):

Of course, this isn’t the first time that R has been embedded into a data warehousing appliance. IBM Netezza’s iClass device integrates with Revolution R, and AsterData, the Teradata Data Warehouse Appliance, and Greenplum all provide connections to R as well. Here at Revolution Analytics, we think that such enterprise-level integrations with R serve to grow the R ecosystem and serve as validation of R as a key platform for advanced analytics. As CEO Norman Nie said to GigaOm this weekend, 

“Oracle’s announcement to embed R demonstrates validation for the leading statistics language and offers further evidence that R is a key weapon in advanced analytics today”

And let’s not leave aside the strategic partnership between Revolution Analytics and Cloudera to include RevoConnectR in the CDH.

Original title and link: R: the Leading Statistics Language and Key Weapon in Advanced Analytics Today (NoSQL database©myNoSQL)

via: http://www.r-bloggers.com/oracles-big-data-appliance-to-include-r/


R and Hadoop: Revolution Analytics and Cloudera Partnership Announced

In the series of big announcements coming out this month, Cloudera and Revolution Analytics, the enterprise provider of R software, have announced a partnership to integrate Cloudera’s Hadoop distribution with the Revolution R Enterprise platform. The integration offers R developers direct access to Hadoop data stores and the possibility of writing MapReduce jobs directly in R.

The integration packages, named RevoConnectR for Apache Hadoop, are already freely available on GitHub, and they will also get commercial support with Revolution R Enterprise 5.0 Server for Linux.

You can read more about this announcement on:

Original title and link: R and Hadoop: Revolution Analytics and Cloudera Partnership Announced (NoSQL database©myNoSQL)


The Appealing Future of Big Data and Data Analytics

In a RWW article, David Smith writes about the R statistics language:

Over two million analysts worldwide use R, and they come from an extremely diverse pool of industries that ranges from journalism to financial services to life sciences.

If you replace R with data analytics, this could be seen as a very appealing future of Big Data and data analytics. Something like a generalized version of data analytics at work.

But before losing myself in this perspective, I thought I should take a look at the present and see how what is done now is going to lead to that amazing tomorrow:

  1. Tim O’Reilly said a couple of years ago “Data is the Intel inside” and since then we’re seeing lots and lots of companies trying to materialize this slogan.
  2. More new technologies for storage, processing, and analysis are being developed and are reaching the market than in the previous 10 years.
  3. People are starting to embrace big data, overcoming their fear of privacy invasion.

All these are good signs that we could consider a good basis for the future. On the other hand, the past and today’s reality tell a different story:

  1. Even if technology costs have decreased over time, the investment needed to create a data startup is still high.
  2. Financial institutions are not investing (too much) in data technology companies.
  3. There are only a few companies that are able to accumulate significant amounts of useful data.
  4. There are even fewer companies that are able to use these huge amounts of data effectively.

What worries me is that even if we continue to see both the commoditization and impressive improvement of data solutions, by the time all the tools are in place and accessible to everyone, as per the opening paragraph, the really valuable data will reside in just a few private, well-locked silos.

Original title and link: The Appealing Future of Big Data and Data Analytics (NoSQL database©myNoSQL)