r: All content tagged as r in NoSQL databases and polyglot persistence
Until yesterday I didn’t know there was an attempt to implement the R language on the JVM. But there is one: Renjin. And it sounds like it needs some helping hands to accomplish its goal of reaching a 1.0 release in 2012.
In case you wonder why R on the JVM (the same question has been asked many times about JRuby, Jython, etc.), just think of:
- it would allow access to the tons of Java libraries
- it would integrate seamlessly with tools like Hadoop
If you are ready to start contributing, head over to Renjin’s plan of attack for 2012 page and learn where your help is needed.
Original title and link: Call to Arms: Renjin, R Implementation on JVM Needs Contributions ( ©myNoSQL)
Revolution Analytics, the commercial provider of R, the leading statistics language for advanced analytics (as also shown by this data analysis tools survey among data scientists), has released Revolution R Enterprise 5.0, featuring:
- Distributed/Parallel Computing: Automatically distribute statistical analyses from a desktop across nodes of a cluster through Windows HPC server and distribute R function calls across nodes.
- Scalable Data Management: Increase flexibility in data analysis with new data import and cleaning/manipulation tools.
- Integration with Hadoop: Support MapReduce programming in R and integration with HDFS and HBase with Cloudera Certified Technology.
- Expanded Scalable Analytics Functionality: Apply new big data statistics algorithms including principal components analysis, factor analysis, contingency table analysis and more.
- Enhanced R Productivity Environment: Create and build R packages with expanded support features.
- Enhanced RevoDeployR server: Add multiple compute nodes to support more users, batch execution of large analysis jobs, and LDAP enterprise security support.
- Upgraded Open Source R: Revolution R 5.0 includes the fully-patched R 2.13.2, which features a new byte-compiler to improve performance of user-written functions and packages.
If you are not familiar with R, check this brief description of what R is and how it can help.
Original title and link: Revolution R Enterprise 5.0 Released ( ©myNoSQL)
In the series of big announcements coming out this month, Cloudera and Revolution Analytics, the enterprise provider of R software, have announced a partnership to integrate Cloudera’s Hadoop distribution with the Revolution R Enterprise platform. The integration gives R developers direct access to Hadoop data stores and the ability to write MapReduce jobs directly in R.
The integration packages, named RevoConnectR for Apache Hadoop, are already freely available on GitHub, and they will also get commercial support with Revolution R Enterprise 5.0 Server for Linux.
Original title and link: R and Hadoop: Revolution Analytics and Cloudera Partnership Announced ( ©myNoSQL)
Over two million analysts worldwide use R, and they come from an extremely diverse pool of industries that ranges from journalism to financial services to life sciences.
If you replace R with data analytics, this could be seen as a very appealing future for Big Data and data analytics. Something like a generalized version of data analytics at work.
But before losing myself in this perspective, I thought I should take a look at the present and see how what is being done now is going to lead to that amazing tomorrow:
- Tim O’Reilly said a couple of years ago, “Data is the Intel inside,” and since then we have seen lots and lots of companies trying to materialize this slogan.
- More new technologies for storage, processing, and analysis are being developed and are reaching the market than in the previous 10 years.
- People are starting to embrace big data, overcoming their fear of privacy invasion.
All these are good signs that we could consider a good basis for the future. On the other hand, the past and today’s reality tell a different story:
- Even though technology costs have decreased over time, the investment needed to create data startups is still high.
- Financial institutions are not investing (too much) into data technology companies.
- There are only a few companies that are able to accumulate significant amounts of useful data.
- There are even fewer companies that are able to use these huge amounts of data effectively.
What worries me is that even if we continue to see both commoditization and impressive improvement of data solutions, by the time all the tools are in place and accessible to everyone, as per the opening paragraph, the really valuable data will reside in just a few private, well-locked silos.
Original title and link: The Appealing Future of Big Data and Data Analytics ( ©myNoSQL)
In the blue corner we have IBM, with Netezza as the analytic database, Cognos for BI, and SPSS for predictive analytics. In the green corner we have EMC, with Greenplum and the partnership with SAS. And in the open source corner we have Hadoop and R.
Update: there’s also another corner, which I don’t know how to color, where Teradata and its recently acquired Aster Data partner with SAS.
Who is ready to bet on which of these platforms will be processing more data in the coming years?
The last couple of posts were about Big Data, and Jeffrey Horner’s presentation is in line with this topic:
If there is ever a time to learn R and web application development, it is now…in the age of Big Data. The upcoming release of R 2.13 will provide basic functionality for developing R web applications on the desktop via the internal HTTP server, but the interface is incompatible with rApache. Jeffrey will talk about Rack, a web server interface and package for R, and how you can start creating your own Big Data stories from the comfort of your own desktop.
Note: The video is missing the beginning and it is not a generic talk about R, so it will be interesting mostly to those using R and planning to develop web applications directly from R.