Java: All content tagged as Java in NoSQL databases and polyglot persistence
Monday, 11 February 2013
Writing Hive UDFs With Java - a Tutorial
Alexander Dean’s tutorial published in SDJ:
In this article you will learn how to write a user-defined function (“UDF”) to work with the Apache Hive platform. We will start gently with an introduction to Hive, then move on to developing the UDF and writing tests for it. We will write our UDF in Java, but use Scala’s SBT as our build tool and write our tests in Scala with Specs2.
As far as I know it’s quite easy to write UDFs for Pig and Hive in any language that has a JVM implementation (Python with Jython, Ruby with JRuby, Groovy).
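To make the idea of a Hive UDF concrete, here is a minimal sketch of the shape such a function takes in Java. Classic Hive UDFs extend `org.apache.hadoop.hive.ql.exec.UDF` and expose one or more `evaluate` methods. To stay self-contained, this sketch deliberately does not extend the Hive base class, and the class name `UpperSketch` is hypothetical and not taken from the tutorial.

```java
// Sketch only: a real Hive UDF would extend org.apache.hadoop.hive.ql.exec.UDF
// and be packaged with the Hive libraries on the classpath. That dependency is
// omitted here so the example compiles on its own.
final class UpperSketch {

    // Hive looks up a method named "evaluate" on classic UDF classes.
    // Returning null for null input mirrors how SQL NULLs usually propagate.
    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        return input.toUpperCase();
    }

    public static void main(String[] args) {
        UpperSketch udf = new UpperSketch();
        System.out.println(udf.evaluate("hive"));
        System.out.println(udf.evaluate(null));
    }
}
```

Keeping the logic in a plain method like this also makes it easy to unit test before wiring it into Hive.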
Original title and link: Writing Hive UDFs With Java - a Tutorial (©myNoSQL)
via: http://snowplowanalytics.com/blog/2013/02/08/writing-hive-udfs-and-serdes/
Thursday, 10 January 2013
How to Create Couchbase Views From Java
Couchbase 2.0 added support for CouchDB-like views and upgraded the client libraries to support this new feature. Tugdual Grall[1] demos defining and querying Couchbase views from Java.
[1] Tugdual Grall is Technical Evangelist at Couchbase.
Original title and link: How to Create Couchbase Views From Java (©myNoSQL)
Thursday, 3 January 2013
What Is the Spring Data Project?
Short answer: another sign that the Spring framework wants to do everything everywhere. A mammoth[1].
Version 1.0 was released in 2004 as a lightweight alternative to Enterprise Java Beans (EJB). Since then, Spring has expanded into many other areas of enterprise development, such as enterprise integration (Spring Integration), batch processing (Spring Batch), web development (Spring MVC, Spring Webflow), security (Spring Security). Spring continues to push the envelope for mobile applications (Spring Mobile), social media (Spring Social), rich web applications (Spring MVC, s2js Javascript libraries), and NoSQL data access (Spring Data).
[…]
The complete pipeline can be implemented using Spring for Apache Hadoop along with Spring Integration and Spring Batch. However, Hadoop has its own set of challenges which the Spring for Apache Hadoop project is designed to address. Like all Spring projects, it leverages the Spring Framework to provide a consistent structure and simplify writing Hadoop applications. For example, Hadoop applications rely heavily on command shell tools. So applications end up being a hodge-podge of Perl, Python, Ruby, and bash scripts. Spring for Apache Hadoop provides a dedicated XML namespace for configuring Hadoop jobs with embedded scripting features and support for Hive and Pig.
[1] There’s a business reason for doing this, though: when you have tons of clients, you want to make sure they don’t have a chance to step outside. Is this New Year’s resolution a heresy: I plan to use vastly less Spring this year?
Original title and link: What Is the Spring Data Project? (©myNoSQL)
via: http://www.odbms.org/blog/2013/01/the-spring-data-project-interview-with-david-turanski/
Monday, 23 July 2012
Cascalog and Cascading: Productivity Solutions for Data Scientists
A good explanation of why Cascading, Cascalog, and other frameworks hiding away the details of MapReduce are making things easier for non-programmers:
Data scientists at The Climate Corporation chose to create their algorithms in Cascalog, which is a high-level Clojure-based machine learning language built on Cascading. Cascading is an advanced Java application framework that abstracts the MapReduce APIs in Apache Hadoop and provides developers with a simplified way to create powerful data processing workflows. Programming in Cascalog, data scientists create compact expressions that represent complex batch-oriented AI and machine learning workflows. This results in improved productivity for the data scientists, many of whom are mathematicians rather than computer scientists. It also gives them the ability to quickly analyze complex data sets without having to create large complicated programs in MapReduce. Furthermore, programmers at The Climate Corporation also use Cascading directly for creating jobs inside Hadoop streaming to process additional batch-oriented data workflows.
Original title and link: Cascalog and Cascading: Productivity Solutions for Data Scientists (©myNoSQL)
via: http://www.concurrentinc.com/case-studies/climate-corp/
Tuesday, 8 May 2012
The Future of NoSQL With Java EE
Markus Eisele:
We already have a lot in place for the so-called “NoSQL” DBs. And the groundwork for integrating this into new Java EE standards is promising. Control of embedded NoSQL instances should be done via JSR 322 (Java EE Connector Architecture) with this being the only allowed place to spawn threads and open files directly from a filesystem. I’m not a big supporter of having a more general data abstraction JSR for the platform comparable to what Spring is doing with Spring Data. To me the concepts of the different NoSQL categories are too different to have a one-size-fits-all approach.
Eureka!
Original title and link: The Future of NoSQL With Java EE (©myNoSQL)
via: http://blog.eisele.net/2012/05/future-of-nosql-with-java-ee.html
Friday, 6 April 2012
DynamoDB Libraries, Mappers, and Mock Implementations
A list of DynamoDB libraries covering quite a few popular languages and frameworks:

A couple of things I’ve noticed (and that could be helpful to other NoSQL database companies):
- Amazon provides official libraries for a couple of major programming languages (Java, .NET, PHP, Ruby)
- Amazon is not shy to promote libraries that are not official, but established themselves as good libraries (e.g. Python’s Boto)
- The list doesn’t seem to include anything for C and Objective-C (Objective-C is the language of iOS and Mac apps)
Original title and link: DynamoDB Libraries, Mappers, and Mock Implementations (©myNoSQL)
Tuesday, 3 April 2012
NoSQL Databases MongoDB and Oracle NoSQL Support Added to EclipseLink JPA
EclipseLink 2.4 will support JPA access to NoSQL databases. This support is already part of the EclipseLink development trunk and can be tried out using the milestone or nightly builds. Initial support is provided for MongoDB and Oracle NoSQL. A pluggable platform and adapter layer allows for other databases to be supported.
I know the intentions are good, but JPA for NoSQL doesn’t make too much sense to me: object-relational mapping applied to non-relational data models.
Original title and link: NoSQL Databases MongoDB and Oracle NoSQL Support Added to EclipseLink JPA (©myNoSQL)
via: http://java-persistence-performance.blogspot.com/2012/04/eclipselink-jpa-supports-mongodb.html
Friday, 30 March 2012
Integrating VoltDB With the Spring Framework
There are two Java clients for VoltDB. One is a standard JDBC driver that executes all queries synchronously. The other is a specialized client library that can run queries either synchronously or asynchronously, along with a number of other features. Synchronous queries perform well enough but their throughput is no match for asynchronous queries. Asynchronous query throughput is approximately four times greater than synchronous queries in a two node VoltDB cluster. For example, an application using asynchronous queries can run over 200K TPS (transactions per second) in a two node server cluster using a single client running on a Macbook Pro; a synchronous client running the same queries will achieve around 56K TPS.
Could anyone explain what leads to such a difference in performance?
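One plausible explanation, offered as an assumption rather than as VoltDB's documented behavior, is round-trip latency: a synchronous client waits for each reply before sending the next request, while an asynchronous client keeps several requests in flight at once. The sketch below is a toy latency model, not a benchmark. The class name, the latency value, and the in-flight window are all made up for illustration.

```java
// Toy model of client-side throughput. It ignores server capacity, network
// jitter, and queuing. All numbers passed in are illustrative assumptions.
final class LatencyModel {

    // Synchronous client: every call pays a full round trip before the next starts.
    static long syncMillis(int calls, long roundTripMillis) {
        return calls * roundTripMillis;
    }

    // Asynchronous client: up to `inFlight` calls overlap within one round trip.
    static long asyncMillis(int calls, long roundTripMillis, int inFlight) {
        long batches = (calls + inFlight - 1) / inFlight; // ceiling division
        return batches * roundTripMillis;
    }

    public static void main(String[] args) {
        System.out.println(syncMillis(1000, 1));     // prints 1000
        System.out.println(asyncMillis(1000, 1, 4)); // prints 250
    }
}
```

In this model the speedup simply equals the number of requests kept in flight, so it says nothing about where a real cluster would saturate.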
Original title and link: Integrating VoltDB With the Spring Framework (©myNoSQL)
via: http://voltdb.com/company/blog/integrating-voltdb-spring-framework
Monday, 26 March 2012
Spring Pet Clinic Goes Grails and Sharded on MongoDB With Cloudify
If you’re a Java programmer, you must have heard of the Spring sample app Pet Clinic. To showcase Cloudify, the GigaSpaces team migrated Pet Clinic to Grails and used MongoDB to shard it:

Original title and link: Spring Pet Clinic Goes Grails and Sharded on MongoDB With Cloudify (©myNoSQL)
via: http://www.cloudifysource.org/2012/03/25/petclinic_deepdive.html
Monday, 5 March 2012
MapReduce and Massively Parallel Processing (MPP): Two Sides of the Big Data
Andrew Brust for ZDNet:
But, for a variety of reasons, MPP and MapReduce are used in rather different scenarios. You will find MPP employed in high-end data warehousing appliances. […] MPP gets used on expensive, specialized hardware tuned for CPU, storage and network performance. MapReduce and Hadoop find themselves deployed to clusters of commodity servers that in turn use commodity disks. The commodity nature of typical Hadoop hardware (and the free nature of Hadoop software) means that clusters can grow as data volumes do, whereas MPP products are bound by the cost of, and finite hardware in, the appliance and the relative high cost of the software. […] MPP and MapReduce are separated by more than just hardware. MapReduce’s native control mechanism is Java code (to implement the Map and Reduce logic), whereas MPP products are queried with SQL (Structured Query Language). […] Nonetheless, Hadoop is natively controlled through imperative code while MPP appliances are queried through declarative query. In a great many cases, SQL is easier and more productive than is writing MapReduce jobs, and database professionals with the SQL skill set are more plentiful and less costly than Hadoop specialists.
I totally agree with Andrew Brust that none of these are good reasons for these platforms to remain separate. Actually when analyzing the importance of the Teradata (MPP) and Hortonworks (Hadoop) partnership, I wrote:
Depending on the level of integration the two teams will pull together, this partnership might result in one of the most complete and powerful structured and unstructured data warehouse and analytics platforms.
This very same thing could be said about any platform that would offer a viable, fully integrated, cost effective, distributed, structured and unstructured data warehouse or analytics platform. MPP and MapReduce do not represent different sides of the Big Data, but rather complementary approaches for Big Data.
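The contrast Brust draws between imperative Java and declarative SQL can be sketched with a word count. The code below uses plain Java collections and no Hadoop API, so it only mimics the map and reduce steps in memory. The class name and the `words` table in the SQL comment are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of the two MapReduce phases. Not a Hadoop job.
// A declarative equivalent, assuming a hypothetical table words(word), would be:
//   SELECT word, COUNT(*) FROM words GROUP BY word;
final class WordCountSketch {

    // Map phase: emit one (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum the emitted counts for each distinct word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>(); // sorted keys for stable output
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Driver: run map over every line, then reduce all emitted pairs.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```

The one-line SQL query against several dozen lines of Java is the productivity gap the quote refers to.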
Original title and link: MapReduce and Massively Parallel Processing (MPP): Two Sides of the Big Data (©myNoSQL)
via: http://www.zdnet.com/blog/big-data/mapreduce-and-mpp-two-sides-of-the-big-data-coin/121
Thursday, 2 February 2012
NoSQL tutorials: Storing User Preference in Amazon DynamoDB using the Mobile SDKs
Just a CRUD tutorial for DynamoDB but based on a scenario that makes sense and demoing the API with two languages (Objective-C and Java):
The sample mobile application described here demonstrates how to store user preferences in Amazon DynamoDB. Because more and more people are using multiple mobile devices, connecting these devices to the cloud and storing user preferences in the cloud enables developers to provide a more uniform cross-device experience for their users.
This article shows sample code for both the iOS and Android platforms.
Original title and link: NoSQL tutorials: Storing User Preference in Amazon DynamoDB using the Mobile SDKs (©myNoSQL)
Tuesday, 20 December 2011
Neo4j Gets Experimental JDBC Driver
Neo4j getting a JDBC driver before MongoDB is a surprise[1]. Rickard Öberg:
When it comes to NOSQL databases, one of the key advantages is that they allow you to structure your data in a way that better resembles your domain, and also allows you to use query languages where you can express things that are either really awkward or slow with SQL. However, one of the advantages that relational databases have is that they can be accessed from lots of tools using JDBC, as a standard API. So what would happen if a NOSQL database, like Neo4j, also had a JDBC driver? I decided to find out!
If this catches on, the next step is adding a non-HTTP protocol to Neo4j server.
[1] MongoDB is the NoSQL database with the richest querying model which resembles SQL.
Original title and link: Neo4j Gets Experimental JDBC Driver (©myNoSQL)
via: http://rickardoberg.wordpress.com/2011/12/19/creating-a-jdbc-driver-for-neo4j/