Putting your NoSQL data to work

Storing your data in a NoSQL solution doesn’t mean that you are done with it. You’ll still have to put it to work, transform it, move it, or do some data warehousing[1]. And the lack of SQL should not stop you from doing any of these.

One solution available in many NoSQL stores is MapReduce. As an example, you can see how to translate SQL to MongoDB MapReduce.
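To make the idea of translating SQL to MapReduce a bit more concrete, here is a minimal in-memory sketch of how a query such as `SELECT status, SUM(amount) FROM orders GROUP BY status` breaks down into a map step and a reduce step. It does not use MongoDB's API at all; the `Order` class, the method names, and the sample data are assumptions made up for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the map/reduce shape behind
//   SELECT status, SUM(amount) FROM orders GROUP BY status
// Order and every name below are hypothetical; no MongoDB API is used.
class GroupBySketch {
    static class Order {
        final String status;
        final int amount;

        Order(String status, int amount) {
            this.status = status;
            this.amount = amount;
        }
    }

    // Map phase: emit one (key, value) pair per input row.
    static List<Map.Entry<String, Integer>> map(List<Order> orders) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (Order o : orders) {
            emitted.add(new SimpleEntry<>(o.status, o.amount));
        }
        return emitted;
    }

    // Reduce phase: fold all values that share a key into a single total.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> emitted) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> e : emitted) {
            totals.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
            new Order("open", 10),
            new Order("shipped", 25),
            new Order("open", 5));
        System.out.println(reduce(map(orders)));
    }
}
```

The GROUP BY column becomes the emitted key and the aggregate function becomes the reduce step. In a real store the two phases would run distributed across the data rather than in a single loop.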

But MapReduce is not the only option available, and I’d like to quickly introduce you to a couple of alternatives.

HBase-dsl

Working with HBase can be quite verbose at times, and while Java is not very good at creating DSLs, sometimes even a more fluent API is useful. This is exactly what HBase-dsl brings you:

However, I found myself writing tons of code to perform some fairly simple tasks. So I set out to simplify my HBase code and ended up writing a Java HBase DSL. It’s still fairly rough around the edges, but it does allow the use of standard Java types and it’s extensible.

hBase.save("test")
    .row("abcd")
    .family("famA")
    .col("col1", "hello world!");

String value = hBase.fetch("test")
    .row("abcd")
    .family("famA")
    .value("col1", String.class);
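For readers curious how a chain like this can be built in plain Java, here is a rough sketch of the fluent pattern: each intermediate call records one coordinate and returns the same object, and a terminal call does the actual work. This is not the HBase-dsl implementation and it does not talk to HBase; `FluentStore`, `Step`, and the in-memory map are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in that shows the fluent-chain pattern.
// This is NOT the HBase-dsl implementation and does not talk to HBase.
class FluentStore {
    // Cells keyed by "table/row/family/column".
    private final Map<String, Object> cells = new HashMap<>();

    Step save(String table) { return new Step(table); }

    Step fetch(String table) { return new Step(table); }

    // Each intermediate call records one coordinate and returns the same step.
    class Step {
        private final String table;
        private String row;
        private String family;

        Step(String table) { this.table = table; }

        Step row(String row) { this.row = row; return this; }

        Step family(String family) { this.family = family; return this; }

        // Terminal call for writes.
        void col(String column, Object value) {
            cells.put(key(column), value);
        }

        // Terminal call for reads; the Class token gives a typed result.
        <T> T value(String column, Class<T> type) {
            return type.cast(cells.get(key(column)));
        }

        private String key(String column) {
            return table + "/" + row + "/" + family + "/" + column;
        }
    }

    public static void main(String[] args) {
        FluentStore store = new FluentStore();
        store.save("test").row("abcd").family("famA").col("col1", "hello world!");
        String value = store.fetch("test").row("abcd").family("famA").value("col1", String.class);
        System.out.println(value);
    }
}
```

Passing `String.class` to the terminal read is what lets the caller get back a standard Java type without an explicit cast.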

HBql

HBql’s goal is to bring a more SQL-like interface to HBase for those who miss SQL. You can take a look at the ☞ HBql statements to get a better feel for what it looks like.

Hive

Hive is a data warehouse infrastructure for Hadoop that provides a SQL-like query language to make data ETL easy.

Pig

Pig is a platform for analyzing large data sets, built on Hadoop. I have found a great article ☞ comparing Pig Latin over Hadoop to SQL over a relational database:

  1. Pig Latin is procedural, where SQL is declarative.
  2. Pig Latin allows pipeline developers to decide where to checkpoint data in the pipeline.
  3. Pig Latin allows the developer to select specific operator implementations directly rather than relying on the optimizer.
  4. Pig Latin supports splits in the pipeline.
  5. Pig Latin allows developers to insert their own code almost anywhere in the data pipeline.
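Points 1 and 4 in the list above are easier to picture with a small example. The following is only a plain-Java analogy of a procedural pipeline that splits into two branches; it is not Pig Latin and uses no Pig API, and the step names, threshold, and sample data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java analogy for a procedural pipeline with a split.
// Not Pig Latin and not a Pig API; data and names are hypothetical.
class PipelineSketch {
    // Step 1: drop invalid records (negative values in this toy data set).
    static List<Integer> clean(List<Integer> input) {
        List<Integer> out = new ArrayList<>();
        for (int v : input) {
            if (v >= 0) {
                out.add(v);
            }
        }
        return out;
    }

    // Step 2: split one stream into two branches around a threshold.
    static List<List<Integer>> split(List<Integer> input, int threshold) {
        List<Integer> small = new ArrayList<>();
        List<Integer> large = new ArrayList<>();
        for (int v : input) {
            (v < threshold ? small : large).add(v);
        }
        return Arrays.asList(small, large);
    }

    // Step 3: each branch can be processed independently of the other.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> cleaned = clean(Arrays.asList(3, -1, 12, 7, 40));
        List<List<Integer>> branches = split(cleaned, 10);
        System.out.println("small sum = " + sum(branches.get(0)));
        System.out.println("large sum = " + sum(branches.get(1)));
    }
}
```

The developer spells out each step and the order in which it runs, and one intermediate result feeds two downstream branches. A single declarative SQL statement would instead leave the execution plan to the optimizer.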

But don’t think that HBase and Hadoop are the only ones getting such tools. In the graph database world, there is ☞ Gremlin: a graph-based programming language meant to ease graph query, analysis, and manipulation.

I think that sooner rather than later we will see more such solutions appearing in the NoSQL environment.

References