


Enterprise-class NoSQL

What is distinctive about an enterprise-class NoSQL database is its support for additional enterprise-scale application requirements, namely: ACID (atomic, consistent, isolated, and durable) transactions, government-grade security and elasticity, as well as automatic failover.

What is distinctive about an enterprise-class NoSQL database is what my company is selling.

If that were true, I doubt we would have any other databases around, considering MarkLogic’s age and perfect fit.

Snarky comments aside, enterprise requirements are so complicated, numerous, political, and sometimes non-technical that I don’t think anyone will ever be able to come up with a definition, or even an extremely long checklist, of what’s enterprise-grade.

Original title and link: Enterprise-class NoSQL (NoSQL database©myNoSQL)


What does comprehensive security mean for Hadoop?

Hortonworks and their new security team explain the current status and their plans for a “holistic and comprehensive” security solution for Hadoop:

A comprehensive security approach means that irrespective of how the data is stored and accessed, there should be an integrated framework for securing data. Enterprises may adopt any use case (batch, real time, interactive), but data should be secured through the same standards, and security should be administered centrally and in one place.

✚ If you have only a couple of seconds, focus on the diagram under the section “HDP + XA - Current offering” and skim the four sections that follow: Authentication, Authorization, Auditing, and Data protection.


✚ It’s safe to assume this post was meant to introduce Hortonworks’ position on Hadoop security as compared to Cloudera’s (and their collaboration with Intel on security).



RethinkDB 1.13: new protocol and push-pull APIs

Some interesting changes and new features in RethinkDB 1.13, announced yesterday. Namely:

  • replacing the Protocol Buffers-based protocol with a JSON protocol

    • how does the JSON protocol manage the non-JSON data types?
    • how fast is a text-based protocol?
  • notifications about document changes

    I’ve always said this was the coolest feature in CouchDB and that every database should support it.

  • a weird[1] new http command to pull JSON data from the web

I’ve checked the RethinkDB stability report again and I’m not sure it reads as “yep, RethinkDB is finally production ready”.
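On the question of how a JSON protocol can carry non-JSON data types: one common approach is to wrap such values in tagged JSON objects. The sketch below is only an illustration of that idea. The tag key `$type$`, the tag names, and the field names are hypothetical and are not taken from RethinkDB’s actual wire format.

```python
import base64
import json
from datetime import datetime, timezone

# Hypothetical tag key; a real protocol would define its own.
TYPE_KEY = "$type$"


def encode_value(value):
    """Turn values JSON cannot express natively into tagged objects."""
    if isinstance(value, datetime):
        return {TYPE_KEY: "TIME", "epoch_time": value.timestamp()}
    if isinstance(value, bytes):
        return {TYPE_KEY: "BINARY", "data": base64.b64encode(value).decode("ascii")}
    if isinstance(value, dict):
        return {k: encode_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [encode_value(v) for v in value]
    return value


def decode_value(value):
    """Reverse encode_value: rebuild native values from tagged objects."""
    if isinstance(value, dict):
        tag = value.get(TYPE_KEY)
        if tag == "TIME":
            return datetime.fromtimestamp(value["epoch_time"], tz=timezone.utc)
        if tag == "BINARY":
            return base64.b64decode(value["data"])
        return {k: decode_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [decode_value(v) for v in value]
    return value


def to_wire(document):
    """Serialize a document to the JSON text sent over the wire."""
    return json.dumps(encode_value(document))


def from_wire(text):
    """Parse JSON text received from the wire back into a document."""
    return decode_value(json.loads(text))
```

The cost is an extra pass over every document on both ends, which is part of why the “how fast is a text-based protocol?” question matters.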

  1. Knowing the team there, I’m pretty sure this is coming from a use case I’m not seeing. 


The evolution of Cassandra

Robbie Strickland (Weather Channel’s software development manager):

“It used to be that Cassandra was something of a beast,” recalled Robbie Strickland, the Weather Channel’s software development manager. “It was like an old space shuttle cockpit with a million knobs.” In fact, prior to joining the Weather Channel, Strickland says using Cassandra required spending “most of the day in the Cassandra IRC channel talking to developers.”

Some databases evolve into more powerful and friendlier tools, while others are just “familiar giant hairballs growing bigger”.





BackMyDB

We support the following databases to help you backup and restore your database in case of any failure: MySQL, MongoDB, Redis, PostgreSQL.

Sounds interesting, but I couldn’t find any details about how it works, how secure (as in encrypted) the backups are, what redundancy is guaranteed, whether there’s an SLA for restores, etc.

Original title and link: BackMyDB (NoSQL database©myNoSQL)


Project Rhino goal: at-rest encryption for Apache Hadoop

Although network encryption has been provided in the Apache Hadoop platform for some time (since Hadoop 2.02-alpha/CDH 4.1), at-rest encryption, the encryption of data stored on persistent storage such as disk, is not. To meet that requirement in the platform, Cloudera and Intel are working with the rest of the Hadoop community under the umbrella of Project Rhino — an effort to bring a comprehensive security framework for data protection to Hadoop, which also now includes Apache Sentry (incubating) — to implement at-rest encryption for HDFS (HDFS-6134 and HADOOP-10150).

Looks like I got this wrong: Apache Sentry will become part of Project Rhino.



Hadoop security: unifying Project Rhino and Sentry

One result of Intel’s investment in Cloudera is that the two companies’ teams now work together on the same projects:

As the goals of Project Rhino and Sentry to develop more robust authorization mechanisms in Apache Hadoop are in complete alignment, the efforts of the engineers and security experts from both companies have merged, and their work now contributes to both projects. The specific goal is “unified authorization”, which goes beyond setting up authorization policies for multiple Hadoop components in a single administrative tool; it means setting an access policy once (typically tied to a “group” defined in an external user directory) and having it enforced across all of the different tools that this group of people uses to access data in Hadoop – for example access through Hive, Impala, search, as well as access from tools that execute MapReduce, Pig, and beyond.

A great first step.

You know what would be even better? A single security framework for Hadoop instead of two.
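The “set an access policy once, enforce it everywhere” idea from the quote can be sketched in a few lines. This is a minimal illustration under my own assumptions, not how Sentry or Project Rhino are implemented: `PolicyStore`, `Tool`, and the sample users and groups are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyStore:
    """Single place where access policies are defined (hypothetical)."""

    # (group, resource) -> set of allowed actions
    grants: dict = field(default_factory=dict)
    # user -> set of groups; stands in for an external user directory
    directory: dict = field(default_factory=dict)

    def grant(self, group, resource, action):
        self.grants.setdefault((group, resource), set()).add(action)

    def is_allowed(self, user, resource, action):
        groups = self.directory.get(user, set())
        return any(
            action in self.grants.get((group, resource), set())
            for group in groups
        )


class Tool:
    """Stand-in for one access path, e.g. Hive, Impala, or a MapReduce job."""

    def __init__(self, name, policies):
        self.name = name
        self.policies = policies  # every tool consults the same store

    def read(self, user, resource):
        if not self.policies.is_allowed(user, resource, "read"):
            raise PermissionError(
                f"{user} may not read {resource} via {self.name}"
            )
        return f"{self.name}: rows from {resource}"
```

Because every `Tool` consults the same store, granting `read` on a table to a group once is enough for all access paths, and revoking it once closes all of them.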



Hortonworks’ Hadoop secret weapon is... Yahoo

Derrick Harris:

Hortonworks was working right alongside Yahoo all through that process. They’ve also worked together on things like rolling upgrades so Hadoop users can upgrade software without taking down a cluster.

  1. Who didn’t know about Hortonworks and Yahoo’s collaboration?
  2. What company and product management team would choose not to work with one of the largest users of the technology it is working on?

    This is the perfect example of testing and validating new ideas and learning about the pain your customers face in real life. Basically, by-the-book product/market fit.



Storing, processing, and computing with graphs

Marko Rodriguez is on a roll with yet another fantastic article about graphs:

To the adept, graph computing is not only a set of technologies, but a way of thinking about the world in terms of graphs and the processes therein in terms of traversals. As data is becoming more accessible, it is easier to build richer models of the environment. What is becoming more difficult is storing that data in a form that can be conveniently and efficiently processed by different computing systems. There are many situations in which graphs are a natural foundation for modeling. When a model is a graph, then the numerous graph computing technologies can be applied to it.

✚ If you missed it, the other recent article I’m referring to is “Knowledge representation and reasoning with graph databases”.
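To make “processes in terms of traversals” a bit more concrete, here is a minimal sketch of a model stored as a graph and a traversal over it. The vertex names and the `traverse` helper are made up for illustration and are not taken from the article or from any graph database API.

```python
from collections import deque

# A tiny graph stored as adjacency lists (hypothetical sample data).
graph = {
    "marko": ["josh", "peter"],
    "josh": ["lop", "ripple"],
    "peter": ["lop"],
    "lop": [],
    "ripple": [],
}


def traverse(graph, start, max_depth):
    """Breadth-first traversal: map each reachable vertex to its depth."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if depths[vertex] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor in graph.get(vertex, []):
            if neighbor not in depths:
                depths[neighbor] = depths[vertex] + 1
                queue.append(neighbor)
    return depths
```

A question such as “what is within two hops of this vertex?” becomes `traverse(graph, "marko", 2)`, with no joins or schema involved.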



Consensus-based replication in HBase

Konstantin Boudnik (WANdisco):

The idea behind consensus-based replication is pretty simple: instead of trying to guarantee that all replicas of a node in the system are synced post-factum to an operation, such a system will coordinate the intent of an operation. If a consensus on the feasibility of an operation is reached, it will be applied by each node independently. If consensus is not reached, the operation simply won’t happen. That’s pretty much the whole philosophy.

Not enough details, but doesn’t this sound like Paxos applied earlier?
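The “coordinate the intent, then apply independently” idea can be sketched as a single-round majority vote. This is a deliberately simplified illustration, not WANdisco’s implementation and not a full Paxos: the `Node` class, the version-based feasibility rule, and `propose` are all assumptions made for the example.

```python
class Node:
    """A replica that votes on proposed operations and applies agreed ones."""

    def __init__(self, name):
        self.name = name
        self.data = {}   # key -> (value, version)
        self.up = True

    def vote(self, op):
        # Hypothetical feasibility rule: accept a write only if it carries
        # a newer version than whatever this node already stores.
        key, _value, version = op
        current = self.data.get(key)
        return current is None or current[1] < version

    def apply(self, op):
        key, value, version = op
        self.data[key] = (value, version)


def propose(nodes, op):
    """Agree on the intent first; apply only if a majority of nodes accepts."""
    votes = [node.vote(op) for node in nodes if node.up]
    if sum(votes) * 2 <= len(nodes):
        return False  # no consensus: the operation simply does not happen
    for node in nodes:
        if node.up:
            node.apply(op)  # each node applies the agreed operation itself
    # A real system would also let nodes that were down catch up later.
    return True
```

Nothing is written anywhere until a majority has agreed, which is the contrast with syncing replicas after the fact.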



Where to look for Hadoop reliability problems

Dan Woods (Forbes) gets a list of 10 possible problems in Hadoop from Raymie Stata (CEO of Altiscale) that can be summarized as:

  1. using default configuration options
  2. doing no tuning
  3. understanding Amazon Elastic MapReduce’s behavior



A Story of graphs, DBs, and graph databases

After Marko Rodriguez’s “Knowledge representation and reasoning with graph databases”, another great introductory resource on graph databases is Joshua Shinavier’s presentation: