nosql theory: All content tagged as nosql theory in NoSQL databases and polyglot persistence

Horizontal Scalability vs Elasticity

Abel Perez in a post about Cassandra:

Horizontal scalability boils down to the ability to add new hardware to a system without any interruption or downtime. An ideal horizontally scalable system does not require reconfiguration and supports incremental addition of hardware.

Nope. This is the definition of elasticity.

Horizontal scalability is the capability of a system to accept the addition or removal of multiple nodes (independent units of resources) and make them work as a single system. The scalability of a system can be further categorized as negative, sub-linear, linear, or supra-linear, depending on the shape of the performance[1]/nodes curve.


  1. This is the part where things can get more complicated as there are multiple ways to characterize the performance of a system (e.g. throughput, latency, etc.) 
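To make the four categories concrete, here is a minimal sketch that classifies a scaling step from two measurements of the same workload. The function name, the 5% tolerance, and the choice of a "higher is better" metric such as throughput are my assumptions, not part of any standard definition; a latency-based measure would need to be inverted first.

```python
def classify_scalability(nodes_before, perf_before, nodes_after, perf_after,
                         tolerance=0.05):
    """Classify how performance changed when nodes were added.

    `perf_*` is assumed to be a "higher is better" measure (e.g. requests/s).
    `tolerance` is an arbitrary band around perfectly linear scaling.
    """
    if nodes_after <= nodes_before:
        raise ValueError("expected a measurement taken after adding nodes")

    node_ratio = nodes_after / nodes_before
    perf_ratio = perf_after / perf_before

    # Adding nodes made the system slower overall.
    if perf_ratio < 1.0:
        return "negative"

    # 1.0 means performance grew exactly as fast as the node count.
    efficiency = perf_ratio / node_ratio
    if efficiency > 1.0 + tolerance:
        return "supra-linear"
    if efficiency < 1.0 - tolerance:
        return "sub-linear"
    return "linear"
```

Doubling a cluster from 4 to 8 nodes and going from 1000 to 1600 requests per second would come out as sub-linear here. A real characterization would look at the whole curve over many cluster sizes rather than a single pair of points.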

Original title and link: Horizontal Scalability vs Elasticity (NoSQL database©myNoSQL)


A 771 Words Description of Map Reduce

Can a skyscraper completed in 1931 be used to explain a parallel processing algorithm introduced in 2004? In this post, I use the analogy of counting smartphones in the Empire State Building to explain MapReduce…without using code.

Andrew Brust’s metaphor is nice, but I wonder whether anyone working anywhere near data these days still needs a 771 words description of how Map Reduce works.
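For those who prefer code to metaphors, the smartphone count fits in a few lines. This is only a sketch of how I read the analogy: the `offices` data is made up, and the helper names (`map_office`, `shuffle`, `reduce_floor`) are hypothetical. Everything runs in a single process, whereas a real Map Reduce framework would distribute the map and reduce calls across machines.

```python
from collections import defaultdict

# Hypothetical survey data: (floor, devices found in one office on that floor).
offices = [
    (1, ["smartphone", "laptop"]),
    (1, ["smartphone", "smartphone"]),
    (2, ["tablet"]),
    (2, ["smartphone"]),
    (3, []),
]

def map_office(floor, devices):
    """Map step: emit one (floor, 1) pair per smartphone seen in an office."""
    return [(floor, 1) for device in devices if device == "smartphone"]

def shuffle(pairs):
    """Shuffle step: group the emitted values by key (the floor)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_floor(counts):
    """Reduce step: add up the ones emitted for a single floor."""
    return sum(counts)

def count_smartphones(offices):
    mapped = [pair
              for floor, devices in offices
              for pair in map_office(floor, devices)]
    per_floor = {floor: reduce_floor(counts)
                 for floor, counts in shuffle(mapped).items()}
    return per_floor, sum(per_floor.values())

per_floor, total = count_smartphones(offices)
```

Each office can be mapped independently and each floor reduced independently, which is the whole point of the model.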


via: http://www.zdnet.com/blog/big-data/the-mapreduce-101-story-in-102-stories/190


6 Reasons Why We Need NoSQL

  1. We’re dealing with much more data.
  2. We require sub-second responses to queries.
  3. We want applications to be up 24/7.
  4. We’re seeing many applications in which the database has to soak up data as fast as (or even much faster than) it processes queries.
  5. We’re frequently dealing with changing data or with unstructured data
  6. We’re willing to sacrifice our sacred cows.

Not bad. But it reads more like the definition of Big Data.


via: http://bigdatadiary.com/6-reasons-why-we-need-nosql/


With Concatenative Programming, a Parallel Compiler Is a Plain Old Map-Reduce

I’m still digesting Jon Purdy’s post:

A compiler for a statically typed concatenative language could literally:

  1. Divide the program into arbitrary segments
  2. Compile every segment in parallel
  3. Compose all the segments at the end

This is impossible to do with any other type of language. With concatenative programming, a parallel compiler is a plain old map-reduce!
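While digesting, I found a toy version helpful. The sketch below is my own illustration, not Jon Purdy’s code: it defines a tiny untyped stack language (integers plus `add`, `mul`, `dup`, `swap`), so it ignores the static typing his claim is about, and it has no quotations or other nested syntax. Every token compiles to a stack-to-stack function, and because function composition is associative, segments cut at any token boundary can be compiled separately and composed afterwards. Threads stand in for parallel workers here.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical primitives of a toy stack language; each maps a stack to a new stack.
PRIMITIVES = {
    "add": lambda s: s[:-2] + [s[-2] + s[-1]],
    "mul": lambda s: s[:-2] + [s[-2] * s[-1]],
    "dup": lambda s: s + [s[-1]],
    "swap": lambda s: s[:-2] + [s[-1], s[-2]],
}

def compile_token(token):
    """Compile one token into a function from stack to stack."""
    if token in PRIMITIVES:
        return PRIMITIVES[token]
    value = int(token)  # anything else is treated as an integer literal
    return lambda stack: stack + [value]

def compose(f, g):
    """Run f first, then g. Composition is associative."""
    return lambda stack: g(f(stack))

def identity(stack):
    return stack

def compile_segment(tokens):
    """The "map" step: compile one segment on its own."""
    return reduce(compose, map(compile_token, tokens), identity)

def compile_parallel(source, segment_size=2):
    tokens = source.split()
    # 1. Divide the program into arbitrary segments.
    segments = [tokens[i:i + segment_size]
                for i in range(0, len(tokens), segment_size)]
    # 2. Compile every segment in parallel.
    with ThreadPoolExecutor() as pool:
        compiled = list(pool.map(compile_segment, segments))
    # 3. Compose all the segments at the end (the "reduce" step).
    return reduce(compose, compiled, identity)

program = compile_parallel("2 3 add dup mul")
result = program([])
```

Changing `segment_size` changes where the program is cut but not what it computes, which is the property the quote relies on.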


via: http://evincarofautumn.blogspot.com.au/2012/02/why-concatenative-programming-matters.html


Threat to NoSQL Database?

This question was posted on LinkedIn:

During my research I came across a new database technology called ‘NuoDB’. It seems to share many attributes of NoSQL databases and still maintain a SQL query interface. It uses a key value store as its data storage engine. It also promises the performance and scale of NoSQL databases.

Enterprises that have not yet embraced NoSQL will be inclined to try this option before going the NoSQL way, in my opinion. Mainly because it does not require them to change their database interface layer drastically and also because NoSQL databases have not moved towards a standards-based query interface yet.

For an outsider this comment might look extremely valid. I mean who in his right mind would give away all the expertise and tools and history of SQL for something like NoSQL?

But the real answer is in the details: “It seems to share many attributes of NoSQL databases.” Ask yourself what these shared attributes are:

  1. what is the supported data model? The relational model advantages have been discussed over and over for the last 30 years. But there are alternative data models that bring different
  2. what is the persistence model? Is it disk based, memory based, cluster based? Is it durable?
  3. what is the distribution model? Is it master-slave or master-master or peer-to-peer or masterless?
  4. what are the scalability characteristics of the system?
  5. what are the elasticity characteristics of the system?

In the only comment worth reading, Stefan Edlich correctly points to the many NewSQL solutions. Before asking if these systems “pose a threat” to NoSQL databases, I’d first ask if they are at least a threat to the existing relational databases. And the answer is no.

Sid Anand wrote in the State of NoSQL 2012 post:

Many of the NoSQL vendors view the “battle of NoSQL” to be akin to the RDBMS battle of the 80s, a winner-take-all battle. In the NoSQL world, it is by no means a winner-take-all battle. Distributed Systems are about compromises.

I’d go even further and say that data storage is no longer a winner-takes-all battle. Actually, it’s not even a zero-sum game. We are living in the polyglot persistence age.



Taking a Step Back From ORMs and a Parallel to the Database World

Jeff Davis:

So, my proposal is this: take a step back from ORMs, and consider working more closely with SQL and a good database driver. Try to work with the database, and find out what it has to offer; don’t use layers of indirection to avoid knowing about the database. See what you like and don’t like about the process after an honest assessment, and whether ORMs are a real improvement or a distracting complication.

I know a lot of applications using ORMs that work perfectly fine. And I know applications that had to work around their ORMs or even got rid of them completely.

Here is a parallel to think about: ORM vs SQL is similar to always using a relational database versus using the storage solution that better fits the problem—as in using a NoSQL database or going polyglot persistence. An ORM comes with the advantage of keeping you inside a single paradigm (object oriented) at the cost of not being able to (easily) use the full power of the underlying storage.


via: http://thoughts.davisjeff.com/2012/02/26/taking-a-step-back-from-orms/


5 Requirements for Enterprise NoSQL databases

Emil Eifrem enumerates 5 requirements for adopting NoSQL databases in the enterprise environment:

  1. Ability to Handle Today’s Complex and Connected Data
  2. Simplify the Development of Applications Using Complex and Connected Data
  3. Support for End-to-End Transactions
  4. Enterprise-grade Durability so that Data is Never Lost
  5. Java Still Reigns for Enterprise Development

I think Emil Eifrem has left out a couple of other critical aspects, but I agree with 4 and 1/2 of those on his list.


via: http://www.dbta.com/Articles/Editorial/Trends-and-Applications/NoSQL-for-the-Enterprise-80198.aspx


Column vs Row Stores: How do they compare?

Yesterday I asked on Twitter about technical papers comparing column-stores and row-stores. Most of the answers I got point to the research done by Daniel Abadi: Papers and Technical Reports. I’ll start with:


Why NoSQL Databases Are Not Just For Google and Amazon?

Oren Eini[1]:

Why the history lesson, you ask? Why, to give you some perspective on the design choices that led to the victory of the relational databases. Space was at a premium, the interaction between the user and the application closely modeled the physical layout of the data in the database. That made sense, because there really were no other alternatives given the environment that existed at the time.

In my company, we are using RavenDB as the backend database for everything from a blog, our ordering and purchasing systems, the daily build server and many more. The major advantages that we found weren’t the ability to scale (although that exists), it is the freedom that it gives us in terms of modeling our data and changing our minds.

The Googles, Facebooks, and Amazons told the “this was not our relational database vendors’ fault” story. Jan Lehnardt[2] said a while back that NoSQL is about choice. I said that NoSQL databases are a departure from having just good enough solutions. And Oren Eini is emphasizing the benefits of other data models.


  1. Oren Eini is the creator and main developer of the RavenDB document database. 

  2. Jan Lehnardt: Apache CouchDB committer, Couchbase engineer 


via: http://java.dzone.com/articles/why-nosql-not-just-google-and


5 Key Elements for a Firehose Data System

The 5 key elements for a firehose data system as per a presentation by Josh Berkus, CEO of PostgreSQL Experts Inc. summarized by Brian Proffitt on ITworld:

  1. Queuing software to manage out-of-sequence data
  2. Buffering techniques to deal with component outages
  3. Materialized views that update data into aggregate tables
  4. Configuration management for all the systems in the solution
  5. Comprehensive monitoring to look for failures
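To give a feel for the first and third elements, here is a minimal single-process sketch, assuming every event carries a sequence number. The `Resequencer` class, the `ingest` function, and the sample topics are hypothetical names of mine. A heap stands in for the queuing software and a `Counter` stands in for an aggregate table that a materialized view would keep up to date.

```python
import heapq
from collections import Counter

class Resequencer:
    """Holds out-of-sequence events until the gap before them is filled."""

    def __init__(self, first_seq=0):
        self._next = first_seq
        self._pending = []  # min-heap of (sequence number, event)

    def push(self, seq, event):
        """Accept one arrival and return the events that can now be released in order."""
        if seq >= self._next:  # anything older was already released: drop it
            heapq.heappush(self._pending, (seq, event))
        released = []
        while self._pending and self._pending[0][0] <= self._next:
            top_seq, top_event = heapq.heappop(self._pending)
            if top_seq == self._next:
                released.append(top_event)
                self._next += 1
            # otherwise it is a duplicate of a released event and is discarded
        return released

def ingest(arrivals):
    """Feed arrivals through the resequencer and keep a running aggregate."""
    resequencer = Resequencer()
    mentions_per_topic = Counter()  # stand-in for an aggregate table
    ordered = []
    for seq, topic in arrivals:
        for event in resequencer.push(seq, topic):
            ordered.append(event)
            mentions_per_topic[event] += 1  # incremental update, no full rescan
    return ordered, mentions_per_topic

ordered, mentions = ingest([(1, "redis"), (0, "hbase"), (3, "redis"), (2, "kafka")])
```

A production system would also need to bound the buffer and decide what to do when a missing event never arrives, which is where the buffering and monitoring elements come in.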

Basically, firehose data systems are the perfect showcase of the 4 V’s of Big Data. To get an idea of the complexity involved in such systems, check the DataSift architecture, which relies on MySQL, HBase, Memcached, Redis, and Kafka just to deal with the Twitter firehose.



Hybrid Word Aligned Bitmaps: Why are column oriented databases so much faster than row oriented databases?

Terence Siganakis:

I have been playing around with Hybrid Word Aligned Bitmaps for a few weeks now, and they turn out to be a rather remarkable data structure. I believe that they are utilized extensively in modern column oriented databases such as Vertica and MonetDB. Essentially HWABs are a data structure that allows you to represent a sparse bitmap (series of 0’s and 1’s) really efficiently in memory. The key trick here is the use of run length encoding to compress the bitmap into fewer bits while still allowing for lightning fast operations.

The comment thread discusses a couple of reasons for column databases being faster than row-oriented databases and some scenarios where this is not happening.
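The run length encoding trick is easy to sketch. The code below is a simplification of mine, not the actual HWAB or FastBit layout: it stores runs as plain `(bit, length)` tuples instead of packing them into machine words, and the function names are hypothetical. What it does show is that a logical AND can walk the two compressed bitmaps run by run, without ever expanding them.

```python
def rle_encode(bits):
    """Compress a list of 0/1 values into (bit, run_length) tuples."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return [tuple(run) for run in runs]

def rle_decode(runs):
    """Expand (bit, run_length) tuples back into a list of 0/1 values."""
    return [bit for bit, length in runs for _ in range(length)]

def rle_and(a, b):
    """AND two encoded bitmaps of equal length without decompressing them."""
    result = []
    i = j = 0
    left_a = a[0][1] if a else 0  # bits left in the current run of a
    left_b = b[0][1] if b else 0  # bits left in the current run of b
    while i < len(a) and j < len(b):
        step = min(left_a, left_b)  # both runs are constant over this span
        bit = a[i][0] & b[j][0]
        if result and result[-1][0] == bit:
            result[-1] = (bit, result[-1][1] + step)  # extend the previous run
        else:
            result.append((bit, step))
        left_a -= step
        left_b -= step
        if left_a == 0:
            i += 1
            left_a = a[i][1] if i < len(a) else 0
        if left_b == 0:
            j += 1
            left_b = b[j][1] if j < len(b) else 0
    return result

# Two sparse bitmaps, e.g. "rows where country = X" and "rows where year = Y".
x = [0] * 6 + [1] * 2 + [0] * 4
y = [0] * 4 + [1] * 4 + [0] * 4
both = rle_and(rle_encode(x), rle_encode(y))
```

The cost of the AND depends on the number of runs rather than the number of rows, which is why sparse or well-clustered columns benefit the most.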

Terence Siganakis links to FastBit: An efficient compressed Bitmap index technology:

FastBit is an open-source data processing library following the spirit of NoSQL movement. It offers a set of searching functions supported by compressed bitmap indexes. It treats user data in the column-oriented manner similar to well-known database management systems such as Sybase IQ, MonetDB, and Vertica.


via: http://siganakis.com/using-bitmap-indexes-in-query-processing


The History of NoSQL: This Was Not Our Technology Vendors’ Fault

Werner Vogels in the post about Amazon DynamoDB:

We had been pushing the scalability of commercially available technologies to their limits and finally reached a point where these third party technologies could no longer be used without significant risk. This was not our technology vendors’ fault; Amazon’s scaling needs were beyond the specs for their technologies and we were using them in ways that most of their customers were not. A number of outages at the height of the 2004 holiday shopping season can be traced back to scaling commercial technologies beyond their boundaries.

Here is what I wrote about the history behind NoSQL databases:

Providing decent solutions, up to a point, to a wide range of problems and covering more scenarios than alternative storage solutions existing at that time, made relational databases the de facto storage for the last 30 years. But during the last years, more and more problems crossed the boundaries of what could have been considered decent solutions leading to the need for specialized, better than good enough alternative solutions. And thus NoSQL databases.

It feels rewarding to get such confirmation from people who are at the forefront of NoSQL.
