


NoSQL databases: All content tagged as NoSQL databases in NoSQL databases and polyglot persistence

Spring Data: One API to Rule Them All or Giving NoSQL to Spring Users

If you’ve never looked into Spring Data, Tobias Trelle’s article gives a brief overview:

Spring Data is a high level SpringSource project whose purpose is to unify and ease the access to different kinds of persistence stores, both relational database systems and NoSQL data stores.

The article is pretty clear about why unifying the various relational and NoSQL databases won’t really work. What’s left is ease of access, which could still be an achievement in itself. But without the unifying part, I think Spring Data is just the Spring way of doing NoSQL for Spring users, not a general solution that non-Spring users would benefit from. I’m aware I’ve said this before.

Original title and link: Spring Data: One API to Rule Them All or Giving NoSQL to Spring Users (NoSQL database©myNoSQL)


I/O Intensive Apps and Amazon Cloud Improvements: EBS Provisioned IOPS & Optimized Instance Types

James Hamilton puts into perspective the two latest I/O-related features from Amazon: the high-performance I/O EC2 instances, and EBS provisioned IOPS together with EBS-optimized EC2 instances:

With the announcement today, EC2 customers now have access to two very high performance storage solutions. The first solution is the EC2 High I/O Instance type announced last week which delivers a direct attached, SSD-powered 100k IOPS for $3.10/hour. In today’s announcement this direct attached storage solution is joined by a high-performance virtual storage solution. This new type of EBS storage allows the creation of striped storage volumes that can reliably deliver 10,000 to 20,000 IOPS across a dedicated virtual storage network.

I’ve said it before, and this confirms once again that Amazon is addressing most of the complaints about running I/O-intensive applications on EC2 and EBS.
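The striping mentioned in the quote comes down to simple arithmetic: a RAID-0 stripe spreads I/O across several volumes, so in the ideal case the aggregate rate is the sum of the per-volume rates. A minimal sketch in Python, assuming a hypothetical per-volume figure of 1,000 provisioned IOPS (check Amazon’s current limits) and ignoring instance and network bottlenecks:

```python
def striped_iops(volumes: int, iops_per_volume: int) -> int:
    """Idealized aggregate IOPS of a RAID-0 stripe.

    Assumes I/O is spread evenly across volumes and that neither the
    instance nor the storage network is the bottleneck.
    """
    if volumes < 1 or iops_per_volume < 0:
        raise ValueError("need at least one volume and non-negative IOPS")
    return volumes * iops_per_volume


def volumes_needed(target_iops: int, iops_per_volume: int) -> int:
    """Smallest stripe width whose idealized aggregate reaches target_iops."""
    if target_iops < 0 or iops_per_volume <= 0:
        raise ValueError("target must be non-negative, per-volume IOPS positive")
    # Integer ceiling division, avoiding floating point.
    return (target_iops + iops_per_volume - 1) // iops_per_volume


# Hypothetical per-volume figure, for illustration only.
PER_VOLUME = 1_000
for target in (10_000, 20_000):
    n = volumes_needed(target, PER_VOLUME)
    print(f"{target} IOPS -> {n} volumes ({striped_iops(n, PER_VOLUME)} IOPS)")
```

Treat the result as an upper bound: whenever the workload does not spread evenly across the stripe, the achieved rate will be lower.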

Original title and link: I/O Intensive Apps and Amazon Cloud Improvements: EBS Provisioned IOPS & Optimized Instance Types (NoSQL database©myNoSQL)


The Time for NoSQL Standards Is Now

Andrew Oliver:

Yet there are obstacles to this transition. First, NoSQL lacks a dominant force. For the RDBMS, no matter which product you choose, you have at least a subset of ANSI standard SQL on which you can depend. For any of the new databases, you may have Pig, Hive, SPARQL, Mongo Query Language, Cypher, or others. These languages have little in common. For the RDBMS, you have some connector standard, at least, in the venerable ODBC. For NewDB, you must rely on a database-specific connector.

What’s needed now is for the NoSQL vendors (10gen, Cloudbase, and so on), interested parties (such as SpringSource, Red Hat, Microsoft, and IBM), and various projects to come together, take some of these separate efforts, and propose standards. First, define the query level. Then define the connector standards.

I’d like to suggest a name for this all-inclusive query language: ingohQL. That stands for it’s-not-going-to-happen-QL.
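To make concrete what a connector standard buys on the relational side, here is a small Python sketch (my illustration, not from Oliver’s article) that talks to a database only through the PEP 249 DB-API 2.0 interface, Python’s rough counterpart to ODBC. The `fetch_all` helper and the `users` table are hypothetical; the standard library’s `sqlite3` stands in for any compliant driver:

```python
import sqlite3
from typing import Any, List, Sequence, Tuple


def fetch_all(conn: Any, sql: str, params: Sequence[Any] = ()) -> List[Tuple[Any, ...]]:
    """Run a query using only the generic DB-API 2.0 calls.

    Any compliant connection works, but the placeholder syntax in `sql`
    still depends on the driver's `paramstyle`.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        cur.close()


# sqlite3 uses the "qmark" paramstyle, hence the ? placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("ann", 31), ("bob", 27)])
print(fetch_all(conn, "SELECT name FROM users WHERE age > ? ORDER BY name", (30,)))
# prints [('ann',)]
```

Even this standard leaves gaps: placeholders differ between drivers, and the SQL itself is only portable to the extent the databases share a dialect.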

Original title and link: The Time for NoSQL Standards Is Now (NoSQL database©myNoSQL)


6 Ideal Features for Big Data Transactional Database

Dan Kusnetzky proposes the following 6 features as part of an ideal transactional database:

  1. SQL
  2. ACID transactions
  3. Data and application independence
  4. Elasticity
  5. Multi-tenancy
  6. Geographic distribution

The real question is whether a database including all these features is even possible. We already know that ACID transactions and support for geographic distribution don’t mix well. SQL was created to work with the relational model, so it will be quite limited when it comes to other data models (think graphs). There are also some (good) arguments that a declarative language such as SQL might not be the best fit for large-scale databases. Last, but not least, designing a common API to support the different data models is not that realistic either.

Original title and link: 6 Ideal Features for Big Data Transactional Database (NoSQL database©myNoSQL)


Which Is Better for Programmers: SQL vs. NoSQL?

Jeff Cogswell compares some short code samples in an attempt to answer the much bigger question:

But what about the programmers, who write the client code that access the databases? Where do the disagreements leave them? From a programming perspective, is SQL really that horrible and outdated? Or is the new NoSQL really that awful to work with? Perhaps they both have strengths and good points.

I confess that reading the above made me curious about what the article would conclude. Unfortunately, by the time I’d read the first comparison (JavaScript in Node.js using SQL vs. Mongo), I realized my expectations were too high, for a few reasons:

  1. it would have been impossible to compare the APIs of all relevant NoSQL databases with a relational database;
  2. it would have been very difficult to choose a generic, representative enough use case;
  3. the results would have always been heavily influenced by the quality of drivers and libraries used.

Last but not least, many of the merits of NoSQL databases relate to operational complexity rather than programming complexity. As someone who has done a fair amount of coding and close to zero operations, I would probably be OK accepting a bit of programming complexity in exchange for simpler operations. But that might just be a biased opinion.

Original title and link: Which Is Better for Programmers: SQL vs. NoSQL? (NoSQL database©myNoSQL)


Why Database Technology Matters

Damien Katz:

Forget SQL. Forget network, document or object databases. Forget the relational algebra. Forget schemas. Forget joins and normalization. Forget ACID. Forget Map/Reduce.

Think knowledge representation. Think knowledge collection, transformation, aggregation, sharing. Think knowledge discovery.

Think of humanity and its collective mind expanding.

A great read.

Original title and link: Why Database Technology Matters (NoSQL database©myNoSQL)


MySQL Is Done. NoSQL Is Done. It's the Postgres Age

Jeff Dickey enumerates some of the new features available in PostgreSQL (schema-less data, array columns, queuing, full-text searching, geo-spatial indexing) and concludes that PostgreSQL now has everything an application needs:

Postgres has taken the features out of all of these tools and integrate it right inside the platform. Now you don’t need to spin up a mongo cluster for non-rel data, rabbitmq cluster for queueing, solr box for searching. You can just have a single postgres server. That saves a huge ops headache since each of those clusters/boxes have to be durable, replicated, and scalable.

Sounds a bit too optimistic? As we’ve learned from the NoSQL space, there are no silver bullets:

Now obviously, there’s a glaring downside with this approach: you get one box. Maybe a read slave or something, but really, you can’t scale it.

As you can imagine, I disagree with most of the points, the only exception being that it is great to see so many useful features packaged with PostgreSQL. These are definitely going to make life easier for some developers.

But when talking about MySQL and NoSQL being done:

  1. MySQL is done, except it has a huge community, there are tons of developers very familiar with it, and last but not least MySQL powers massive deployments. This last part matters a lot.
  2. NoSQL is done, except many NoSQL solutions tackle different problem spaces and, by staying focused, provide optimal solutions for them. Neither Oracle, nor MongoDB, nor PostgreSQL will be able to solve all problems. The wider the range of problems they cover, the less optimal the solutions they provide for corner cases or extreme scenarios.

Original title and link: MySQL Is Done. NoSQL Is Done. It’s the Postgres Age (NoSQL database©myNoSQL)


NO DB - the Center of Your Application Is Not the Database

Uncle Bob:

The center of your application is not the database. Nor is it one or more of the frameworks you may be using. The center of your application are the use cases of your application. […] If you get the database involved early, then it will warp your design. It’ll fight to gain control of the center, and once there it will hold onto the center like a scruffy terrier. You have to work hard to keep the database out of the center of your systems. You have to continuously say “No” to the temptation to get the database working early.
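One way to read this advice in code is to have the use case depend only on an interface it owns, with persistence plugged in from the outside. A minimal Python sketch; the names `Order`, `OrderGateway`, `InMemoryOrderGateway`, and `PlaceOrder` are hypothetical and are not taken from Uncle Bob’s post:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class Order:
    order_id: str
    total_cents: int


class OrderGateway(Protocol):
    """Interface owned by the use case; any storage adapter implements it."""

    def save(self, order: Order) -> None: ...

    def find(self, order_id: str) -> Optional[Order]: ...


class InMemoryOrderGateway:
    """Stand-in storage so the use case runs before any database exists."""

    def __init__(self) -> None:
        self._orders: Dict[str, Order] = {}

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order

    def find(self, order_id: str) -> Optional[Order]:
        return self._orders.get(order_id)


class PlaceOrder:
    """The use case: business rules only, no knowledge of how data is stored."""

    def __init__(self, gateway: OrderGateway) -> None:
        self._gateway = gateway

    def execute(self, order_id: str, total_cents: int) -> Order:
        if total_cents <= 0:
            raise ValueError("order total must be positive")
        if self._gateway.find(order_id) is not None:
            raise ValueError(f"duplicate order id: {order_id}")
        order = Order(order_id, total_cents)
        self._gateway.save(order)
        return order


gateway = InMemoryOrderGateway()
PlaceOrder(gateway).execute("A-1", 1999)
print(gateway.find("A-1"))  # Order(order_id='A-1', total_cents=1999)
```

Adding a database later means writing another class with the same `save` and `find` methods; `PlaceOrder` itself does not change.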

Original title and link: NO DB - the Center of Your Application Is Not the Database (NoSQL database©myNoSQL)


NoSQL and Relational Databases Podcast With Mathias Meyer

EngineYard’s Ines Sombra recorded a conversation with Mathias Meyer about NoSQL databases and their evolution towards friendlier functionality, relational databases and their steps towards non-relational models, and a bit about what polyglot persistence means.

Mathias Meyer is one of the people I could talk with for days about NoSQL, and about databases in general with different infrastructure toppings, and he has some of the most well-balanced thoughts about this exciting space (see this conversation I had with him in the early days of NoSQL). I strongly encourage you to download the mp3 and listen to it.

Original title and link: NoSQL and Relational Databases Podcast With Mathias Meyer (NoSQL database©myNoSQL)

NoSQL Everywhere? Not So Fast

So how can big companies get in on the action? Let’s contrast the nature of data suited for NoSQL with the properties of enterprise data that requires the single-source-of-truth systems that we talked about. We’ll use three V’s: volume, velocity, and variety.

Just in case you want to read an InformationWeek post with no start, no end, and no logic, but (ab)using all the necessary buzzwords.

Original title and link: NoSQL Everywhere? Not So Fast (NoSQL database©myNoSQL)


Cloud Computing Lets Us Rethink How We Use Data

But not everything we do in a database needs guaranteed transactional consistency.

Imagine you are charged with designing a system to collect data on temperature, air flow and electricity use in a building every few minutes from hundreds of locations. The system will be used to make the building more energy efficient. Now imagine you lose a few data points every day. The cause isn’t important but it could be a glitch with a sensor, a dropped packet, or an incomplete write operation in the database.

Do you care?

It depends on the angle from which I’m looking at this question. If I’m the producer of the sensor, I do care if it has a glitch. If I’m a network administrator, I do care that packets are being dropped. And if I am a database system, I do care if I’m dropping write operations. I also have to tell whoever is using me whether I am able to accept operations: am I available when I’m needed?
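From the application’s side of the quoted scenario, a few lost readings often matter little when the analysis works on aggregates. A minimal Python sketch, assuming readings arrive every five minutes (12 per hour) and lost samples are recorded as `None`; the 80% coverage threshold is a hypothetical choice, not something from the article:

```python
from statistics import mean
from typing import Optional, Sequence


def hourly_average(
    readings: Sequence[Optional[float]], min_coverage: float = 0.8
) -> Optional[float]:
    """Average the readings that arrived, skipping lost ones (None).

    Returns None when too few readings arrived for the average to be
    trusted, so the caller can flag the hour instead of guessing.
    """
    if not readings:
        return None
    present = [r for r in readings if r is not None]
    if len(present) / len(readings) < min_coverage:
        return None
    return mean(present)


# Twelve five-minute temperature readings (degrees C), two of them lost.
hour = [21.0, 21.2, None, 21.4, 21.6, 21.8, None, 22.0, 22.2, 22.4, 22.6, 22.8]
print(hourly_average(hour))
```

Whether a given gap is acceptable remains an application decision; the sensor, the network, and the database each still have to care about their own losses.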

Original title and link: Cloud Computing Lets Us Rethink How We Use Data (NoSQL database©myNoSQL)


My Humble Request to the NoSQL Techies

C. Mohan in his 4th post about the NoSQL space:

So, here is my humble request to the NoSQL techies: For each of your systems, please send me or point me to detailed technical information on each of the important aspects of your system. This should be documentation in the form of papers or presentations, and not pointers to source code comments and such! If some significant aspects of a system aren’t documented reasonably, I am urging the appropriate people to produce such documentation. Of course, for legal reasons, you should NOT send me any confidential or proprietary information.

Here is my offer in return for the above: Once I get hold of such documentation, I am willing to maintain a page for each significant NoSQL system where I will consolidate all the information on that system. Once I get hold of all that information, I will be able to do the comparisons between systems and make suggestions for improvements, etc. for each of the systems. I am planning a tutorial on NoSQL systems and it would be in the best interest of the techies of the different systems to get their systems featured in such a tutorial by providing accurate and complete information on their systems.

In the more than two and a half years I’ve been writing this NoSQL blog, I’ve seen numerous similar attempts. So far, the closest to what one would call success are Stefan Edlich’s unstructured but very wide attempt to catalogue NoSQL databases and this blog, which continuously covers various aspects of NoSQL databases. My attempt to create a 5-dimensional characterization of NoSQL databases remains incomplete a year and a half after its debut. But I really hope Mohan will pull this off, as everyone would benefit from having better information organized in an accessible public format.

That aside, I think his post raises a couple of interesting points that I’d like to comment on:

  1. Most NoSQL databases originated not in research labs or the academic world, but out in the field. Most of them were created by people who ran into problems, and attempting to solve those problems led them to try out different approaches.
  2. Most NoSQL databases are either community-driven open source projects or backed by small startups. Some of these startups do benefit from funding, but it is often a fraction of what other trendy sectors are getting. As an example, Cloudera has raised $76 million in its three and a half years of existence. Compare that with Color’s $40 million.
  3. Most of these systems are built, and their roadmaps set, with pragmatism and practicality in mind. They are need-based systems. If you’ve worked on an open source project or in a startup, you know exactly what I mean. Features are prioritized and implemented based on the current interests of the main stakeholders, who are basically the product’s current users.

That being said, one should note that:

  1. Most open source NoSQL databases have excellent documentation (at least by open source standards). Just take a look at the Apache HBase Reference Guide or Redis’s documentation.
  2. There are many books covering NoSQL databases. While I don’t own every NoSQL book (nor have I read cover to cover all those I do own), many of them discuss these solutions in great detail[1].
  3. If you’ve been following this blog, you’ve probably noticed that developers involved with NoSQL databases spend a lot of their time documenting them in great detail.

    Let me give you just a couple of examples: Lars George’s infrequent but highly technical posts (HBase and Data Locality, Hadoop and HBase: Configuring the Number of Server Side Threads (Xceivers), HBase and Bloom Filters) or Salvatore Sanfilippo’s posts about Redis (Redis Persistence Demystified, Redis Cluster Explained, Redis Guide: What Each Redis Data Type Should Be Used For, Redis diskstore and B-trees).

    Indeed, these are not academic papers, but they definitely provide an in-depth perspective on the nuts and bolts of NoSQL databases. And such materials come not only from the people developing NoSQL databases, but also from those running them in production.

    To date, I’ve published almost 3,000 posts on this blog, and besides my own contributions, a large number of these posts link to articles diving into the details of the various NoSQL solutions.

  4. Even though most developers working on NoSQL solutions are busy implementing them and running them in production, they sometimes find the time to publish academic papers and participate in related events.

    I wish I could list them all, but I don’t think I’ve captured even a small fraction of what these guys have published: LinkedIn NoSQL Paper: Serving Large-Scale Batch Computed Data With Project Voldemort, Paper: Apache Hadoop Goes Realtime at Facebook, Riak Bitcask Explained.

  5. Many companies backing NoSQL solutions spend a tremendous amount of time and effort continuously improving the available documentation. Take a look at DataStax’s documentation for Cassandra, Basho’s documentation for Riak, 10gen’s MongoDB documentation, and I could go on and on for a while.

  6. Last, but not least, check the job boards of these companies: almost every one of them is looking for technical writers and evangelists. Obviously, that’s because they want to bring more clarity to their products and make things easier for their users.

Bottom line, I think the NoSQL space is doing quite well at documenting its technical decisions, trade-offs, and recommended use cases. I’d actually say that most of the time it’s easier for me to get details about almost any NoSQL database than to figure out some details of a traditional database vendor’s solution. Try to learn how IBM DB2 implements compression, or how Teradata does hybrid row and column storage. But maybe all this is because I’ve spent so much time in this space.

Anyway, I applaud C. Mohan’s initiative and hope it will be successful. And because it is always my intention to help the NoSQL community, I’m ready to offer him both my help and support.

  1. Sometimes I wish I got a copy of every NoSQL book published.

Original title and link: My Humble Request to the NoSQL Techies (NoSQL database©myNoSQL)