


nosql: All content tagged as nosql in NoSQL databases and polyglot persistence

When to Use a NoSQL Database

Gil Allouche:

If your company has complicated, large sets of data that it’s looking to analyze, and that data isn’t simple, structured or predictable data then SQL is not going to meet your needs. While SQL specializes in many things, large amounts of unstructured data is not one of those areas. There are other methods for gathering and analyzing your data that will be much more effective and efficient and probably cost you less too.

It fascinates me how our industry is still looking for generic blueprints for making technical decisions. Based on your own experience, how many times has this worked? How many times have you been able to make a decision (leading to a successful project) based on a checklist? I can understand that checklists are useful in reducing the initial search area, but the rest should always be based on a combination of experience, learning and understanding, and trial and error. It doesn’t sound scientific, but I’d argue it’s more scientific than a generic checklist.

Original title and link: When to Use a NoSQL Database (NoSQL database©myNoSQL)


Blame it on the database

The story of a famous failure:

Another sore point was the Medicare agency’s decision to use database software, from a company called MarkLogic, that managed the data differently from systems by companies like IBM, Microsoft and Oracle. CGI officials argued that it would slow work because it was too unfamiliar. Government officials disagreed, and its configuration remains a serious problem.


“We have not identified any inefficient and defective code,” a CGI executive responded in an email to federal project managers, pointing again to database technology that the Medicare agency had ordered it to use as the culprit, at least in part.

I’m not going to defend MarkLogic. But this sounds very much like the archetype of a failure story:

  1. start by blaming the other contractors
  2. find the newest or least-known technology used in the project
  3. point all fingers at it

A long time ago I was on a similar project. Different country, different agencies, different contractors, but exactly the same story. It was in the early days of my career. But what I learned back then stuck with me, and even if today it may sound like a truism, it’s still one of the big lessons: It’s not the technology. It’s the people. Always. And the money.



Migrating databases with zero downtime

That’s how you do it!

One of the most detailed descriptions of migrating data while keeping your service available:

The solution we came up with was to split the migration into two parts: writing, then reading.

For each component that we were migrating, we would come up with a data schema that made sense for that part of the system. We would then make a branch off of master, the ‘writes’ branch. The writes branch was responsible for 2 things. First, it would mirror all writes to Mongo/Titan into it’s eqivalent Cassandra table. […] Next, it would have a migration script that would copy all of our historical data for that component into Cassandra. So once the writes branch was deployed, and the migration script was run, all of our data was in both Mongo/Titan and Cassandra, and anything that was created or updated was also written to both places.

Next, we would make a branch off of our writes branch, this was our ‘reads’ branch. The reads branch switches all reads from Mongo/Titan to our new Cassandra table(s), removed all references to Mongo/Titan for the migrated component, and stops all writes for them. In practice, this is the most complex branch to write because of minor variations in the way things come back from the different databases.

There’s also a “keep-in-mind” list. To which I’d add:

  1. if your application doesn’t use some sort of data access layer, you’ll have a hard time completing this migration. It won’t be because you cannot identify the data access points, but because each of them will have its own expectations and its own way of dealing with exceptional cases;
  2. the more the data models of your source and target databases differ, the more difficult the migration will be; if possible, once you have the write path covered, enable the read paths one by one;
  3. do NOT disable the double write path for a while; there might be subtle but serious bugs that you haven’t discovered, or performance issues that you didn’t or couldn’t predict. There might also be external processes/mini-apps that are rarely used and that you’ve totally forgotten about.
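
The two-phase approach described above (mirror writes and backfill first, switch reads later, keep double writes on) can be sketched roughly as follows. This is only an illustration under loud assumptions: plain Python dictionaries stand in for the old and new databases, and the names DualWriteStore, backfill and mismatches are hypothetical, not taken from the original post or from any Mongo, Titan or Cassandra client library.

```python
class DualWriteStore:
    """Hypothetical data access layer mirroring writes to two stores."""

    def __init__(self, old, new, read_from_new=False):
        self.old = old                      # stand-in for the source database
        self.new = new                      # stand-in for the target database
        self.read_from_new = read_from_new  # flipped by the 'reads' phase

    def put(self, key, value):
        # 'Writes' phase: every create or update goes to both stores.
        # This stays enabled even after reads have been switched.
        self.old[key] = value
        self.new[key] = value

    def get(self, key):
        # 'Reads' phase: a single switch decides which store serves reads.
        source = self.new if self.read_from_new else self.old
        return source.get(key)


def backfill(old, new):
    """Copy historical records written before dual writes were deployed."""
    copied = 0
    for key, value in old.items():
        if key not in new:  # leave records the dual-write path already wrote
            new[key] = value
            copied += 1
    return copied


def mismatches(old, new):
    """Return the keys whose values differ between the two stores."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


# Illustrative run with made-up records.
old_db = {"user:1": "alice", "user:2": "bob"}
new_db = {}
store = DualWriteStore(old_db, new_db)

store.put("user:3", "carol")   # written to both stores
backfill(old_db, new_db)       # copies the two historical records
if not mismatches(old_db, new_db):
    store.read_from_new = True  # switch reads only once the stores agree
```

Routing every access through one layer is what makes the read switch a single flag here; in an application without such a layer, each call site would need its own version of this logic.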



Why NoSQL Databases Are Gaining Fans

Doug Henschen for InformationWeek:

Did you hear about MongoDB’s $150 million venture capital haul announced last week? How about DataStax’s $45 million round in July or Couchbase’s $25 million infusion in August?

What all these young vendors have in common is that they’re the backers of open-source NoSQL databases. As we explain in this week’s digital issue cover story, “When NoSQL Makes Sense,” these databases have it all over relational databases when it comes to scalability and flexibility. What’s more, they promise faster, cheaper development than enterprise stalwarts IBM DB2, Microsoft SQL Server and Oracle Database.

I emphasized the key attributes Doug Henschen mentions. I’ve run through a list of other attributes that NoSQL databases use to describe themselves (e.g. fast (as in performance), highly available, etc.), but I couldn’t find others that describe all the NoSQL products.



The premature return to SQL

Here’s what Jack Clark wrote recently for The Register about what is now an obvious trend across the NoSQL database market:

The tech world is turning back toward SQL, bringing to a close a possibly misspent half-decade in which startups courted developers with promises of infinite scalability and the finest imitation-Google tools available, and companies found themselves exposed to unstable data and poor guarantees.

This pisses me off. A lot.

It’s not because I hate SQL as a language. Even if it’s full of quirks, limitations, and flavors.

It’s also not because so many are still confusing SQL with relational databases.

It’s not even because I’m some sort of masochist technology fanboy that likes seeing others be unproductive.

There are two simple reasons that make me feel angry about this premature return and oversized investment into SQL.

First, innovation in the area of data processing is stopping too early in an attempt to capture financial returns. I’m not an absurd guy who doesn’t understand that businesses cannot survive without money. Nor can research and innovation happen without businesses and, implicitly, money.

But some sort of naivety makes me believe that this turn away from experimentation is happening too early. The data space is big enough to allow the new guys to continue to research, experiment, try, fail and succeed. It’s probably not big enough to show hockey stick growth though.

Just take a second and think about what we got during this misspent half-decade: Redis, Cassandra, Riak, a multi-parallel fully programmatic way to process data, Cascading, Pig, Cypher, ReQL and many more tools, languages, and APIs for processing data.

Many of them haven’t reached maturity and thus might not feel as friendly or productive as SQL. Many of them haven’t yet had the chance to show everything they’ve imagined. But they’ve already opened new doors into data. They made us think again about the value of data, they gave us back the excitement and rewarding feeling of digging deeper into data.

The other reason I’m saddened by this trend is that it is happening now mostly due to the peer pressure coming from the largest database vendors. The costs they have imposed on users over time have a secondary, not immediately obvious implication. In order to protect their investments, users are now going to these big vendors asking about the new shiny technologies, the ones that some have already recognized cannot be ignored. What they usually get is either a shrug or a pragmatic answer: “write us a bigger check and we’ll make it happen”. Enraged, they go to the young companies demanding SQL and threatening not to use these new tools or pay the (small) price to support them. This whole trend is due to misdirected peer pressure.

I want to leave you with the following thought: what would have happened if the Wright brothers or people like Howard Hughes had just gone back to steamboats? Would we ask for SQL today?


How do you decide what database to use for what task?

Nathan Milford of Outbrain, answering the question of how you decide what database to use for what task:

We look at how the data will be queried, its size, and how it needs to be distributed. We might use things like MySQL for historical reasons and MongoDB for smaller tasks, and then Cassandra for situations where data doesn’t all fit into memory or where it spans multiple machines and possibly data centers.

This is indeed a good recipe: data access model, data size, distribution model.



Best NoSQL April’s Fool

I know a few people who avoid the Internet completely on April Fools’ Day. After being tricked every year by my dad, I’m very careful about what I post on that day. This year has been easy on me, but that doesn’t mean there weren’t a couple of good ones.

My favorites:


Cage Match: MySQL vs NoSQL vs Postgres

A post by Brian Aker about the state of MySQL, Postgres and NoSQL databases.

I had a couple of comments and these evolved into a long rant.

MySQL became less interesting once it was acquired […]

I’ve never been very sure what metric is used to measure how interesting a product is, assuming there is such a metric. Contrary to some suggestions I’m reading, I haven’t seen stories of people moving away from MySQL because Oracle acquired it, except for Fedora and OpenSUSE replacing MySQL with MariaDB, and that was due to very specific issues (no security information, no access to regression tests).

the number of Postgres deployments is greater then what all of the NoSQL market combined adds up to

Comparing 15 years of PostgreSQL with 3 years of NoSQL isn’t going to give meaningful results (for a similarly unbalanced comparison, try Oracle vs PostgreSQL). I’m not aware of any database that captured a significant market share in the first 3 years of its existence. Except MySQL. Not Postgres.

Would a document model really matter if schemas could be altered online?

Yes, it would definitely remain relevant. Schema flexibility is not only about updating it, but also about the types allowed. PostgreSQL has indeed added support for arrays and JSON. I see this as a confirmation of what’s happening in the NoSQL space and also about the future of storage engines.

no new language has emerged from the NoSQL market that has any size-able adoption

MongoDB’s query language and the aggregation framework are used by a lot of people. It’s probably not the ideal query language and it comes in two different flavors, but it’s there and it’ll most probably evolve. With some bias, I could also point to RethinkDB’s data manipulation language as an example of something that is probably on par with SQL and without the hidden, unknown corner cases of SQL. Indeed, none of these can come close to the adoption acquired by SQL in its 30 years of existence.

The bottom line is that I expect bridges to be built between relational databases and NoSQL databases, with each side adopting those features that are useful to its users. I also expect that this “relational databases are crap” vs “NoSQL databases are crap” debate will slowly go away, as people realize that the data space is not a zero-sum game. Vendors will be the last to give up this fight, but customers have a lot of power in making this happen.



Traditional, NoSQL and NewSQL Are All Broken. All Data in Memory

Stacey Schneider for VMware:

Over the past few years, memory has gotten cheap and is easily commoditized in the cloud. So moving your data strategy to put it all in-memory just plain makes sense. It eliminates an extra hop to read and write data from disk, making it inherently faster and the performance more consistent. It also manages to simplify the internal optimization algorithms and reduce the number of instructions to the CPU making better use of the hardware.

This is the “conclusion” after “establishing” in the post that:

  1. traditional databases are already broken because of the fixed schemas and data being persisted on disk
  2. NoSQL databases are also broken because even if they have flexible schemas, data is still persisted on disk and “replication takes time to do all the read and writes”
  3. NewSQL are also broken because “the way the databases handles the data distribution makes it so there NewSQL databases do not scale linearly”

All this FUD just to promote GemFire and SQLFire? I really thought VMware was a serious company.



One Database to Rule Them All?

Curt Monash took it upon himself to write about why a data store independent of consistency models, upfront data modeling and access algorithms is almost impossible:

To date, nobody has ever discovered a data layout that is efficient for all usage patterns.

He’s reached a similar conclusion to what I wrote in my link post. Here’s mine:

[…] a database featuring “an ubiquitous interface independent of consistency models, upfront data modeling, and access algorithms” is never going to be efficient. Actually, I’m not even sure it would make sense being built

Here’s Curt Monash’s:

So what would happen if somebody tried to bundle all conceivable functionality into a single DBMS, with a plan to optimize the layout of any particular part of the database as appropriate? I think the outcome would be tears — for the development effort would be huge, while the benefits would be scanty. The most optimistic cost estimates could run in the 100s of millions of dollars, with more realistic ones adding a further order of magnitude. But no matter what the investment, the architects would be on the horns of nasty dilemma

Definitely more impactful.



A Data Store Independent of Consistency Models, Upfront Data Modeling and Access Algorithms

Tina Groves [1] in “Where Does Hadoop Fit in a Business Intelligence Data Strategy?”:

For example, the decision to move and transform operational data to an operational data store (ODS), to an enterprise data warehouses (EDW) or to some variation of OLAP is often made to improve performance or enhance broad consumability by business people, particularly for interactive analysis. Business rules are needed to interpret data and to enable BI capabilities such as drill up/drill down. The more business rules built into the data stores, the less modelling effort needed between the curated data and the BI deliverable.

That’s why Chirag Mehta’s ideal database featuring “an ubiquitous interface independent of consistency models, upfront data modeling, and access algorithms” is never going to be efficient. Actually, I’m not even sure it would make sense being built.

  1. Tina Groves: Product Strategist, IBM Business Intelligence 



NoSQL or NewSQL: The Ideal Database

Talking about ideal database solutions, Chirag Mehta writes in “A Journey From SQL to NoSQL to NewSQL”:

“Design a data store that has ubiquitous interface for the application developers and is independent of consistency models, upfront data modeling (schema), and access algorithms. As a developer you start storing, accessing, and manipulating the information treating everything underneath as a service. As a data store provider you would gather upstream application and content metadata to configure, optimize, and localize your data store to provide ubiquitous experience to the developers. As an ecosystem partner you would plug-in your hot-swappable modules into the data stores that are designed to meet the specific data access and optimization needs of the applications.”
