


rdbms: All content tagged as rdbms in NoSQL databases and polyglot persistence

2014 State Of Database Tech: Think Retro

Joe Masters Emison for InformationWeek:

Today’s database landscape isn’t just static. It’s positively retro. Remember 2004? Facebook had just launched, the iPad wasn’t even a twinkle in Steve Jobs’ eye, and Gartner’s database market share report put IBM (34.1%), Oracle (33.7%), and Microsoft (20%) in the top spots. In our survey, Microsoft, Oracle, and IBM still hold the top spots; we do add MySQL, but that’s about it for innovation. […]

And those relational databases from Microsoft, Oracle, and IBM? They’re essentially just updated versions of the companies’ 2004 offerings.

You’ll see these numbers in many surveys. But there are a couple of things to keep in mind while reading them:

  1. the enterprise world is well-known to be a late adopter. A very late adopter actually.
  2. many of these databases are subscription-based, so customers are locked in on at least a yearly basis
  3. many of these databases have been acquired together with hardware and consultancy/support. Another type of lock-in.
  4. none of these databases is showing the growth in demand, jobs, and revenue that the top NoSQL databases have seen over the last 12-18 months.

When you’ve already bought a house, it’s quite difficult to go out looking for a new one. But there’s no good reason not to look for, and get, the best appliances and furniture for your house.

Original title and link: 2014 State Of Database Tech: Think Retro (NoSQL database©myNoSQL)

Do all roads lead back to SQL? Some might and some might not

Seth Proctor for Dr.Dobb’s:

Increasingly, NewSQL systems are showing scale, schema flexibility, and ease of use. Interestingly, many NoSQL and analytic systems are now putting limited transactional support or richer query languages into their roadmaps in a move to fill in the gaps around ACID and declarative programming. What that means for the evolution of these systems is yet to be seen, but clearly, the appeal of Codd’s model is as strong as ever 43 years later.

Spend a bit of time reading (really reading) the above paragraph—there are quite a few different concepts put together to make the point of the article.

SQL is indeed getting closer to the NoSQL databases, but mostly to Hadoop. I still stand by my thoughts in The premature return to SQL.

Most NoSQL databases already offer some limited ACID guarantees. And some flavors of transactions are supported or are being added. But only as long as the core principles can still be guaranteed or the trade-offs are made obvious and offered as clear choices to application developers.

The relational model stays with the relational databases. If some of its principles can be applied (e.g. data type integrity, optional schema enforcement), I see nothing wrong with supporting them. Good technical solutions know both what is needed and what is possible.

Original title and link: Do All Roads Lead Back to SQL? | Dr Dobb’s (NoSQL database©myNoSQL)


Mapping relational databases terms and SQL to MongoDB

A tuts+ guide to MongoDB for people familiar with SQL and relational databases:

We will start with mapping the basic relational concepts like table, row, column, etc and move to discuss indexing and joins. We will then look over the SQL queries and discuss their corresponding MongoDB database queries.

By the end of it you probably won’t be able to convert your app to MongoDB, but at the next meetup or hackathon you’ll have an idea of what those Mongo guys are talking about.
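As a rough illustration of the kind of mapping the guide walks through, here is a minimal sketch that runs one query two ways: as SQL against an in-memory SQLite table, and as a MongoDB-style filter document evaluated by a tiny hand-written matcher. The `USERS` data, the `matches` helper, and the handful of operators are assumptions made for illustration only; a real MongoDB deployment would evaluate the filter server-side through something like `db.users.find(...)`.

```python
import sqlite3

# Rough term mapping (relational -> MongoDB).
TERMS = {"table": "collection", "row": "document", "column": "field"}

# Illustrative data: rows in SQL terms, documents in MongoDB terms.
USERS = [
    {"name": "ann", "age": 31},
    {"name": "bob", "age": 22},
    {"name": "cat", "age": 27},
]

# A few comparison operators as they appear in MongoDB filter documents.
OPS = {
    "$gt": lambda value, bound: value > bound,
    "$gte": lambda value, bound: value >= bound,
    "$lt": lambda value, bound: value < bound,
    "$lte": lambda value, bound: value <= bound,
}


def matches(doc, flt):
    """Hypothetical helper: test one document against a flat filter.

    Simplification: a document that lacks the field never matches.
    """
    for field, cond in flt.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(cond, dict):
            # {"age": {"$gt": 25}} style: every operator must hold.
            if not all(OPS[op](value, bound) for op, bound in cond.items()):
                return False
        elif value != cond:
            # {"name": "bob"} style: plain equality.
            return False
    return True


def sql_names(min_age):
    """SELECT name FROM users WHERE age > ? ORDER BY name"""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO users VALUES (:name, :age)", USERS)
    rows = conn.execute(
        "SELECT name FROM users WHERE age > ? ORDER BY name", (min_age,)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def mongo_style_names(min_age):
    """Same question, phrased as the filter {"age": {"$gt": min_age}}."""
    flt = {"age": {"$gt": min_age}}
    return sorted(doc["name"] for doc in USERS if matches(doc, flt))
```

Both functions answer the same question; the difference is that the SQL version is a string handed to a query engine, while the MongoDB-style version is a data structure that describes the predicate.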

Original title and link: Mapping relational databases terms and SQL to MongoDB (NoSQL database©myNoSQL)


NoSQL and RDBMS - The right and left brain of data

One of the best explanations for polyglot database architectures:

Comparing NoSQL and relational databases is a lot like comparing the left and right sides of the brain. Too much focus on structural differences and attributes can overshadow the fact that we’re stuck with both sides of the brain and we need both to make the best use of sensory data.

Original title and link: NoSQL and RDBMS - The right and left brain of data (NoSQL database©myNoSQL)


InfiniSQL - How to make an infinitely scalable relational database

A guest post from the author of InfiniSQL:

Benchmarking shows that an InfiniSQL cluster can handle over 500,000 complex transactions per second with over 100,000 simultaneous connections, all on twelve small servers. The methods used to test are documented, and the code is all available so that any practitioner can achieve similar results. There are two main characteristics which make InfiniSQL extraordinary:

  1. It performs transactions with records on multiple nodes better than any clustered/distributed RDBMS
  2. It is free, open source. Not just a teaser “community” version with the good stuff proprietary. The community version of InfiniSQL will also be the enterprise version, when it is ready.

Tell me how fast you can find in the InfiniSQL documentation that it is memory-only [1].

  1. There is nothing wrong with being memory-only. What’s wrong is talking about speed without mentioning anything about storage until chapter 3.

Original title and link: InfiniSQL - How to make an infinitely scalable relational database (NoSQL database©myNoSQL)


Relational to Riak

A three-part article, a bit too high-level for me, about what is to be gained (and lost) when using Riak instead of a relational database:

  1. High Availability
  2. Cost of Scale
  3. Tradeoffs

What I always like about Basho’s posts is that they don’t shy away from covering the tradeoffs.

Original title and link: Relational to Riak (NoSQL database©myNoSQL)

Why NoSQL Can Be Safer than an RDBMS

Robin Schumacher [1]:

That said, I disagree with many of the article’s statements, the most important being that companies should not consider NoSQL databases as a first choice for critical data. In this article, I’ll show first how a NoSQL database like Cassandra is indeed being used today as a primary datastore for key data and, second, that Cassandra can actually end up being safer than an RDBMS for important information.

You already know how this goes: “First they ignore you, then they laugh at you, then they fight you, then you win”. I’ll let you decide where major NoSQL databases are today.

  1. Robin Schumacher is VP of Products at DataStax. He’s also my boss

Original title and link: Why NoSQL Can Be Safer than an RDBMS (NoSQL database©myNoSQL)


Traditional, NoSQL and NewSQL Are All Broken. All Data in Memory

Stacey Schneider for VMware:

Over the past few years, memory has gotten cheap and is easily commoditized in the cloud. So moving your data strategy to put it all in-memory just plain makes sense. It eliminates an extra hop to read and write data from disk, making it inherently faster and the performance more consistent. It also manages to simplify the internal optimization algorithms and reduce the number of instructions to the CPU making better use of the hardware.

This is the “conclusion” after “establishing” in the post that:

  1. traditional databases are already broken because of the fixed schemas and data being persisted on disk
  2. NoSQL databases are also broken because even if they have flexible schemas, data is still persisted on disk and “replication takes time to do all the read and writes”
  3. NewSQL are also broken because “the way the databases handles the data distribution makes it so there NewSQL databases do not scale linearly”

All this FUD just to promote GemFire and SQLFire? I really thought VMware was a serious company.

Original title and link: Traditional, NoSQL and NewSQL Are All Broken. All Data in Memory (NoSQL database©myNoSQL)


Why I Love and Hate NoSQL and RDBMS Databases

The most sincere, simple, and correct list of pros and cons about NoSQL databases and RDBMS. Hat tip to Kelly Martinez.

Original title and link: Why I Love and Hate NoSQL and RDBMS Databases (NoSQL database©myNoSQL)


Monty Widenius About NoSQL, Big Data, and Obviously MySQL and MariaDB

The interview Dmitry Sotnikov [1] had with Monty Widenius was published in so many places that I had a hard time deciding which one to link to. Anyway, there are a couple of comments and corrections that I’d like to suggest:

The whole thing with the “new NoSQL movement” started with a blog post from a Twitter employee that said MySQL was not good enough and they needed “something better,” like Cassandra.

That’s not quite correct. The “NoSQL movement” debuted in 2009 with an event about “open source, distributed, non relational databases”, where people from companies like Cloudera, LinkedIn, StumbleUpon, etc. were invited to talk about the solutions they were building to respond to their platforms’ special requirements. But as papers like Bigtable: A distributed storage system for structured data and Dynamo: Amazon’s Highly Available Key-value Store prove, NoSQL solutions had been in production well before 2009.

I can’t find the original article, but I did find a follow up a bit later where it was said MySQL would be dropped for Cassandra.

I can help find that article as it was posted on this blog: Cassandra @ Twitter: An Interview with Ryan King

The main reason Twitter had problems with MySQL back then, was that they were using it incorrectly.

I don’t think there are many examples in the history of software where a private platform benefited from more scaling advice than Twitter. Judging by how many solutions have been suggested, a possible Twitter IPO will be at risk of IP lawsuits.

The current state is that now, three years later, Twitter is still using MySQL as their main storage for tweets. Cassandra was, in the end, not able to replace MySQL.

That’s true. What’s also true is that at that time Cassandra was at version 0.9 and that having to invest in a new database was considered riskier than investing in more hardware and hiring MySQL experts.

The main reason NoSQL became popular is that, in contrast to SQL, you can start using it without having to design anything. This makes it easier to start with NoSQL, but you pay for this later when you find that you don’t have control of your data (if you are not very careful).

I assume that this is how a vendor would present flexible data models as a drawback. It is also one of the most dangerous misconceptions about NoSQL, namely that NoSQL databases require no data modeling. The reality is that most of the time using a NoSQL database will require a lot more thinking about and analysis of the data models and data access patterns. There are no blueprints, no normal forms, and no ORMs to hide everything away.
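To make the point about access patterns concrete, here is a minimal sketch that uses plain Python dictionaries as a stand-in for a key-value store. Everything in it (the `post_message` and `timeline` helpers, the key layout, the sample followers) is hypothetical and not tied to any particular database; it only illustrates that, without joins, the read path has to be designed up front and the write path pays for it.

```python
from collections import defaultdict

# Stand-in for a key-value store: one structure per access pattern.
messages_by_id = {}                    # point lookup by message id
timeline_by_user = defaultdict(list)   # newest-first message ids per reader
followers = {"ann": ["bob", "cat"], "bob": ["ann"]}


def post_message(msg_id, author, text):
    """Write path: store the message once, then fan it out to each reader."""
    messages_by_id[msg_id] = {"author": author, "text": text}
    for reader in followers.get(author, []) + [author]:
        timeline_by_user[reader].insert(0, msg_id)


def timeline(user, limit=10):
    """Read path: one key lookup plus point reads, no join required."""
    return [messages_by_id[m]["text"] for m in timeline_by_user[user][:limit]]


post_message(1, "ann", "hello")
post_message(2, "bob", "hi ann")
post_message(3, "ann", "bye")
```

If the application later needs a different question answered (say, all messages containing a word), none of these structures helps, and a new one has to be designed and backfilled. That is the modeling work the quote glosses over.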

As soon as data can’t fit into memory, SQL generally outperforms NoSQL.

Where’s the proof? According to the data I have, there’s no comparison between, let’s say, Cassandra and MySQL.

For anything else, you have to write a program and it’s very hard to beat a SQL optimizer for complex things, especially things that are automatically generated based on user requests (required for most web sites).

That’s true. Except when:

  1. most people don’t know how to write those SQL queries (search StackOverflow for a random sample of what I mean)
  2. getting everything out of your database requires using vendor-specific solutions
  3. there are those moments when the optimizer decides to change the execution plan in a way that brings down your whole service
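The third point is easier to appreciate once you have seen a plan flip. A minimal sketch with Python’s built-in `sqlite3` module: the same query text is planned before and after an index is created. The `orders` table, the index name, and the `plan` helper are made up for illustration. Here the change is triggered deliberately by adding an index, and SQLite’s planner is much simpler than those of the databases discussed in the interview, so take it only as a demonstration that the plan is picked by the optimizer and not by the query text.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Hypothetical helper: return the planner's description of a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("c%d" % (i % 100), float(i)) for i in range(1000)],
)

QUERY = "SELECT total FROM orders WHERE customer = ?"

# Same SQL text, planned twice.
plan_before = plan(conn, QUERY, ("c7",))   # no index yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
plan_after = plan(conn, QUERY, ("c7",))    # planner now uses the index

print(plan_before)
print(plan_after)
```

The application code did not change between the two calls, yet the work done by the database did. When a change like this goes in the wrong direction on a busy table, the query that used to be cheap is suddenly the one taking the service down.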

The problem with Hadoop is that there is no known business model around it that ensures that the investors will get back 10X money that they expect. Because of that, I have a hard time understanding how Cloudera can survive in the long run.


Everything else in the interview is spot on.

  1. Dmitry Sotnikov: COO at Jelastic 

Original title and link: Monty Widenius About NoSQL, Big Data, and Obviously MySQL and MariaDB (NoSQL database©myNoSQL)


NoSQL Everywhere? Not So Fast

So how can big companies get in on the action? Let’s contrast the nature of data suited for NoSQL with the properties of enterprise data that requires the single-source-of-truth systems that we talked about. We’ll use three V’s: volume, velocity, and variety.

Just in case you want to read an InformationWeek post with no start, no end, and no logic, but (ab)using all the necessary buzzwords.

Original title and link: NoSQL Everywhere? Not So Fast (NoSQL database©myNoSQL)


Microsoft SQL Server 2012 High Availability Solutions

The recent announcement of the Microsoft SQL Server 2012 release emphasized the high availability features added to this version. Here is what I could find after some digging through the documentation:

  • AlwaysOn Failover Cluster Instances: As part of the SQL Server AlwaysOn offering, AlwaysOn Failover Cluster Instances leverages Windows Server Failover Clustering (WSFC) functionality to provide local high availability through redundancy at the server-instance level—a failover cluster instance (FCI). An FCI is a single instance of SQL Server that is installed across Windows Server Failover Clustering (WSFC) nodes and, possibly, across multiple subnets. On the network, an FCI appears to be an instance of SQL Server running on a single computer, but the FCI provides failover from one WSFC node to another if the current node becomes unavailable.

    This is explained in more detail on AlwaysOn Failover Cluster Instances (SQL Server).

  • AlwaysOn Availability Groups: The AlwaysOn Availability Groups feature is a high-availability and disaster-recovery solution that provides an enterprise-level alternative to database mirroring. Introduced in SQL Server 2012, AlwaysOn Availability Groups maximizes the availability of a set of user databases for an enterprise. An availability group supports a failover environment for a discrete set of user databases, known as availability databases, that fail over together. An availability group supports a set of read-write primary databases and one to four sets of corresponding secondary databases. Optionally, secondary databases can be made available for read-only access and/or some backup operations.

    More documentation about AlwaysOn Availability groups can be found here.

  • Database mirroring: This feature will be removed in a future version of Microsoft SQL Server.

  • Log shipping: SQL Server Log shipping allows you to automatically send transaction log backups from a primary database on a primary server instance to one or more secondary databases on separate secondary server instances.

    This is the well-known master-slave setup. More details can be found here.
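To make the mechanics concrete, here is a toy model of log shipping in Python. It is only a sketch of the idea described in the quote (back up the primary’s transaction log, copy the backup, restore it on each secondary in order); the `Primary` and `Secondary` classes and the `ship` function are hypothetical and have nothing to do with SQL Server’s actual implementation or tooling.

```python
class Primary:
    """Toy primary: applies writes and records them in a transaction log."""

    def __init__(self):
        self.data = {}
        self.log = []        # log records not yet backed up
        self.next_lsn = 1    # sequence number of the next log record

    def write(self, key, value):
        self.data[key] = value
        self.log.append((self.next_lsn, key, value))
        self.next_lsn += 1

    def backup_log(self):
        """Cut a log backup: hand over pending records, truncate the log."""
        backup, self.log = self.log, []
        return backup


class Secondary:
    """Toy secondary: only changes by restoring log backups in order."""

    def __init__(self):
        self.data = {}
        self.restored_lsn = 0

    def restore(self, backup):
        for lsn, key, value in backup:
            if lsn != self.restored_lsn + 1:
                raise ValueError("log backup out of sequence")
            self.data[key] = value
            self.restored_lsn = lsn


def ship(primary, secondaries):
    """One shipping cycle: back up, copy, restore. Returns records shipped."""
    backup = primary.backup_log()
    for secondary in secondaries:
        secondary.restore(list(backup))  # each secondary gets its own copy
    return len(backup)


primary = Primary()
standbys = [Secondary(), Secondary()]
primary.write("a", 1)
primary.write("b", 2)
ship(primary, standbys)
primary.write("a", 3)   # not shipped yet: the standbys lag behind
```

Whatever is written between two `ship` cycles exists only on the primary, so that window is the data you stand to lose if the primary dies. That is the main tradeoff of this approach.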

Also worth checking is the availability of these features per SQL Server 2012 edition:

SQL Server 2012 High Availability

Original title and link: Microsoft SQL Server 2012 High Availability Solutions (NoSQL database©myNoSQL)