RDBMS: All content tagged as RDBMS in NoSQL databases and polyglot persistence

NoSQL Databases Best Practices and Emerging Trends

Jans Aasman (CEO AllegroGraph) interviewed by Srini Penchikala:

InfoQ: What best practices and architecture patterns should the developers and architects consider when using a solution like this one in their software applications?

Jans: If your application requires simple straight joins and your schema hardly changes then any RDBMS will do.

If your application is mostly document based, where a document can be looked at as a pre-joined nested tree (think a Facebook page, think a nested JSON object) and where you don’t want to be limited by an RDB schema then key-value stores and document stores like MongoDB are a good alternative.

If you want what is described in the previous paragraph but you have to perform complex joins or apply graph algorithms then the MongoGraph approach might be a viable solution.

Thinking about the products and projects I’ve been working on, most of them have had to deal with all these aspects in different areas of the applications and with different importance to the final solution. Mistakenly though, in most of the cases they ended up using a relational database only. With polyglot persistence here, this shouldn’t happen anymore. That’s not to say though that every project must use all of these technologies just because they are available. But it could use any of them or all combined.

InfoQ: What are the emerging trends in combining the NoSQL data stores?

Jans: From the perspective of a Semantic Web - Graph database vendor what we see is that nearly all graph databases now perform their text indexing with Lucene based indexing (Solr or Elastic Search) and I wouldn’t be surprised that most vendors soon will allow JSON objects as first class objects for graph databases. It was surprisingly straightforward to mix the JSON and triple/graph paradigm. We are also experimenting with key-value stores to see how that mixes with the triple/graph paradigm.

This topic was also discussed during my NoSQL Applications panel, but due to the panel's time constraints we couldn't reach a conclusion. It's definitely an interesting perspective, though.

Original title and link: NoSQL Databases Best Practices and Emerging Trends (NoSQL database©myNoSQL)


MapReduce vs Parallel DBMS: Where Does Map Reduce Shine

From Jim Kaskade’s great post about MapReduce’s advantages:

One of the big attractive qualities of the MR programming model (and maybe its key attraction to the new generation of data scientists and application programmers) is its simplicity; an MR program consists of only two functions – Map and Reduce – written to process key/value data pairs. Therefore, the model is easy to use, even for programmers without experience with parallel and distributed systems.

It also hides the details of parallelization, fault-tolerance, locality optimization, and load balancing.
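To make the "only two functions" point concrete, here is a minimal single-process sketch of the model in Python. It is not Hadoop's API: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are made up for illustration, and the tiny driver stands in for the framework that would normally handle parallelization, fault tolerance, and data locality.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reduce: collapse all counts emitted for one word into a total."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical stand-in for the framework: map, group by key, reduce.

    A real MapReduce runtime would spread these phases across machines;
    this driver runs them sequentially in one process.
    """
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


# Input records are (key, value) pairs; here the key is just a line number.
lines = [(0, "map reduce map"), (1, "reduce shine")]
word_counts = run_mapreduce(lines, map_fn, reduce_fn)
print(word_counts)
```

The programmer writes only `map_fn` and `reduce_fn`; everything in `run_mapreduce` is what the quote says the model hides.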



Distributed Caches, NoSQL Databases, and RDBMS

Greg Luck[1], following up on his article Ehcache: Distributed Cache or NoSQL Store?, talks about the architectural differences between distributed caches, NoSQL databases, and RDBMS, and where distributed caches fit:

NoSQL and RDBMS are generally on disk. Disks are mechanical devices and exhibit large latencies due to seek time as the head moves to the right track and read or write times dependent on the RPM of the disk platter. NoSQL tends to optimise disk use, for example, by only appending to logs with the disk head in place and occasionally flushing to disk. By contrast, caches are principally in memory. […] With RDBMS a cache is added to avoid these scale out difficulties. For NoSQL, scale out is built-in, so the cache will get used when lower latencies are required.

  1. Greg Luck: Founder and CTO, Ehcache  
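The role of an in-memory cache in front of a disk-based store can be sketched in a few lines of Python. This is not Ehcache's API; `SlowStore` and `ReadThroughCache` are hypothetical classes, and the `reads` counter only simulates how often the slower, disk-backed layer gets hit.

```python
class SlowStore:
    """Hypothetical disk-backed store (RDBMS or NoSQL); counts its reads."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.reads = 0

    def get(self, key):
        self.reads += 1  # each call stands for a trip to disk
        return self._rows.get(key)


class ReadThroughCache:
    """Hypothetical in-memory cache that falls back to the store on a miss."""

    def __init__(self, store):
        self._store = store
        self._memory = {}

    def get(self, key):
        if key in self._memory:  # hit: served from memory, no disk access
            return self._memory[key]
        value = self._store.get(key)  # miss: go to the slower store
        if value is not None:
            self._memory[key] = value
        return value


store = SlowStore({"user:1": "Ada"})
cache = ReadThroughCache(store)
first = cache.get("user:1")   # miss, reads from the store
second = cache.get("user:1")  # hit, served from memory
print(first, second, store.reads)
```

A production cache would also need eviction, expiry, and invalidation on writes, all of which this sketch leaves out.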



Apache Sqoop: What, When, Where, How

The other day I posted about Sqoop's first release under the Apache umbrella, so I thought I'd provide a bit more detail about where Sqoop fits in the picture. I've embedded below three presentations that answer questions like what Sqoop is, when and where to use it, and how to use it.

Traditional SQL DaaS vs NewSQL

Mike Hogan (CEO ScaleDB) raises some very valid issues with traditional relational databases operating as Databases-as-a-Service:

When moving from a self-managed database—either in the cloud or on premise—to a DaaS, the “DBA-in-the-cloud” doesn’t have that visibility into the business requirements, performance requirements, development schedule, and more. This lack of visibility turns the already challenging task of hand-tuning the database into a near impossibility using traditional databases.

And these are just the most visible ones.

On the other hand, I totally agree with Markus ‘maol’ Perdrizat pointing out that NewSQL is not the only solution to these problems:

I agree with the problem positioning, but feel strongly that NewSQL is not a requirement to address the problem here, you can equally work a little services layer and put all the control into the hands of the user, essentially replacing (a lot of) the DBA tasks with automation and APIs.

What NewSQL gives you though, and we see that with Xeround and supposedly also ScaleDB, is the elasticity and transparent sharding that’s difficult to achieve with the more traditional Oracle, Sybase or SQL Server databases that are still often required in the enterprise space.



Is MongoDB a Good Alternative to RDBMS Databases?

Gijs Mollema summarizes the lessons learned after attending Brendan McAdams’ MongoDB workshop at Devoxx—embedded below:

I have to say I was pleasantly surprised by the ease of use and the features of this product. […] Of course, using a NoSQL technology like MongoDB involves some trade-offs and a different mindset than the traditional RDBMS. The main advantages as mentioned before are flexibility, scalability and performance. Although the NoSQL principle looks promising, it is not (yet) the holy grail and therefore currently cannot replace the RDBMS in every situation. It is a different type of database which can be a solution, based on the requirements of the situation. It will not replace RDBMS databases but I reckon it might run well side-by-side in the future (delegating the models / functionality that MongoDB is good at).

The only thing that made me wonder is how having no SQL or Hibernate queries (complex joins) could be seen as an advantage.

For reference, below is Brendan McAdams' presentation:


Graph Databases and the World Wide Web

Sir Tim Berners-Lee:

Inventing the World Wide Web involved my growing realization that there was a power in arranging ideas in an unconstrained, web-like way.  And that awareness came to me through precisely that kind of process.

Let’s think how the different data models require us to arrange data:

  1. hierarchical model: free form, single-type of relationship (parent-child)
  2. relational model: strict form, (limited) multiple-types of relationships
  3. document model: free form, dual relationship types: logical and hierarchical
  4. star schema: strict form, (limited) multiple-types of relationships

Now think about graph databases: free form (nodes can have any number of properties), unlimited number of uni/bi-directional relationships. So the question is, why aren't network/graph databases used more these days?
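As a rough illustration of what "free form" and "uni/bi-directional relationships" mean in practice, here is a toy in-memory property graph in Python. It does not follow any real graph database's API; the `Graph` class, its methods, and the sample nodes are all invented for the sketch.

```python
class Graph:
    """Toy property graph: schemaless nodes plus typed, directed edges."""

    def __init__(self):
        self.nodes = {}   # node id -> dict of arbitrary properties
        self.edges = []   # (source id, relationship type, target id)

    def add_node(self, node_id, **props):
        # Free form: each node may carry a different set of properties.
        self.nodes[node_id] = props

    def relate(self, src, rel_type, dst, bidirectional=False):
        # A bidirectional relationship is stored as two directed edges.
        self.edges.append((src, rel_type, dst))
        if bidirectional:
            self.edges.append((dst, rel_type, src))

    def neighbors(self, node_id, rel_type):
        # Follow outgoing edges of one relationship type from a node.
        return [dst for src, rel, dst in self.edges
                if src == node_id and rel == rel_type]


g = Graph()
g.add_node("alice", name="Alice", city="Oslo")
g.add_node("bob", name="Bob")                    # fewer properties, no schema
g.add_node("post1", title="Hello", likes=3)      # a different kind of node
g.relate("alice", "KNOWS", "bob", bidirectional=True)
g.relate("alice", "WROTE", "post1")              # one-directional

print(g.neighbors("bob", "KNOWS"))
print(g.neighbors("post1", "WROTE"))
```

Nothing limits the number of relationship types: adding a new one is just another `relate` call, with no schema change.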


Is NoSQL a Premature Optimization That’s Worse Than Death? Or the Lady Gaga of the Database World?

I was just preparing for a long trip when Michael Stonebraker created a new storm. I only caught Domas Mituzas’ sharp reply and Werner Vogels’ comment:

scaling data systems in real life has humbled me. I would not dare to criticize an architecture that holds the social graphs of 750M and works

So if you feel like watching an action movie featuring A-class actors, Todd Hoff has summarized the whole conversation, paraphrasing a comment about Lady Gaga:

You know, there’s a difference between not liking someone’s music and not recognizing their talent. If you can’t recognize the fact that Lady GaGa is, in fact, extremely talented in many ways, then you may want to try to look at her with less of a bias. There’s plenty of artists I can’t stand, but still respect their talent.

Even if you don’t like Lady Gaga’s schtick, that is a great performance. I get the feeling a lot of SQL people don’t recognize the talent of NoSQL, whereas NoSQL people are generally use-the-best-tool-for-the-job types who have no problem with you using SQL if that works for you.


What Scales Best?

Tony Bain:

What is best?  Well that comes down to the resulting complexity, cost, performance and other trade-offs.  Trade-offs are key as there are almost always significant concessions to be made as you scale up.


So what is my point? Well I guess what I am saying is physical scalability is of course an important consideration in determining what is best. But it is only one side of the coin. What it “costs” you in terms of complexity, actual dollars, performance, flexibility, availability, consistency etc, etc are all important too. And these are often relative, what is complex for you may not be complex for someone else.

I concur—a long time ago I wrote: Complexity is a dimension of scalability.



Time Lines and News Streams: Neo4j Is 377 Times Faster Than MySQL

In my use case Neo4j outperformed MySQL by a factor of 377! (That is more than two orders of magnitude.) As is known, one part of my PhD thesis is to create a social news stream application around my social networking site. It is very obvious that a graph structure is very natural for social news streams: you go to a user, traverse to all his friends or objects of interest, and then traverse one step deeper to the newly created content items. A problem with this kind of application is sorting the content items by time or relevance. But before I discuss those problems I just want to present another comparison between MySQL and Neo4j.

This is wrong on so many levels. Scratch that. It’s even worse than an apples-to-oranges comparison.



Comments on Urban Myths About NoSQL

Dan Weinreb comments on Michael Stonebraker’s Urban Myths about NoSQL (PDF):

Dr. Michael Stonebraker recently posted a presentation entitled “Urban Myths about NoSQL”. Its primary point is to defend SQL, i.e. relational, database systems against the claims of the new “NoSQL” data stores. Dr. Stonebraker is one of the original inventors of relational database technology, and has been one of the most eminent database researchers and practitioners for decades.

In fact, Michael Stonebraker bashes everything that is not his current product—this GigaOm interview is the latest example.

For now, I’m filing this away until VoltDB is sold.



The NoSQL Fad

Adam D’Angelo[1]:

I think the “NoSQL” fad will end when someone finally implements a distributed relational database with relaxed semantics.

I believe that defining these relaxed semantics will actually lead to figuring out the origins of many of the NoSQL solutions—just as an example, relaxing the relational model would lead to options like the document model or the BigTable-like columnar model.

  1. Adam D’Angelo: Quora Founder  
