dynamo: All content tagged as dynamo in NoSQL databases and polyglot persistence

From MongoDB to Cassandra: Why Atlas Platform Is Migrating

Sergio Bossa tells the story of migrating the Atlas platform from MongoDB to Cassandra, emphasizing the reasons behind the decision:

  • It works on the JVM, and we have lots of in-house experience on it.
  • It scales in terms of processing and storage capacity.
  • Its column-based data model gives us some advanced capabilities we will talk about in a few minutes.
  • Its tunable consistency levels provide greater control over high availability and consistency requirements.

As for what made them look for a different solution:

  • We need higher resiliency to faults: MongoDB provides replica sets, but we’re experiencing lots of problems with replication lags and during replica synchronization.
  • We need higher scalability: MongoDB global lock and huge memory requirements aren’t already going to cope well with our growing data set.
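The "tunable consistency levels" point is easier to weigh with a bit of arithmetic. What follows is only a sketch of the general quorum rule for replicated stores, not anything taken from the Atlas post: with N replicas, a read is guaranteed to see the latest acknowledged write when the number of replicas that acknowledged the write plus the number consulted on read exceeds N. The function names are made up for illustration.

```python
def quorum(n_replicas: int) -> int:
    """Smallest majority of n_replicas."""
    return n_replicas // 2 + 1


def read_sees_latest_write(n_replicas: int, write_acks: int, read_acks: int) -> bool:
    """True when every set of read replicas must overlap every set of written replicas."""
    return write_acks + read_acks > n_replicas


def tolerated_failures(n_replicas: int, required_acks: int) -> int:
    """How many replicas can be down while the operation can still succeed."""
    return n_replicas - required_acks


if __name__ == "__main__":
    n = 3  # assumed replication factor, for illustration only
    q = quorum(n)
    # Quorum writes plus quorum reads overlap on at least one replica.
    print(q, read_sees_latest_write(n, q, q), tolerated_failures(n, q))
    # Writing and reading a single replica is more available but may read stale data.
    print(read_sees_latest_write(n, 1, 1), tolerated_failures(n, 1))
```

The trade-off is visible in the two calls: asking for fewer acknowledgements tolerates more failed replicas, at the cost of the overlap guarantee.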

Original title and link: From MongoDB to Cassandra: Why Atlas Platform Is Migrating (NoSQL database©myNoSQL)

via: http://metabroadcast.com/blog/looking-with-cassandra-into-the-future-of-atlas


Using R With Cassandra Through JDBC or Hive

A short post by Jake Luciani listing two R modules, RJDBC and RCassandra, that enable using R with Cassandra through either the JDBC or Hive drivers.

This is a good example of what I meant by designing products with openness and integration in mind.

Original title and link: Using R With Cassandra Through JDBC or Hive (NoSQL database©myNoSQL)

via: http://www.datastax.com/dev/blog/big-analytics-with-r-cassandra-and-hive


Cassandra at Workware Systems: Data Model FTW

One of the stories in which the deciding factor for using Cassandra was primarily the data model and not its scalability characteristics:

We started working with relational databases, and began building things primarily with PostgreSQL at first.  But dealing with the kind of data that we do, the data model just wasn’t appropriate. We started with Cassandra in the beginning to solve one problem: we needed to persist large vector data that was updated frequently from many different sources. RDBMS’s just don’t do that very well, and the performance is really terrible for fast read operations. By contrast, Cassandra stores that type of data exceptionally well and the performance is fantastic. We went on from there and just decided to store everything in Cassandra.

Original title and link: Cassandra at Workware Systems: Data Model FTW (NoSQL database©myNoSQL)

via: http://www.datastax.com/2012/04/the-five-minute-interview-workware-systems


NoSQL and Relational Databases Podcast With Mathias Meyer

EngineYard’s Ines Sombra recorded a conversation with Mathias Meyer about NoSQL databases and their evolution towards friendlier functionality, relational databases and their steps towards non-relational models, and a bit more on what polyglot persistence means.

Mathias Meyer is one of the people I could talk with for days about NoSQL and databases in general, with different infrastructure toppings, and he has some of the most well-balanced thoughts on this exciting space (see this conversation I had with him in the early days of NoSQL). I strongly encourage you to download the mp3 and listen to it.

Original title and link: NoSQL and Relational Databases Podcast With Mathias Meyer (NoSQL database©myNoSQL)


Cassandra 1.1 Released: What’s New

There are too many interesting new features and improvements in the newly released Cassandra 1.1 to cover them all here, but here’s the gist:

  1. Schema improvements
    1. Support for compound keys
    2. Concurrent schema changes
  2. A new version of Cassandra Query Language (CQL3) supporting compound keys and wide rows
  3. Better and easier tuning of the key and row caches
  4. Support for per-table hybrid storage (mixing SSDs and spinning disks)

This DataStax blog entry provides links to more details about all these features and the others I haven’t enumerated above.
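To make "compound keys and wide rows" a little more concrete, here is a toy in-memory model. It is a sketch, not Cassandra's implementation and not its API: it assumes the usual reading of a compound key, where the first component picks the partition (the wide row) and the second orders the cells inside it. The class and method names are hypothetical.

```python
class ToyWideRowTable:
    """Toy model of a table keyed by (partition_key, clustering_key)."""

    def __init__(self):
        # partition_key -> {clustering_key: value}
        self._partitions = {}

    def insert(self, partition_key, clustering_key, value):
        # Writing the same compound key twice overwrites the earlier value.
        self._partitions.setdefault(partition_key, {})[clustering_key] = value

    def slice(self, partition_key, start=None, end=None):
        """Return the cells of one partition ordered by clustering key,
        optionally restricted to the inclusive range [start, end]."""
        cells = self._partitions.get(partition_key, {})
        return [
            (ck, cells[ck])
            for ck in sorted(cells)
            if (start is None or ck >= start) and (end is None or ck <= end)
        ]


if __name__ == "__main__":
    timeline = ToyWideRowTable()
    timeline.insert("alice", 3, "third post")
    timeline.insert("alice", 1, "first post")
    timeline.insert("alice", 2, "second post")
    timeline.insert("bob", 1, "hello")
    # One partition per user; a range query stays inside that partition.
    print(timeline.slice("alice", start=2))
```

The point of the model is that a range query over the second key component never has to look outside a single partition.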

Original title and link: Cassandra 1.1 Released: What’s New (NoSQL database©myNoSQL)


NoSQL Releases and Announcements

Catching up after almost two weeks offline is no easy task, but I hope I won’t miss any important events, releases, or posts. If I do, please email me.

Cassandra 1.0.9: Maintenance Release

The complete change notes for Cassandra 1.0.9 are here:

  • improve index sampling performance (CASSANDRA-4023)
  • always compact away deleted hints immediately after handoff (CASSANDRA-3955)
  • delete hints from dropped ColumnFamilies on handoff instead of erroring out (CASSANDRA-3975)
  • add CompositeType ref to the CLI doc for create/update column family (CASSANDRA-3980)
  • Avoid NPE during repair when a keyspace has no CFs (CASSANDRA-3988)
  • Fix division-by-zero error on get_slice (CASSANDRA-4000)
  • don’t change manifest level for cleanup, scrub, and upgradesstables operations under LeveledCompactionStrategy (CASSANDRA-3989, 4112)
  • fix race leading to super columns assertion failure (CASSANDRA-3957)
  • ensure that directory is selected for compaction for user-defined tasks and upgradesstables (CASSANDRA-3985)
  • allow custom types in CLI’s assume command (CASSANDRA-4081)
  • fix totalBytes count for parallel compactions (CASSANDRA-3758)
  • fix intermittent NPE in get_slice (CASSANDRA-4095)
  • remove unnecessary asserts in native code interfaces (CASSANDRA-4096)
  • Fix EC2 snitch incorrectly reporting region (CASSANDRA-4026)
  • Shut down thrift during decommission (CASSANDRA-4086)
  • Merged from 0.8: Fix ConcurrentModificationException in gossiper (CASSANDRA-4019)

  • Pig

    • support Counter ColumnFamilies (CASSANDRA-3973)
    • Composite column support (CASSANDRA-3684)
  • CQL

    • fix NPE on invalid CQL delete command (CASSANDRA-3755)
    • Validate blank keys in CQL to avoid assertion errors (CASSANDRA-3612)

Apache Hadoop User Impersonation vulnerability

This vulnerability, discovered by Cloudera’s Aaron T. Myers, affects Hadoop versions 0.20.203.0, 0.20.204.0, 0.20.205.0, 1.0.0 to 1.0.1, and 0.23.0 to 0.23.1 when Kerberos is enabled. Complete details are available here.

CouchDB 1.2.0

This is the first important release after the start-of-the-year CouchDB hubbub with Damien Katz and Couchbase. The new version is a major release that deserves its own post: CouchDB 1.2.0: Performance, Security, API, Core and Replication Improvements.

Riak 1.1.2: Stabilization release

Just a maintenance release in the Riak 1.1 series. Complete release notes here.

Original title and link: NoSQL Releases and Announcements (NoSQL database©myNoSQL)


Here Is Why in Cassandra vs. HBase, Riak, CouchDB, MongoDB, It's Cassandra FTW

Brian ONeill:

Now, since choosing Cassandra, I can say there are a few other really important less tangible considerations. The first, is the code base. Cassandra has an extremely clean and well maintained code base. Jonathan and team do a fantastic job managing the community and the code. As we adopted NoSQL, the ability to extend the code-base and incorporate our own features has proven invaluable. (e.g. triggers, a REST interface, and server-side wide-row indexing)

Secondly, the community is phenomenal. That results in timely support, and solid releases on a regular schedule. They do a great job prioritizing features, accepting contributions, and cranking out features. (They are now releasing ~quarterly) We’ve all probably been part of other open source projects where the leadership is lacking, and features and releases are unpredictable, which makes your own release planning difficult. Kudos to the Cassandra team.

Everything sounds reasonable except for Riak being the “new kid on the block” and not finding support for it. Basho, where were you hiding?

Original title and link: Here Is Why in Cassandra vs. HBase, Riak, CouchDB, MongoDB, It’s Cassandra FTW (NoSQL database©myNoSQL)

via: http://brianoneill.blogspot.com/2012/04/cassandra-vs-couchdb-mongodb-riak-hbase.html


Cassandra: How to Upgrade an Early Cassandra Cluster

The Scandit team shares their Cassandra upgrade process from 0.6.x to the latest 1.0.x:

After extensive testing, we found that it fit our needs and decided to use the 0.6.0 release for our first roll out. Over the next 12 months, we kept upgrading our cluster until we reached 0.6.13, which was the last release in the 0.6.x branch.

In the meantime, Cassandra was evolving at an amazing speed. Many cool new features, such as secondary indices, CQL and schema support were added. Since we were very happy with our deployment, we moved a little slower and skip the 0.7.x releases. Now that 1.0.x has been around for a few months, we decided it was time to upgrade. Because the list of changes between the two versions was fairly long, we did the upgrade in two steps: First from 0.6.13 to 0.8.7 and then from 0.8.7 to 1.0.8.

Original title and link: Cassandra: How to Upgrade an Early Cassandra Cluster - (NoSQL database©myNoSQL)

via: http://www.scandit.com/2012/03/29/tech-how-to-upgrade-path-for-an-early-cassandra-cluster/


NoSQL Databases Adoption in Numbers

The data comes from downloads of Jaspersoft’s NoSQL connectors. RedMonk published a graphic and an analysis, and Klint Finley followed up with job trends:

[Chart: NoSQL databases adoption]

A couple of things I don’t see mentioned in the RedMonk post:

  1. if and how the data has been normalized based on each connector’s availability

    According to the post, the data was collected between January 2011 and March 2012, and I think not all connectors were available from the beginning of that period.

  2. if and how the marketing push for each connector has been factored in

    Announcing the Hadoop connector at an event with 2000 attendees or the MongoDB connector at an event with 800 attendees could definitely influence the results (nb: keep in mind that the largest number is less than 7000, so 200-500 downloads triggered by such an event have a significant impact).

  3. Redis and VoltDB are mostly OLTP-only databases
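The first two points can be put into numbers. The snippet below is a rough sketch with made-up download counts (the real per-connector figures and availability dates are not in this post); only the 15-month window and the "less than 7000" ceiling come from the text above. It shows how dividing by the months a connector was actually available can flip a ranking, and how much of a total a single event could account for.

```python
STUDY_MONTHS = 15  # January 2011 through March 2012, inclusive


def monthly_rate(total_downloads: int, months_available: int) -> float:
    """Downloads per month, counting only the months the connector existed."""
    if not 0 < months_available <= STUDY_MONTHS:
        raise ValueError("months_available must be between 1 and STUDY_MONTHS")
    return total_downloads / months_available


def event_share(event_downloads: int, total_downloads: int) -> float:
    """Fraction of a connector's total that one launch event could explain."""
    return event_downloads / total_downloads


# Hypothetical figures, for illustration only: (total downloads, months available).
connectors = {
    "connector_a": (6000, 15),  # available for the whole period
    "connector_b": (3000, 5),   # released late in the period
}

rates = {name: monthly_rate(total, months) for name, (total, months) in connectors.items()}

if __name__ == "__main__":
    for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
        print(f"{name}: {rate:.0f} downloads/month")
    # 500 event-driven downloads against a total just under 7000:
    print(f"event share: {event_share(500, 7000):.1%}")
```

With these invented numbers the connector with half the raw downloads comes out ahead once availability is taken into account, which is exactly why the normalization question matters.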

Original title and link: NoSQL Databases Adoption in Numbers (NoSQL database©myNoSQL)


Which NoSQL Databases Are Robust to Net-Splits?

Answered on Quora:

  • Dynamo (key-value)
  • Voldemort (key-value)
  • Tokyo Cabinet (key-value)
  • KAI (key-value)
  • Cassandra (column-oriented/tabular)
  • CouchDB (document-oriented)
  • SimpleDB (document-oriented)
  • Riak (document-oriented)

A couple of clarifications to the list above:

  1. Dynamo has never been available to the public. On the other hand, DynamoDB is not exactly Dynamo.
  2. Tokyo Cabinet is not a distributed database so it shouldn’t be in this list
  3. CouchDB isn’t a distributed database either, but one could argue that with its peer-to-peer replication it sits right at the border. On the other hand there’s BigCouch.

Original title and link: Which NoSQL Databases Are Robust to Net-Splits? (NoSQL database©myNoSQL)


NoSQL Hosting Services

Michael Hausenblas put together a list of hosted NoSQL solutions including Amazon DynamoDB and SimpleDB, Google App Engine, Riak, Cassandra, CouchDB, MongoDB, Neo4j, and OrientDB. If you go through my posts on NoSQL hosting, you’ll find a couple more.

Original title and link: NoSQL Hosting Services (NoSQL database©myNoSQL)

via: http://webofdata.wordpress.com/2012/03/18/hosted-nosql/


A Tour of Amazon DynamoDB Features and API

Mathias Meyer’s walk through the DynamoDB features and API with commentary:

Sorted range keys, conditional updates, atomic counters, structured data and multi-valued data types, fetching and updating single attributes, strong consistency, and no explicit way to handle and resolve conflicts other than conditions. A lot of features DynamoDB has to offer remind me of everything that’s great about wide column stores like Cassandra, but even more so of HBase. This is great in my opinion, as Dynamo would probably not be well-suited for a customer-facing system. And indeed, Werner Vogel’s post on DynamoDB seems to suggest DynamoDB is a bastard child of Dynamo and SimpleDB, though with lots of sugar sprinkled on top.

Think of it as an extended, better-articulated, closer-to-the-API version of my notes about Amazon DynamoDB.
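Two of the features in that list, conditional updates and atomic counters, are easy to picture with a toy model. The sketch below is an in-memory stand-in, not the DynamoDB API or any AWS SDK; every class and method name is invented for illustration. It shows why "conditions" can stand in for explicit conflict resolution: a write only goes through if the attribute still holds the value the writer expected.

```python
import threading


class ConditionFailed(Exception):
    """Raised when the stored value does not match the expected one."""


class ToyTable:
    """In-memory toy with a conditional write and an atomic counter."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def put_if(self, key, attribute, expected, new_value):
        """Set attribute to new_value only if it currently equals expected.
        expected=None means 'the attribute must not exist yet'."""
        with self._lock:
            current = self._items.get(key, {}).get(attribute)
            if current != expected:
                raise ConditionFailed(f"{attribute!r} is {current!r}, expected {expected!r}")
            self._items.setdefault(key, {})[attribute] = new_value

    def add(self, key, attribute, delta=1):
        """Atomically add delta to a numeric attribute and return the new value."""
        with self._lock:
            item = self._items.setdefault(key, {})
            item[attribute] = item.get(attribute, 0) + delta
            return item[attribute]

    def get(self, key):
        with self._lock:
            return dict(self._items.get(key, {}))


if __name__ == "__main__":
    table = ToyTable()
    table.put_if("order-1", "status", None, "new")
    table.put_if("order-1", "status", "new", "paid")
    try:
        # A second writer that still believes the status is "new" loses.
        table.put_if("order-1", "status", "new", "cancelled")
    except ConditionFailed as error:
        print("rejected:", error)
    print(table.add("order-1", "views"), table.add("order-1", "views"))
    print(table.get("order-1"))
```

The losing writer has to re-read and decide what to do, which is the only conflict handling this model (like the quote's description) offers.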

Original title and link: A Tour of Amazon DynamoDB Features and API (NoSQL database©myNoSQL)

via: http://www.paperplanes.de/2012/1/30/a-tour-of-amazons-dynamodb.html