Dynamo: All content tagged as Dynamo in NoSQL databases and polyglot persistence
Tuesday, 17 July 2012
Where Cassandra Really Shines
Where Cassandra REALLY shines and is often overlooked is ease of maintenance. Cassandra’s ability to bootstrap new nodes, replicate, reshard and handle down nodes (w/ hinted handoff) is almost magical. I use it in production and it works very reliably.
Sure, it’s got some cool big data stuff, but try doing any of those “maintenance” operations on other databases without ripping your hair out. For example, even bringing up a new MySQL slave is a huge pain in the ass, let alone doing something non-trivial like promoting a new master.
This reinforces exactly what I emphasized as the merits of NoSQL systems in “Is SQL or NoSQL better for programmers?”.
Original title and link: Where Cassandra Really Shines (©myNoSQL)
Monday, 16 July 2012
eBay's Cassandra Data Modeling Best Practices
Jay Patel (architect at eBay):
Our Cassandra deployment is not huge, but it’s growing at a healthy pace. In the past couple of months, we’ve deployed dozens of nodes across several small clusters spanning multiple data centers. You may ask, why multiple clusters? We isolate clusters by functional area and criticality. Use cases with similar criticality from the same functional area share the same cluster, but reside in different keyspaces.
This first post is focused on two old techniques that have been applied even with relational databases:
- model data around query patterns
- de-normalize and duplicate for read performance.
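As a rough illustration of these two techniques (not taken from the eBay post), the Python sketch below keeps one in-memory "table" per query and duplicates data on write. The names `likes_by_user`, `likes_by_item`, and `record_like` are hypothetical, and plain dictionaries stand in for Cassandra column families.

```python
from collections import defaultdict

# One "table" per query pattern; plain dicts stand in for column families.
likes_by_user = defaultdict(dict)   # query: which items does a user like?
likes_by_item = defaultdict(dict)   # query: which users like an item?


def record_like(user_id, user_name, item_id, item_title):
    """Write the same fact twice so that each query is a single-key read."""
    # Denormalize: copy the item title next to the user, and the user name
    # next to the item, so neither read needs a join or a second lookup.
    likes_by_user[user_id][item_id] = item_title
    likes_by_item[item_id][user_id] = user_name


record_like("u1", "Ann", "i9", "Vintage camera")
record_like("u2", "Bob", "i9", "Vintage camera")

print(likes_by_user["u1"])  # items liked by u1
print(likes_by_item["i9"])  # users who like i9
```

The trade-off is that every write touches two structures, and a renamed item or user has to be updated in each copy.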
via: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
Thursday, 12 July 2012
From MongoDB to Cassandra: Why Atlas Platform Is Migrating
Sergio Bossa tells the story of migrating the Atlas platform from MongoDB to Cassandra, emphasizing the reasons behind their decision:
- It works on the JVM, and we have lots of in-house experience on it.
- It scales in terms of processing and storage capacity.
- Its column-based data model gives us some advanced capabilities we will talk about in a few minutes.
- Its tunable consistency levels provide greater control over high availability and consistency requirements.
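To make the last point more concrete, here is a minimal Python sketch of the usual quorum-overlap rule behind tunable consistency: a read is guaranteed to touch at least one replica holding the latest write when the read and write replica counts add up to more than the replication factor. This is not code from the Atlas post; the function names are hypothetical and only the `ONE`, `QUORUM`, and `ALL` levels are modeled.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must acknowledge an operation at a given level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def reads_see_latest_write(write_level, read_level, replication_factor):
    """True when any read replica set must overlap any write replica set."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


# With 3 replicas: QUORUM writes and QUORUM reads overlap (2 + 2 > 3),
# while ONE writes and ONE reads trade that guarantee for availability.
print(reads_see_latest_write("QUORUM", "QUORUM", 3))
print(reads_see_latest_write("ONE", "ONE", 3))
```

Choosing lower levels keeps operations available when replicas are down, while higher levels favor consistency.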
As regards what made them look into a different solution:
- We need higher resiliency to faults: MongoDB provides replica sets, but we’re experiencing lots of problems with replication lags and during replica synchronization.
- We need higher scalability: MongoDB global lock and huge memory requirements aren’t already going to cope well with our growing data set.
via: http://metabroadcast.com/blog/looking-with-cassandra-into-the-future-of-atlas
Thursday, 24 May 2012
Using R With Cassandra Through JDBC or Hive
A short post by Jake Luciani lists two R modules, RJDBC and RCassandra, that enable using R with Cassandra through either the JDBC or Hive drivers.
This is a good example of what I meant by designing products with openness and integration in mind.
via: http://www.datastax.com/dev/blog/big-analytics-with-r-cassandra-and-hive
Wednesday, 16 May 2012
Cassandra at Workware Systems: Data Model FTW
One of the stories in which the deciding factor for using Cassandra was primarily the data model and not its scalability characteristics:
We started working with relational databases, and began building things primarily with PostgreSQL at first. But dealing with the kind of data that we do, the data model just wasn’t appropriate. We started with Cassandra in the beginning to solve one problem: we needed to persist large vector data that was updated frequently from many different sources. RDBMS’s just don’t do that very well, and the performance is really terrible for fast read operations. By contrast, Cassandra stores that type of data exceptionally well and the performance is fantastic. We went on from there and just decided to store everything in Cassandra.
via: http://www.datastax.com/2012/04/the-five-minute-interview-workware-systems
Thursday, 10 May 2012
NoSQL and Relational Databases Podcast With Mathias Meyer
EngineYard’s Ines Sombra recorded a conversation with Mathias Meyer about NoSQL databases and their evolution towards friendlier functionality, relational databases and their steps towards non-relational models, and a bit more on what polyglot persistence means.
Mathias Meyer is one of the people I could talk with for days about NoSQL and databases in general, with different infrastructure toppings, and he has some of the most well-balanced thoughts on this exciting space (see this conversation I had with him in the early days of NoSQL). I strongly encourage you to download the mp3 and listen to it.
Monday, 7 May 2012
Cassandra 1.1 Released: What’s New
There are too many interesting new features and improvements in the newly released Cassandra 1.1 to cover them all here, but here’s the gist:
- Schema improvements
- Support for compound keys
- Concurrent schema changes
- A new version of Cassandra Query Language (CQL3) supporting compound keys and wide rows
- Better and easier tuning of the key and row caches
- Support for per-table hybrid storage, mixing SSDs and spinning disks
This DataStax blog entry provides links to more details about all these features and the others I haven’t listed above.
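For readers new to compound keys and wide rows, the following Python sketch models the idea in a very simplified way: the first key component selects a partition, and entries inside that partition stay sorted by the second component, which makes range slices cheap. This is only a toy model under my own assumptions, not how Cassandra’s storage engine is implemented; `partitions`, `insert`, and `slice_partition` are hypothetical names.

```python
import bisect
from collections import defaultdict

# partition key -> list of (clustering key, value), kept sorted by clustering key
partitions = defaultdict(list)


def insert(partition_key, clustering_key, value):
    """Upsert one entry, keeping the partition sorted by clustering key."""
    row = partitions[partition_key]
    keys = [k for k, _ in row]
    i = bisect.bisect_left(keys, clustering_key)
    if i < len(row) and row[i][0] == clustering_key:
        row[i] = (clustering_key, value)       # same compound key: overwrite
    else:
        row.insert(i, (clustering_key, value))  # new entry: keep sort order


def slice_partition(partition_key, start, end):
    """Return entries with start <= clustering key < end, in clustering order."""
    row = partitions.get(partition_key, [])
    keys = [k for k, _ in row]
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, end)
    return row[lo:hi]


insert("sensor-1", 3, "c")
insert("sensor-1", 1, "a")
insert("sensor-1", 2, "b")
print(slice_partition("sensor-1", 1, 3))
```

A wide row in this model is simply a partition that accumulates many entries under the same first key component.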
Wednesday, 18 April 2012
NoSQL Releases and Announcements
Catching up after almost two weeks offline is no easy task, but I hope I won’t miss any important events, releases, or posts. If I do, please email me.
Cassandra 1.0.9: Maintenance Release
The complete change notes for Cassandra 1.0.9 are here:
- improve index sampling performance (CASSANDRA-4023)
- always compact away deleted hints immediately after handoff (CASSANDRA-3955)
- delete hints from dropped ColumnFamilies on handoff instead of erroring out (CASSANDRA-3975)
- add CompositeType ref to the CLI doc for create/update column family (CASSANDRA-3980)
- Avoid NPE during repair when a keyspace has no CFs (CASSANDRA-3988)
- Fix division-by-zero error on get_slice (CASSANDRA-4000)
- don’t change manifest level for cleanup, scrub, and upgradesstables operations under LeveledCompactionStrategy (CASSANDRA-3989, 4112)
- fix race leading to super columns assertion failure (CASSANDRA-3957)
- ensure that directory is selected for compaction for user-defined tasks and upgradesstables (CASSANDRA-3985)
- allow custom types in CLI’s assume command (CASSANDRA-4081)
- fix totalBytes count for parallel compactions (CASSANDRA-3758)
- fix intermittent NPE in get_slice (CASSANDRA-4095)
- remove unnecessary asserts in native code interfaces (CASSANDRA-4096)
- Fix EC2 snitch incorrectly reporting region (CASSANDRA-4026)
- Shut down thrift during decommission (CASSANDRA-4086)
- Merged from 0.8: Fix ConcurrentModificationException in gossiper (CASSANDRA-4019)
- Pig
  - support Counter ColumnFamilies (CASSANDRA-3973)
  - Composite column support (CASSANDRA-3684)
- CQL
  - fix NPE on invalid CQL delete command (CASSANDRA-3755)
  - Validate blank keys in CQL to avoid assertion errors (CASSANDRA-3612)
Apache Hadoop User Impersonation vulnerability
This vulnerability, discovered by Cloudera’s Aaron T. Myers, affects Hadoop versions 0.20.203.0, 0.20.204.0, 0.20.205.0, 1.0.0 to 1.0.1, and 0.23.0 to 0.23.1 where Kerberos is enabled. Complete details are available here.
CouchDB 1.2.0
This is the first important release since the CouchDB hubbub with Damien Katz and Couchbase at the start of the year. The new version is a major release that deserves its own post: CouchDB 1.2.0: Performance, Security, API, Core and Replication Improvements.
Riak 1.1.2: Stabilization release
Just a maintenance release in the Riak 1.1 series. Complete release notes here.
Tuesday, 3 April 2012
Here Is Why in Cassandra vs. HBase, Riak, CouchDB, MongoDB, It's Cassandra FTW
Brian ONeill:
Now, since choosing Cassandra, I can say there are a few other really important less tangible considerations. The first, is the code base. Cassandra has an extremely clean and well maintained code base. Jonathan and team do a fantastic job managing the community and the code. As we adopted NoSQL, the ability to extend the code-base and incorporate our own features has proven invaluable. (e.g. triggers, a REST interface, and server-side wide-row indexing)
Secondly, the community is phenomenal. That results in timely support, and solid releases on a regular schedule. They do a great job prioritizing features, accepting contributions, and cranking out features. (They are now releasing ~quarterly) We’ve all probably been part of other open source projects where the leadership is lacking, and features and releases are unpredictable, which makes your own release planning difficult. Kudos to the Cassandra team.
Everything sounds reasonable except for Riak being the “new kid on the block” and not finding support for it. Basho, where were you hiding?
via: http://brianoneill.blogspot.com/2012/04/cassandra-vs-couchdb-mongodb-riak-hbase.html
Monday, 2 April 2012
Cassandra: How to Upgrade an Early Cassandra Cluster
The Scandit team shares their Cassandra upgrade process from 0.6.x to the latest 1.0.x:
After extensive testing, we found that it fit our needs and decided to use the 0.6.0 release for our first roll out. Over the next 12 months, we kept upgrading our cluster until we reached 0.6.13, which was the last release in the 0.6.x branch.
In the meantime, Cassandra was evolving at an amazing speed. Many cool new features, such as secondary indices, CQL and schema support were added. Since we were very happy with our deployment, we moved a little slower and skip the 0.7.x releases. Now that 1.0.x has been around for a few months, we decided it was time to upgrade. Because the list of changes between the two versions was fairly long, we did the upgrade in two steps: First from 0.6.13 to 0.8.7 and then from 0.8.7 to 1.0.8.
via: http://www.scandit.com/2012/03/29/tech-how-to-upgrade-path-for-an-early-cassandra-cluster/
Tuesday, 27 March 2012
NoSQL Databases Adoption in Numbers
The source of the data is downloads of Jaspersoft’s NoSQL connectors. RedMonk published a graphic and an analysis, and Klint Finley followed up with job trends:

A couple of things I don’t see mentioned in the RedMonk post:
- if and how data has been normalized based on each connector’s availability
  According to the post, data was collected between Jan. 2011 and Mar. 2012, and I think that not all connectors were available from the beginning of the period.
- if and how marketing pushes for each connector have been weighed in
  Announcing the Hadoop connector at an event with 2000 attendees or the MongoDB connector at an event with 800 attendees could definitely influence the results (nb: keep in mind that the largest number is less than 7000, thus 200-500 downloads triggered by such an event have a significant impact)
- Redis and VoltDB are mostly OLTP-only databases
Wednesday, 21 March 2012
Which NoSQL Databases Are Robust to Net-Splits?
- Dynamo (key-value)
- Voldemort (key-value)
- Tokyo Cabinet (key-value)
- KAI (key-value)
- Cassandra (column-oriented/tabular)
- CouchDB (document-oriented)
- SimpleDB (document-oriented)
- Riak (document-oriented)
A couple of clarifications to the list above:
- Dynamo has never been available to the public. On the other hand, DynamoDB is not exactly Dynamo.
- Tokyo Cabinet is not a distributed database so it shouldn’t be in this list
- CouchDB isn’t a distributed database either, but one could argue that with its peer-to-peer replication it sits right at the border. On the other hand there’s BigCouch.