Sam Overton and Tom Wilkie of Acunu explain the advantages of using virtual nodes in distributed data storage engines and the performance they measured after introducing virtual nodes in the Acunu platform, compared with Apache Cassandra:
One of the factors that limits the amount of data that can be stored on each node is the amount of time it takes to re-replicate that data when a node fails. That time matters, because it is a period during which the cluster is more vulnerable than normal to data loss. The challenge is that the more data stored on a node, the longer it takes to re-replicate it. Therefore, to store more data per node safely, we want to reduce the time taken to return to normal. This was one of our aims with virtual nodes.
Virtual Nodes reduces the time taken to re-replicate data as it involves every node in the cluster in the operation. In contrast, Apache Cassandra v1.1 will only involve a number of nodes equal to the Replication Factor (RF) of your keyspace. What’s more, with Virtual Nodes, the cluster remains balanced after this operation - you do not need to shuffle the tokens on the other nodes to compensate for the loss!
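To make the argument above concrete, here is a toy sketch of a token ring. It is not Acunu's or Cassandra's code. It assumes random tokens and a simple "walk clockwise to the next RF distinct nodes" replica placement, and the names build_ring, replicas and rebuild_peers are made up for illustration. It counts the nodes that share at least one replicated range with a failed node, which is an upper bound on who could help rebuild it. The actual choice of streaming sources in Cassandra is a separate matter that this model does not cover.

```python
import random

def build_ring(num_nodes, tokens_per_node, seed=0):
    """Return a sorted list of (token, node_id) pairs with random tokens."""
    rng = random.Random(seed)
    ring = [(rng.random(), node)
            for node in range(num_nodes)
            for _ in range(tokens_per_node)]
    ring.sort()
    return ring

def replicas(ring, index, rf):
    """Walk clockwise from ring[index] and collect the first rf distinct nodes.

    Assumes the ring contains at least rf distinct nodes.
    """
    found = []
    i = index
    while len(found) < rf:
        node = ring[i % len(ring)][1]
        if node not in found:
            found.append(node)
        i += 1
    return found

def rebuild_peers(ring, failed, rf):
    """Nodes that hold a replica of any range the failed node also held."""
    peers = set()
    for index in range(len(ring)):
        owners = replicas(ring, index, rf)
        if failed in owners:
            peers.update(n for n in owners if n != failed)
    return peers

if __name__ == "__main__":
    rf = 3
    one_token = build_ring(num_nodes=20, tokens_per_node=1)
    many_tokens = build_ring(num_nodes=20, tokens_per_node=256)
    print("1 token per node:   ", len(rebuild_peers(one_token, 0, rf)), "peers")
    print("256 tokens per node:", len(rebuild_peers(many_tokens, 0, rf)), "peers")
```

With one token per node, only the immediate ring neighbours share data with the failed node, so the rebuild is limited by a handful of machines. With many tokens per node, the failed node's ranges are scattered around the ring, so most of the cluster can contribute a small slice each.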
Original title and link: The Benefits of Virtual Nodes and Performance Results ( ©myNoSQL)
I expect not much in the short term, until some key issues are solved. First, the sources and consumers of data are still on-site. These guys are tackling a specific technical limitation, not necessarily looking to re-architect their wider systems, which are often complex and inter-dependent. Second, security and regulatory concerns may need addressing. Third, the TCO needs to stack up. A quick and dirty back of the envelope calculation suggests that although it’s free to get started with DynamoDB, for the sort of deployment sizes we’re seeing, DynamoDB works out considerably more expensive than alternatives like Acunu deployed on hardware (even after accounting for typical full costing for outsourced data centers).
Keeping my eyes on all things NoSQL for more than 2 years, I’d say that NoSQL databases in general do not mean much for the enterprise world. Yet.
Original title and link: What does DynamoDB mean for the enterprise world? ( ©myNoSQL)
The Acunu guys explain one piece of their distribution of Apache Cassandra:
our product […] uses a layout known as Randomised Duplicate Allocation (RDA). In the 2-RDA mode, each block of data is duplicated, and the 2 copies allocated at random among the available devices (other schemes can use more than 2 copies, or a variable number of copies depending on the popularity of the data and space constraints).
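A minimal sketch of the 2-RDA idea described in the quote, not Acunu's implementation. It assumes the two copies of a block always land on two different devices, which the quote does not state explicitly, and the function names allocate_2rda and rebuild_sources are hypothetical.

```python
import random
from collections import Counter

def allocate_2rda(num_blocks, num_devices, seed=0):
    """Place 2 copies of every block on 2 distinct, randomly chosen devices."""
    rng = random.Random(seed)
    return [tuple(rng.sample(range(num_devices), 2))
            for _ in range(num_blocks)]

def rebuild_sources(layout, failed):
    """Count, per surviving device, the blocks it must supply to rebuild `failed`."""
    sources = Counter()
    for first, second in layout:
        if first == failed:
            sources[second] += 1
        elif second == failed:
            sources[first] += 1
    return sources

if __name__ == "__main__":
    layout = allocate_2rda(num_blocks=10_000, num_devices=10)
    for device, blocks in sorted(rebuild_sources(layout, failed=0).items()):
        print(f"device {device} supplies {blocks} blocks")
```

Because the surviving copy of each lost block sits on a randomly chosen device, the reads needed for a rebuild are spread across the remaining devices rather than concentrated on a single mirror partner, as they would be with a classic two-disk mirror.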
As opposed to what’s happening in the Hadoop world, Acunu is not just repackaging Cassandra:
The Acunu Core (“Castle”) is at the heart of our distribution for Apache Cassandra. It comprises a rewrite of the Linux storage stack that offloads much of the storage work from Cassandra. […] It includes optimized OS caching and buffering schemes, new storage algorithms with direct access to disks, SSD-aware storage layout algorithms (coming in V2) and an alternative RAID scheme that rebuilds 2TB disks in 30 minutes.
Original title and link: RAID and Acunu Randomised Duplicate Allocation Disk Layout ( ©myNoSQL)