


SSD: All content tagged as SSD in NoSQL databases and polyglot persistence

SSDs and MapReduce performance

Conclusions of comparing SSDs and HDDs for different cluster scenarios from the cost perspective of performance and storage capacity:

  • For a new cluster, SSDs deliver up to 70 percent higher MapReduce performance compared to HDDs of equal aggregate IO bandwidth.
  • For an existing HDD cluster, adding SSDs leads to more gains if configured properly.
  • On average, SSDs show 2.5x higher cost-per-performance, a gap far narrower than the 50x difference in cost-per-capacity.

The post offers many details of the tests run and also various results. But the 3 bullets above should be enough to drive your decision.

Original title and link: SSDs and MapReduce performance (NoSQL database©myNoSQL)


Amazon EBS, SSD, and Rackspace IOPS Per Dollar

Staying on the subject of IOPS in the cloud, Jeff Darcy did some testing with GlusterFS against Amazon EBS, Amazon SSD, Storm on Demand SSD, and Rackspace instance storage, and computed the IOPS/$ for each:

  • Amazon EBS: 1000 IOPS (provisioned) for $225/month or 4.4 IOPS/$ (server not included)
  • Amazon SSD: 4300 IOPS for $4464/month or 1.0 IOPS/$ (that’s pathetic)
  • Storm on Demand SSD: 5500 IOPS for $590/month or 9.3 IOPS/$
  • Rackspace instance storage: 3400 IOPS for $692/month (8GB instances) or 4.9 IOPS/$
  • Rackspace with 4x block storage per server: 9600 IOPS for $811/month or 11.8 IOPS/$ (hypothetical, assuming CPU or network don’t become bottlenecks)
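
The IOPS/$ figures are simply IOPS divided by the monthly price. A minimal sketch that recomputes and ranks them, using only the numbers from the list above (the labels and the `iops_per_dollar` helper are mine, not Jeff Darcy's):

```python
# Figures copied from the list above: (label, IOPS, monthly cost in USD).
offerings = [
    ("Amazon EBS", 1000, 225),
    ("Amazon SSD", 4300, 4464),
    ("Storm on Demand SSD", 5500, 590),
    ("Rackspace instance storage", 3400, 692),
    ("Rackspace with 4x block storage", 9600, 811),
]


def iops_per_dollar(iops, monthly_cost):
    """Return IOPS delivered per dollar of monthly spend."""
    return iops / monthly_cost


# Rank the offerings from best to worst value.
ranked = sorted(
    ((label, iops_per_dollar(iops, cost)) for label, iops, cost in offerings),
    key=lambda pair: pair[1],
    reverse=True,
)

for label, value in ranked:
    print(f"{label}: {value:.1f} IOPS/$")
```

Keep in mind that the EBS price excludes the server and the last Rackspace entry is hypothetical, so the ranking is only as good as those caveats.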



Voldemort on Solid State Drives

LinkedIn’s experience of upgrading their Project Voldemort clusters to using SSD:

At the beginning of this year, we migrated our Voldemort clusters to SSD (Solid State Drives) from SAS (Serial Attached SCSI) disks, to meet increasing demand for IOPS from data intensive applications.

[Image: LinkedIn Voldemort SSD]



Log-Structured File Systems: There's One in Every SSD

An article from 2009 by Valerie Aurora:

When you say “log-structured file system,” most storage developers will immediately think of Ousterhout and Rosenblum’s classic paper, The Design and Implementation of a Log-structured File System - and the nearly two decades of subsequent work attempting to solve the nasty segment cleaner problem (see below) that came with it. Linux developers might think of JFFS2, NILFS, or LogFS, three of several modern log-structured file systems specialized for use with solid state devices (SSDs). Few people, however, will think of SSD firmware. The flash translation layer in a modern, full-featured SSD resembles a log-structured file system in several important ways. Extrapolating from log-structured file systems research lets us predict how to get the best performance out of an SSD. In particular, full support for the TRIM command, at both the SSD and file system levels, will be key for sustaining long-term peak performance for most SSDs.



EC2 Solid State Disks and Cassandra

Jonathan Ellis about using Cassandra with mixed spinning disks and SSDs:

Finally, I should point out that taking advantage of SSDs in a Cassandra cluster doesn’t have to be all or nothing. You can mix SSD and spinning disks either at the individual node level, or at the cluster level. For the former, Cassandra allows putting “hot” tables on SSD while leaving “cold” ones on spinning disks. But if you want to use a group of nodes for analytical workloads the way DataStax Enterprise does, Cassandra will also be comfortable with having just those nodes be entirely based on cheaper spinning disks, with the remaining, “realtime” nodes based on SSDs. This latter configuration is a good fit for EC2 deployments.



Cassandra and Solid State Drives

A slide deck by Rick Branson explaining why and how Cassandra takes full advantage of SSDs.

Amazon Introduces High I/O SSD-backed EC2 Instances

Jeff Barr:

In order to meet this need, we are introducing a new family of EC2 instances[1] that are designed to run low-latency, I/O-intensive applications, and are an exceptionally good host for NoSQL databases such as Cassandra and MongoDB.

Many complaints about running databases on EC2 instances were about the I/O. I guess Amazon has been hearing this loud and clear.

  1. Specs of the new EC2 instance:

    • 8 virtual cores (35 ECU)
    • HVM and PVM virtualization.
    • 60.5 GB of RAM.
    • 10 Gigabit Ethernet connectivity with support for cluster placement groups.
    • 2 TB of local SSD-backed storage, visible as a pair of 1 TB volumes.



Benchmarking High Performance I/O With SSD for Cassandra on AWS

Adrian Cockcroft:

The SSD based system running the same workload had plenty of IOPS left over and could also run compaction operations under full load without affecting response times. The overall throughput of the 12-instance SSD based system was CPU limited to about 20% less than the existing system, but with much lower mean and 99th percentile latency. This sizing exercise indicated that we could replace the 48 m2.4xlarge and 36 m2.xlarge with 15 hi1.4xlarge to get the same throughput, but with much lower latency.

Tons of details and data about the benchmarks Netflix ran against the new high I/O SSD-backed EC2 instances. Results are even more impressive than the IOPS numbers in Werner Vogels’ High performance I/O instances for EC2.



High Performance I/O Instances for Amazon EC2

Werner Vogels:

Databases are one particular area that for scaling can benefit tremendously from high performance I/O. The I/O requirements of database engines, regardless whether they are Relational or Non-Relational (NoSQL) DBMS’s can be very demanding. Increasingly randomized access, and burst IO through aggregation put strains on any IO subsystem, physical or virtual, attached or remote. One area where we have seen this particularly culminate is in modern NoSQL DBMSs that are often the core of scalable modern web applications that exhibit a great deal of random access patterns. They require high replication factors to get to the aggregate random IO they require. Early users of these High I/O instances have been able to reduce their replication factors significantly while achieving rock solid performance and substantially reducing their cost in the process.

In numbers, this means going from around 100 IOPS for 15K RPM spinning disks to over 100,000 IOPS for random reads and 10,000-85,000 IOPS for random writes with SSDs.
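
To get a feel for what that gap means for sizing, here is a back-of-the-envelope sketch. The per-device figures come from the numbers above; the 50,000 IOPS target and the `devices_needed` helper are hypothetical, picked only for illustration:

```python
import math

# Per-device random read IOPS, taken from the figures quoted above.
HDD_15K_READ_IOPS = 100
SSD_INSTANCE_READ_IOPS = 100_000

# Hypothetical workload target, not from the original post.
target_read_iops = 50_000


def devices_needed(target_iops, iops_per_device):
    """Smallest number of devices whose aggregate IOPS meets the target."""
    return math.ceil(target_iops / iops_per_device)


print("15K RPM disks needed:", devices_needed(target_read_iops, HDD_15K_READ_IOPS))
print("SSD-backed instances needed:", devices_needed(target_read_iops, SSD_INSTANCE_READ_IOPS))
```

This ignores capacity, CPU, and availability requirements, which in practice set a floor on the number of nodes no matter how fast the storage is.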



SSD vs Spinning Disk Benchmark With Bonnie

Tim Bray published the results of running Bonnie, a filesystem benchmark, against an SSD and a spinning disk both mounted on a MacBook Pro:

[Image: SSD vs Spinning Disk Benchmark]

Now keep in mind that this is not a benchmark for raw speed, but rather a comparison of the file system and bus and disk access.



Amazon’s DynamoDB Shows Hardware as Means to an End... Actually It's All About Predictability

Derrick Harris:

In that sense, DynamoDB is something of a curveball. It lets AWS users leverage the performance of SSDs, only as the underpinning of a new service rather than as a new IaaS feature alone.


Web developers use NoSQL databases more frequently than enterprise developers, and NoSQL requires solid-state performance.

I think Derrick got this mostly wrong this time. Developers do not care about SSDs per se. What good developers care about is performance. And great developers care about predictability of performance.

There are a couple of NoSQL databases that know this very well. To give you just a couple of examples, take a look at this benchmark of Riak and see what it is focusing on. Or check Riak’s Bitcask backend (here’s also a great explanation of the Bitcask paper), which guarantees a single disk seek per read. I assume you guessed the keyword behind both of these: predictability.
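
To make the "single disk seek per read" idea concrete, here is a toy sketch of the general approach: an append-only data file plus an in-memory index from key to file position. The `TinyBitcask` class and its methods are hypothetical names for illustration; this is not Riak's implementation and it leaves out hint files, merging, checksums, and crash recovery:

```python
import os
import tempfile


class TinyBitcask:
    """Toy append-only key-value store with an in-memory key directory."""

    def __init__(self, path):
        self.keydir = {}  # key -> (offset, length) of the latest value
        self.f = open(path, "a+b")  # append mode: writes always go to the end

    def put(self, key, value):
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(value)
        self.f.flush()
        # Overwrites only update the index; the old bytes stay in the file.
        self.keydir[key] = (offset, len(value))

    def get(self, key):
        offset, length = self.keydir[key]  # lookup in memory, no disk access
        self.f.seek(offset)                # the single seek
        return self.f.read(length)

    def close(self):
        self.f.close()


path = os.path.join(tempfile.mkdtemp(), "data.log")
store = TinyBitcask(path)
store.put("a", b"one")
store.put("b", b"two")
store.put("a", b"three")  # appended; "one" becomes garbage on disk
print(store.get("a"))
```

Because every read costs one index lookup plus one positioned read, latency does not depend on how many keys or stale values the file holds. The price is that all keys must fit in memory and stale values need periodic cleanup.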

Amazon DynamoDB is using SSDs because:

  • it wants to offer predictable low latency
  • it wants to offer predictable throughput
  • it wants to offer single-digit millisecond average service-side responses
  • and it wants to do all these at any scale of dataset sizes and request rates

Hardware is a means to an end. And SSD or not, the points above are all that matter[1].

  1. There are other dimensions of systems that are as critical as the ones covered (e.g. availability, fault-tolerance, etc.), but these are less related to the SSD vs spinning-disks discussion.  



CouchDB's File Format Is Brilliantly Simple and Speed-Efficient at the Cost of Disk Space

Riyad Kalla:

I have been reading up on log structured file systems, efficient data formats, database storage engines and copy-on-write semantics for a little more than a week now… reading about the pros and cons of different approaches and seeing it all come together so smoothly in a single design like Couch’s really deserves a hat-tip to the Couch team.

Great post looking at the pros of CouchDB’s storage format and the tradeoffs the team made along the way.
