


Amazon: All content tagged as Amazon in NoSQL databases and polyglot persistence

The NoSQL Family Tree


Even if it includes just a handful of NoSQL databases, it’s still a nice visualization.

Original title and link: The NoSQL Family Tree (NoSQL database©myNoSQL)


Microsoft Azure Sales Top $1 Billion Challenging Amazon

Last week I saw some guesstimates of Amazon Web Services’ revenue. Now Bloomberg has posted an article about the revenue of Microsoft Azure and related programs (?): $1 billion.

Interesting numbers:

  • market share: Amazon Web Services 71%, Microsoft Azure 20%
  • Azure grew 48% in the last 6 months
  • Gartner estimates the infrastructure segment of the cloud market at $6.17 billion in 2012, growing to $30.6 billion in 2017
  • Gartner estimates the total cloud market at $108.9 billion in 2012, growing to $237.2 billion in 2017. (nb: I find this one weird as it includes online advertising and other less-cloudy-services-imo).
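
A quick back-of-the-envelope check of what these estimates imply, as a minimal sketch. It assumes steady compound growth between 2012 and 2017, which Gartner doesn't claim, and the `cagr` helper is my own naming, not something from the article:

```python
def cagr(start, end, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (end / start) ** (1 / years) - 1

# Gartner estimates quoted above, in billions of dollars, 2012 -> 2017.
infrastructure = cagr(6.17, 30.6, 2017 - 2012)
total_cloud = cagr(108.9, 237.2, 2017 - 2012)

# Share left for everyone other than AWS and Azure, in percent.
others_share = 100 - 71 - 20

print(f"infrastructure segment: {infrastructure:.1%} per year")
print(f"total cloud market:     {total_cloud:.1%} per year")
print(f"share left for others:  {others_share}%")
```

Under that assumption the infrastructure segment would have to grow more than twice as fast per year as the total market.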

Amazon hasn’t given many details about the AWS platform, except 3 numbers:

  1. number of objects stored in S3. This has been doubling every year for the last 4 years
    1. Q4 2012: 1.3 trillion
    2. Q3 2011: 566b
    3. Q4 2010: 262b
    4. Q4 2009: 102b
    5. Q4 2008: 40b
    6. Q4 2007: 14b
    7. Q4 2006: 2.9b
  2. number of requests per second handled by AWS
  3. number of EMR clusters (?) spun up
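
To see how well the "doubling every year" claim holds, here is a small sketch that annualizes the growth between consecutive data points. The numbers are the ones listed above; the annualization is my own addition because two of the intervals (Q4 2010 to Q3 2011, Q3 2011 to Q4 2012) are not exactly one year long:

```python
# (year, quarter, objects in billions), as listed above.
S3_OBJECTS = [
    (2006, 4, 2.9),
    (2007, 4, 14),
    (2008, 4, 40),
    (2009, 4, 102),
    (2010, 4, 262),
    (2011, 3, 566),
    (2012, 4, 1300),
]

def annualized_growth(points):
    """Yield (label, factor) pairs, where factor is the growth multiple per year."""
    for (y0, q0, n0), (y1, q1, n1) in zip(points, points[1:]):
        quarters = (y1 - y0) * 4 + (q1 - q0)
        factor = (n1 / n0) ** (4 / quarters)
        yield f"Q{q0} {y0} -> Q{q1} {y1}", factor

growth = list(annualized_growth(S3_OBJECTS))
for label, factor in growth:
    print(f"{label}: x{factor:.2f} per year")
```

The early years grow much faster than 2x, while the latest interval lands close to it.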

According to some slides from last October/November:

  1. S3 stored over 1.3 trillion objects
  2. AWS handles over 830k requests/s
  3. 3.7 million EMR clusters spun up since 2010

While I don’t have any data about RDS and Dynamo, it would be great if Microsoft released some details about Azure.

✚ If AWS has a market share of 71% and Azure 20%, that leaves Google plus others with 9%. Makes me wonder how accurate this data is.



DynamoDB One Year Later: 85% Cheaper: How Is Amazon Doing It

Werner Vogels writes about the recent price reduction of DynamoDB:

DynamoDB runs on a fleet of SSD-backed storage servers that are specifically designed to support DynamoDB. This allows us to tune both our hardware and our software to ensure that the end-to-end service is both cost-efficient and highly performant. We’ve been working hard over the past year to improve storage density and bring down the costs of our underlying hardware platform. We have also made significant improvements to our software by optimizing our storage engine, replication system and various other internal components. The DynamoDB team has a mandate to keep finding ways to reduce the cost and I am glad to see them delivering in a big way. DynamoDB has also benefited from its rapid growth, which allows us to take advantage of economies of scale. As with our other services, as we’ve made advancements that allow us to reduce our costs, we are happy to pass the savings along to you.

One thought: this could be, if it isn’t already, a great sales pitch for data appliance vendors.

You can find more details about DynamoDB’s price reduction and the new reserved capacity model on the Amazon Web Services Blog:

Amazon DynamoDB Price Reduction



Amazon Redshift - Now Broadly Available

Jeff Barr:

We announced Amazon Redshift, our fast and powerful, fully managed, petabyte-scale data warehouse service, late last year (see my earlier blog post for more info).


We’ve designed Amazon Redshift to be cost-effective, easy to use, and flexible.


  1. who is the ideal Redshift user? I assume it should be AWS users that already have data in the Amazon cloud. Otherwise I have a bit of a hard time imagining trucks carrying tons of hard drives into Amazon data centers.
  2. what happens if for some reason you decide to move your data out of Redshift? How would that work?
  3. what is the next move and counter-argument of Greenplum, Netezza, Vertica, etc. to Redshift?



Hadoop Business Ecosystem as of January 2013

As I was hoping and expecting, Datameer updated the chart visualizing Hadoop’s business side ecosystem:


It shouldn’t be a surprise to anyone that the most connected companies in the Hadoop space are Cloudera and Hortonworks. They outrank the IT industry mammoths: IBM, HP, Microsoft, Oracle, SAP, etc.



YCSB Benchmark Results for Cassandra, HBase, MongoDB, MySQL Cluster, and Riak

Put together by the team at Altoros Systems Inc., this time run on Amazon EC2 and covering Cassandra, HBase, MongoDB, MySQL Cluster, sharded MySQL, and Riak:

After some of the results had been presented to the public, some observers said MongoDB should not be compared to other NoSQL databases because it is more targeted at working with memory directly. We certainly understand this, but the aim of this investigation is to determine the best use cases for different NoSQL products. Therefore, the databases were tested under the same conditions, regardless of their specifics.

Teaser: HBase got the best results in most of the benchmarks (with flush turned off though). And I’m not sure the setup included the latest HBase read improvements from Facebook.



Provisioned IOPS for Amazon RDS

Werner Vogels:

Following the huge success of being able to provision a consistent, user-requested I/O rate for DynamoDB and Elastic Block Store (EBS), the AWS Database Services team has now released Provisioned IOPS, a new high performance storage option for the Amazon Relational Database Service (Amazon RDS). Customers can provision up to 10,000 IOPS (input/output operations per second) per database instance to help ensure that their databases can run the most stringent workloads with rock solid, consistent performance.

Amazon is the first company I know of championing guaranteed performance SLAs. Until recently most SLAs referred to availability, resilience, and redundancy. But soon performance-based SLAs will become the norm for other service providers. I’d also expect appliance vendors to be asked for similar guarantees sooner rather than later.



How Is Amazon Doing Its Glacier Storage?

Greg Linden thinking out loud about Amazon Glacier:

If that is what Amazon is doing here — and I’m guessing, but I think they noticed that a lot of needs are for memory and maybe some rapid access to disk, most disk was empty and there are long times where disk is mostly idle, so they thought, let’s sell it out in a way that doesn’t interfere with real-time work — I really love it.

Indeed an intriguing idea of how to utilize idle capacity for profit.



Cold Data Storage: Amazon Glacier

James Hamilton:

Cold storage is different. It’s the only product I’ve ever worked upon where the customer requirements are single dimensional. […] How can we deliver the best price per capacity now and continue to reduce it over time? The focus on price over performance, price over latency, price over bandwidth actually made the problem more interesting. With most products and services, it’s usually possible to be the best on at least some dimensions even if not on all. On cold storage, to be successful, the price per capacity target needs to be hit.

Learning something new every day.



I/O Intensive Apps and Amazon Cloud Improvements: EBS Provisioned IOPS & Optimized Instance Types

James Hamilton puts into perspective the two latest I/O-related features coming from Amazon: the high performance I/O EC2 instances and EBS Provisioned IOPS together with EBS-optimized EC2 instances:

With the announcement today, EC2 customers now have access to two very high performance storage solutions. The first solution is the EC2 High I/O Instance type announced last week which delivers a direct attached, SSD-powered 100k IOPS for $3.10/hour. In today’s announcement this direct attached storage solution is joined by a high-performance virtual storage solution. This new type of EBS storage allows the creation of striped storage volumes that can reliably deliver 10,000 to 20,000 IOPS across a dedicated virtual storage network.

I’ve already said it, but this confirms once again that Amazon is addressing most of the complaints about running I/O intensive applications on EC2 and EBS.



Pros and Cons of Redis-Resque and Amazon SQS

Eric Lubow published a comparison of Resque, a Redis-based queue system, and Amazon SQS:

Resque has to be run locally (meaning within your environment). And because it’s native to your architecture, it can be incredibly fast in comparison. Its durability comes into question where even though Redis allows you to dump your data to disk under varying circumstances (say once per second) or have a master/slave architecture, ultimately you are still bound by the potential loss of a single machine (aka a single point of failure candidate). While this may only be the case until Redis Cluster is released, comparisons have been made with the tools at hand. With SQS, it is much more durable. They also have the notion of in-flight messages. This means that the message is pulled off the queue but never deleted until the delete command is sent for that message id. So if you lose your worker mid-processing of the event, that event isn’t lost for good. The message will be timed out after being in-flight for 5 minutes and then dropped back onto the available queue. While this functionality could be written into Resque, it just wasn’t part of the fundamental design.

Related to Redis durability, you should read Redis persistence demystified. The conclusion might surprise those who think of Redis as a pure in-memory solution.
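
The in-flight behavior described in the quote can be illustrated with a toy in-memory queue. This is only a sketch of the semantics: `VisibilityQueue` and its methods are hypothetical names of mine, not the SQS or Resque API, and time is passed in explicitly so the example stays deterministic. The 300-second timeout mirrors the 5 minutes mentioned above:

```python
import itertools
from collections import deque

class VisibilityQueue:
    """Toy in-memory queue with SQS-like in-flight semantics (not the SQS API)."""

    def __init__(self, visibility_timeout=300.0):
        self.visibility_timeout = visibility_timeout  # seconds a message stays in flight
        self._available = deque()   # (message_id, body) waiting to be received
        self._in_flight = {}        # message_id -> (body, deadline)
        self._ids = itertools.count(1)

    def send(self, body):
        message_id = next(self._ids)
        self._available.append((message_id, body))
        return message_id

    def receive(self, now):
        """Hand out one message and mark it in-flight; None if nothing is available."""
        self._requeue_expired(now)
        if not self._available:
            return None
        message_id, body = self._available.popleft()
        self._in_flight[message_id] = (body, now + self.visibility_timeout)
        return message_id, body

    def delete(self, message_id):
        """Acknowledge a message; only now is it gone for good."""
        return self._in_flight.pop(message_id, None) is not None

    def _requeue_expired(self, now):
        # Messages whose deadline passed go back onto the available queue.
        expired = [m for m, (_, deadline) in self._in_flight.items() if deadline <= now]
        for message_id in expired:
            body, _ = self._in_flight.pop(message_id)
            self._available.append((message_id, body))


queue = VisibilityQueue(visibility_timeout=300.0)
queue.send("resize-image-42")

first = queue.receive(now=0.0)        # worker A takes the message, then crashes
hidden = queue.receive(now=10.0)      # still in flight, so nothing to hand out
retry = queue.receive(now=301.0)      # timeout passed, the message is available again
deleted = queue.delete(retry[0])      # worker B finishes and deletes it
leftover = queue.receive(now=1000.0)  # nothing comes back after the delete
```

The point of the design is that receiving and deleting are separate steps, so a crashed worker costs a delay instead of a lost message.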



Benchmarking High Performance I/O With SSD for Cassandra on AWS

Adrian Cockcroft:

The SSD based system running the same workload had plenty of IOPS left over and could also run compaction operations under full load without affecting response times. The overall throughput of the 12-instance SSD based system was CPU limited to about 20% less than the existing system, but with much lower mean and 99th percentile latency. This sizing exercise indicated that we could replace the 48 m2.4xlarge and 36 m2.xlarge with 15 hi1.4xlarge to get the same throughput, but with much lower latency.

Tons of details and data about the benchmarks Netflix ran against the new high I/O SSD-backed EC2 instances. The results are even more impressive than the IOPS numbers in Werner Vogels’ High performance I/O instances for EC2.
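
The jump from the 12-instance test to the 15 instances in the quote is easy to reproduce if you assume throughput scales linearly with instance count. That assumption is mine; the quoted text doesn't spell out how Netflix arrived at 15. A minimal sketch, with `instances_for_parity` as a hypothetical helper:

```python
def instances_for_parity(test_instances, relative_percent):
    """Instances needed to match the baseline throughput, assuming linear scaling.

    relative_percent is the throughput of the test system as a percentage of
    the existing system. Integer ceiling division avoids float rounding.
    """
    return (test_instances * 100 + relative_percent - 1) // relative_percent

# 12 SSD instances delivered roughly 80% of the existing system's throughput.
needed = instances_for_parity(12, 80)
replaced = 48 + 36  # m2.4xlarge + m2.xlarge instances in the existing system

print(f"{needed} hi1.4xlarge instead of {replaced} m2 instances")
```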
