


Amazon: All content tagged as Amazon in NoSQL databases and polyglot persistence

99designs: Powered by Amazon RDS, Redis, MongoDB, and Memcached

While the authoritative storage is Amazon RDS, 99designs is using Redis, MongoDB, and Memcached for transient data:

We log errors and statistics to capped collections in MongoDB, providing us with more insight into our system’s performance. Redis captures per-user information about which features are enabled at any given time; it supports our development strategy around dark launches, soft launches and incremental feature rollouts.

It’s also worth noting the nice things they say about using Amazon RDS:

An RDS instance configured to use multiple availability zones provides master-master replication, providing crucial redundancy for our DB layer. This feature has already saved our bacon multiple times: the fail over has been smooth enough that by the time we realised anything was wrong, another master was correctly serving requests. Its rolling backups provide a means of disaster recovery. We load-balance reads across multiple slaves as a means of maintaining performance as the load on our database increases.

Original title and link: 99designs: Powered by Amazon RDS, Redis, MongoDB, and Memcached (NoSQL database©myNoSQL)


DataStax's CEO thoughts on the NoSQL Market and Competition

Billy Bosworth[1]:

Personally, I have never believed that other post-relational (aka NoSQL/Hadoop) database companies were our primary competition.  The brute fact of the matter is that if you put us all together, we are still not statistically relevant compared to the overall DBMS market.

I had only one real personal fear coming into this market: That I would sink a big portion of my life into something that would never take hold in the mainstream. I suspect that would be a truly awful ending for all of us in this space. But thanks to companies like Amazon and Oracle, that feels highly unlikely now, and that is a great thing.

Just to play the devil’s advocate for a second: Oracle won’t lose much in the NoSQL market if things don’t work out well, and Amazon’s DynamoDB is part of a larger plan. But for all the NoSQL database companies it is an all-or-nothing game[2].

  1. Billy Bosworth: CEO DataStax 

  2. An all-or-nothing game is not the same as a winner-takes-all game  

Original title and link: DataStax’s CEO thoughts on the NoSQL Market and Competition (NoSQL database©myNoSQL)


Get them by the data

Gavin Clarke and Chris Mellor about AWS Storage Gateway:

Once you’ve got them by the data, of course, their hearts and minds will follow, and Amazon’s using the AWS Storage Gateway beta as a sampler for the rest of its compute cloud.

The Storage Gateway is another piece, together with S3, DynamoDB, SimpleDB, and Elastic MapReduce, in Amazon’s great strategic puzzle of a complete polyglot platform.

Original title and link: Get them by the data (NoSQL database©myNoSQL)


Using Amazon Elastic MapReduce With DynamoDB: NoSQL Tutorials

Adam Gray[1]:

In this article, I’ll demonstrate how EMR can be used to efficiently export DynamoDB tables to S3, import S3 data into DynamoDB, and perform sophisticated queries across tables stored in both DynamoDB and other storage services such as S3.

If you put together Amazon S3, Amazon DynamoDB, Amazon RDS, and Amazon Elastic MapReduce, you have a complete polyglot persistence solution in the cloud[2].

  1. Adam Gray is Product Manager on the Elastic MapReduce Team  

  2. Complete in the sense of core building blocks.  

Original title and link: Using Amazon Elastic MapReduce With DynamoDB: NoSQL Tutorials (NoSQL database©myNoSQL)


12 Hadoop Vendors to Watch in 2012

My list of the 8 most interesting companies for the future of Hadoop didn’t try to include everyone with a product that has the word Hadoop in it. But the list from InformationWeek does. To save you 15 clicks, here’s their list:

  • Amazon Elastic MapReduce
  • Cloudera
  • Datameer
  • EMC (with EMC Greenplum Unified Analytics Platform and EMC Data Computing Appliance)
  • Hadapt
  • Hortonworks
  • IBM (InfoSphere BigInsights)
  • Informatica (for HParser)
  • Karmasphere
  • MapR
  • Microsoft
  • Oracle

Original title and link: 12 Hadoop Vendors to Watch in 2012 (NoSQL database©myNoSQL)

A Cost Analysis of DynamoDB for Tarsnap

Tarsnap is a service offering secure online backups. Colin Percival details the costs Tarsnap would incur by using Amazon DynamoDB:

For each TB of data stored, this gives me 30,000,000 blocks requiring 60,000,000 key-value pairs; these occupy 2.31 GB, but for DynamoDB pricing purposes, they count as 8.31 GB, or $8.31 per month. That’s about 2.7% of Tarsnap’s gross revenues (30 cents per GB per month); significant, but manageable. However, each of those 30,000,000 blocks need to go through log cleaning every 14 days, a process which requires a read (to check that the block hasn’t been marked as deleted) and a write (to update the map to point at the new location in S3). That’s an average rate of 25 reads and 25 writes per second, so I’d need to reserve 50 reads and 50 writes per second of DynamoDB capacity. The reads cost $0.01 per hour while the writes cost $0.05 per hour, for a total cost of $0.06 per hour — or $44 per month. That’s 14.6% of Tarsnap’s gross revenues; together with the storage cost, DynamoDB would eat up 17.3% of Tarsnap’s revenue — slightly over $0.05 from every $0.30/GB I take in.

To put it differently, an 82.7% margin after DynamoDB costs sounds like a good deal, but without knowing the costs of the other components (S3, EC2, data transfer) it’s difficult to conclude whether this solution would remain profitable at a good margin. Anyway, an interesting aspect of this solution is that the costs of some major components of the platform (S3, DynamoDB) would scale linearly with the revenue.
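The arithmetic in the quote is easy to re-check. The sketch below only reuses the figures quoted above; the function name, the 730-hour month, and the 1 TB = 1000 GB conversion are my assumptions, chosen because they reproduce the quoted $44 per month and the $300 per TB revenue.

```python
# Figures taken from the quoted analysis, per TB of data stored.
BLOCKS_PER_TB = 30_000_000
CLEANING_PERIOD_DAYS = 14
BILLABLE_STORAGE_GB = 8.31      # billed size of the key-value pairs
STORAGE_PRICE_PER_GB = 1.00     # implied by "8.31 GB, or $8.31 per month"
READ_COST_PER_HOUR = 0.01       # for 50 reserved reads per second
WRITE_COST_PER_HOUR = 0.05      # for 50 reserved writes per second

# Assumptions, not stated in the quote.
HOURS_PER_MONTH = 730           # 365 * 24 / 12
REVENUE_PER_TB = 0.30 * 1000    # 30 cents per GB, 1 TB taken as 1000 GB


def tarsnap_dynamodb_costs():
    """Return the log-cleaning rate and DynamoDB's share of revenue."""
    cleaning_seconds = CLEANING_PERIOD_DAYS * 24 * 3600
    ops_per_second = BLOCKS_PER_TB / cleaning_seconds

    storage_cost = BILLABLE_STORAGE_GB * STORAGE_PRICE_PER_GB
    throughput_cost = (READ_COST_PER_HOUR + WRITE_COST_PER_HOUR) * HOURS_PER_MONTH

    return {
        "ops_per_second": ops_per_second,
        "storage_share": storage_cost / REVENUE_PER_TB,
        "throughput_share": throughput_cost / REVENUE_PER_TB,
        "total_share": (storage_cost + throughput_cost) / REVENUE_PER_TB,
    }


if __name__ == "__main__":
    for name, value in tarsnap_dynamodb_costs().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions the cleaning rate comes out just under 25 operations per second and the total share lands slightly above 17%, in line with the quoted numbers.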

Original title and link: A Cost Analysis of DynamoDB for Tarsnap (NoSQL database©myNoSQL)


Introducing Amazon DynamoDB Slide Deck

An official slide deck to introduce Amazon DynamoDB to your team. My notes about DynamoDB could be a nice addition.

Basho: Congratulations, Amazon!

A dynamo-as-a-service offered by Amazon on their ecosystem will appeal to some. For others, the benefits of a Dynamo-inspired product that can be deployed on other public clouds, behind-the-firewall, or not on the cloud at all, will be critical.

Objective. Clear. To the point.

Original title and link: Basho: Congratulations, Amazon! (NoSQL database©myNoSQL)


Amazon DynamoDB: NoSQL in the Cloud

James Hamilton:

In a past blog entry, One Size Does Not Fit All, I offered a taxonomy of 4 different types of structured storage system, argued that Relational Database Management Systems are not sufficient, and walked through some of the reasons why NoSQL databases have emerged and continue to grow market share quickly. The four database categories I introduced were: 1) features-first, 2) scale-first, 3) simple structure storage, and 4) purpose-optimized stores. RDBMS own the first category.

DynamoDB targets workloads fitting into the Scale-First and Simple Structured storage categories where NoSQL database systems have been so popular over the last few years.

A great post focusing on the challenges of implementing the features that make DynamoDB, the Amazon cloud-based NoSQL database, unique.

Original title and link: Amazon DynamoDB: NoSQL in the Cloud (NoSQL database©myNoSQL)


Notes About Amazon DynamoDB

It’s been only a couple of hours since the news about Amazon DynamoDB got out. Here are my notes gathered from the Amazon DynamoDB documentation. If you find other interesting bits, please leave a comment and I’ll add them to the list (with attribution):

  • it is not the first managed/hosted NoSQL
  • it is the first managed NoSQL database that auto-shards

    Update: As pointed out in the comments, Microsoft Azure Table supports auto-sharding

  • it is the first managed auto-sharding NoSQL database that automatically reshards based on SLA (request capacity can be specified by the user)
  • DynamoDB says that average service-side latencies are typically single-digit milliseconds
  • DynamoDB stores data on Solid State Drives (SSDs)
  • DynamoDB replicates data synchronously across multiple AWS Availability Zones in an AWS Region to provide built-in high availability and data durability
  • The documentation for the write operation is confusing:

    When Amazon DynamoDB returns an operation successful response to your write request, Amazon DynamoDB ensures the write is durable on multiple servers. However, it takes time for the update to propagate to all copies. That is, the data is eventually consistent, meaning that your read request immediately after a write might not show the change.

  • DynamoDB caps the throughput at both table level and account level (docs)
    • Jeff Barr says that this limit can be changed and DynamoDB can definitely deliver more (link)
    • Werner Vogels clarified that, similarly to other Amazon web services, these limitations (tables, throughput, etc.) can be lifted by filling out a request form. (link)
  • DynamoDB departs (a bit) from the original Dynamo model by allowing a type of non-opaque keys (which supports querying).

    There’s also a scan operation that allows filtering of results based on attributes’ values

  • DynamoDB limits the size of an item (record) to 64KB. An item size is the sum of lengths of its attribute names and values (binary and UTF-8 lengths).
  • DynamoDB supports two types of primary keys:
    • Hash Type Primary Key: in this case the primary key is made of one attribute, a hash value. Amazon DynamoDB builds an unordered hash index on this primary key attribute.
    • Hash and Range Type Primary Key: in this case, the primary key is made of two attributes. The first attribute is the hash attribute and the second one is the range attribute. Amazon DynamoDB builds an unordered hash index on the hash primary key attribute and a sorted range index on the range primary key attribute.
  • There are two kinds of data types:

    • scalar: number and string
    • multi-value: string set and number set

    Note that the multi-value data types are sets (elements are unique) and not lists

  • DynamoDB supports both eventually consistent and consistent reads

    • the price of a consistent read is double the price of an eventually consistent read
  • Conditional writes are supported: a write is performed if and only if a pre-condition is met
  • DynamoDB supports atomic counters
  • Pricing is based on actual write/read operations and not API calls (e.g. a query returning 100 results accounts for 100 ops and not 1 op)
  • when defining tables (or updating), you also specify the capacity to be reserved in terms of reads and writes
    • Units of Capacity required = Number of item ops per second x item size (rounded up to the nearest KB)
    • DynamoDB divides a table’s items into multiple partitions, and distributes the data primarily based on the hash key element. The provisioned throughput associated with a table is also divided evenly among the partitions, with no sharing of provisioned throughput across partitions.
      • Total provisioned throughput/partitions = throughput per partition.
  • supported operations:
    • table level: create, describe, list, update
    • data level: put (create or update), get, batch get, update, delete, query, scan
      • A query operation searches only primary key attribute values and supports a subset of comparison operators on key attribute values to refine the search process
      • The BatchGetItem operation returns the attributes for multiple items from multiple tables using their primary keys. The maximum number of item attributes that can be retrieved for a single operation is 100. Also, the number of items retrieved is constrained by a 1 MB size limit
      • the BatchGetItem is eventually consistent, only
      • a Scan operation scans the entire table. You can specify filters to apply to the results to refine the values returned to you, after the complete scan. Amazon DynamoDB puts a 1 MB limit on the scan (the limit applies before the results are filtered).
  • JSON is used for sending data and for responses, but it is not used as the native storage schema


  • for backup/restore, one could use the EMR integration to back up a table into S3 and restore from that to a new table
  • there’s no mention of an SLA. Also, keeping in mind the Amazon RDS scheduled maintenance windows, it would be good to clarify whether DynamoDB will require anything similar (I doubt that, but it should be clarified). Update: Werner Vogels confirms in the comments that indeed there are no maintenance windows (always-on)
  • Some interesting data shared by a DynamoDB beta tester
    • loaded multiple terabytes
    • 250k writes/s
    • this throughput was maintained continuously for more than 3 days
    • average read latency close to 2ms and 99th percentile 6-8ms
    • no impact on other customers
  • CloudWatch alarms can be used to notify that a specific threshold for throughput has been reached for a table and when it is time to add additional read or write capacity units
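To make the sizing notes above concrete, here is a rough sketch of the item size rule, the capacity unit formula, and the even split of provisioned throughput across partitions. The function names are mine, it handles only string and string-set attributes, and it assumes 1 KB = 1024 bytes; none of this is an official API.

```python
import math

MAX_ITEM_BYTES = 64 * 1024  # the 64KB item limit from the notes


def item_size_bytes(item):
    """Sum the UTF-8 lengths of attribute names and values.

    Only string and string-set values are handled in this sketch.
    """
    total = 0
    for name, value in item.items():
        total += len(name.encode("utf-8"))
        values = value if isinstance(value, (set, frozenset)) else [value]
        total += sum(len(v.encode("utf-8")) for v in values)
    if total > MAX_ITEM_BYTES:
        raise ValueError(f"item is {total} bytes, over the 64KB limit")
    return total


def capacity_units(ops_per_second, size_bytes):
    """Item ops per second x item size, rounded up to the nearest KB."""
    size_kb = math.ceil(size_bytes / 1024)  # assumes 1 KB = 1024 bytes
    return ops_per_second * size_kb


def throughput_per_partition(total_units, partitions):
    """Provisioned throughput is divided evenly, with no sharing."""
    return total_units / partitions


if __name__ == "__main__":
    item = {"user_id": "alice", "tags": {"nosql", "aws"}}
    print(item_size_bytes(item))             # 24 bytes
    print(capacity_units(100, 1500))         # 1500 bytes rounds up to 2 KB
    print(throughput_per_partition(200, 4))  # 50.0 units per partition
```

The practical consequence of the last function: a hot hash key can exhaust its partition's share even when the table as a whole is well under its provisioned throughput.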

Any other interesting bits to be emphasized?

Original title and link: Notes About Amazon DynamoDB (NoSQL database©myNoSQL)

Partnerships in the Hadoop Market

Just a quick recap:

Amazon doesn’t partner with anyone for its Amazon Elastic MapReduce. And IBM is walking alone with the software-only InfoSphere BigInsights.

Original title and link: Partnerships in the Hadoop Market (NoSQL database©myNoSQL)

Amazon Is More Interesting Than Google

Google has been doing these sort of blog posts for years. Some engineer writes up an entry about how they are doing research using terabytes or petabytes of data. And then they end by saying you should work at Google. So nowadays, I don’t care about any of what Google does. […] MapReduce? Great, they’ve been sitting on this technology for a decade. Good for them. It doesn’t matter to me.

But the world has changed, and Google can’t seem to keep up. Amazon has become the polar opposite of Google, empowering every developer on the planet to make incredible technology. Want MapReduce? Amazon has you covered. Want to play with terabytes of data like it ain’t no thing? Check. Want to launch thousands of servers to handle a tough computation? Check, check, and check. Want to launch thousands of human brains to solve otherwise unassailable problems? No problem. Heck, want to simply send email to your users? They have that too.

I read this just hours after expressing my concerns about the awesome future of Big Data and data analytics. For now we’re lucky there’s still an Amazon out there.

Original title and link: Amazon Is More Interesting Than Google (NoSQL database©myNoSQL)