


dynamodb: All content tagged as dynamodb in NoSQL databases and polyglot persistence

Using Amazon Elastic MapReduce With DynamoDB: NoSQL Tutorials

Adam Gray[1]:

In this article, I’ll demonstrate how EMR can be used to efficiently export DynamoDB tables to S3, import S3 data into DynamoDB, and perform sophisticated queries across tables stored in both DynamoDB and other storage services such as S3.

If you put together Amazon S3, Amazon DynamoDB, Amazon RDS, and Amazon Elastic MapReduce, you have a complete polyglot persistence solution in the cloud[2].

  1. Adam Gray is Product Manager on the Elastic MapReduce Team  

  2. Complete in the sense of core building blocks.  

Original title and link: Using Amazon Elastic MapReduce With DynamoDB: NoSQL Tutorials (NoSQL database©myNoSQL)


A Cost Analysis of DynamoDB for Tarsnap

Tarsnap is a service offering secure online backups. Colin Percival details the costs Tarsnap would have for using Amazon DynamoDB:

For each TB of data stored, this gives me 30,000,000 blocks requiring 60,000,000 key-value pairs; these occupy 2.31 GB, but for DynamoDB pricing purposes, they count as 8.31 GB, or $8.31 per month. That’s about 2.7% of Tarsnap’s gross revenues (30 cents per GB per month); significant, but manageable. However, each of those 30,000,000 blocks need to go through log cleaning every 14 days, a process which requires a read (to check that the block hasn’t been marked as deleted) and a write (to update the map to point at the new location in S3). That’s an average rate of 25 reads and 25 writes per second, so I’d need to reserve 50 reads and 50 writes per second of DynamoDB capacity. The reads cost $0.01 per hour while the writes cost $0.05 per hour, for a total cost of $0.06 per hour — or $44 per month. That’s 14.6% of Tarsnap’s gross revenues; together with the storage cost, DynamoDB would eat up 17.3% of Tarsnap’s revenue — slightly over $0.05 from every $0.30/GB I take in.

To put it differently, an 82.7% margin after DynamoDB costs sounds like a good deal, but without knowing the costs of the other components (S3, EC2, data transfer) it’s difficult to conclude whether this solution would remain profitable at a good margin. Still, an interesting aspect of this solution is that the costs of some major components of the platform (S3, DynamoDB) would scale linearly with the revenue.
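The arithmetic in the quote can be re-run with a short Python sketch. All dollar figures come from the quoted analysis; the 1000 GB per TB conversion and the 730-hour month are assumptions made here to reproduce the monthly totals, so the results match the quoted percentages only up to rounding.

```python
# Figures quoted from Colin Percival's analysis, per TB of stored data.
REVENUE_PER_TB = 0.30 * 1000        # $0.30/GB/month; 1 TB taken as 1000 GB (assumption)
STORAGE_COST = 8.31                 # $/month for the 8.31 GB billed by DynamoDB
BLOCKS = 30_000_000                 # blocks per TB
CLEANING_PERIOD_DAYS = 14           # every block is log-cleaned once per period
HOURLY_THROUGHPUT_COST = 0.01 + 0.05  # 50 reads/s plus 50 writes/s reserved
HOURS_PER_MONTH = 730               # average month length (assumption)


def cost_breakdown():
    """Return the quoted cost figures as fractions of monthly revenue per TB."""
    ops_per_second = BLOCKS / (CLEANING_PERIOD_DAYS * 24 * 3600)
    throughput_cost = HOURLY_THROUGHPUT_COST * HOURS_PER_MONTH
    total_cost = STORAGE_COST + throughput_cost
    return {
        "ops_per_second": ops_per_second,
        "storage_share": STORAGE_COST / REVENUE_PER_TB,
        "throughput_share": throughput_cost / REVENUE_PER_TB,
        "total_share": total_cost / REVENUE_PER_TB,
        "margin": 1 - total_cost / REVENUE_PER_TB,
    }


if __name__ == "__main__":
    for name, value in cost_breakdown().items():
        print(f"{name}: {value:.4f}")
```

The read and write rates come out just under 25 per second each, which is why the quote reserves 50 of each: it leaves headroom over the average.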



Introducing Amazon DynamoDB Slide Deck

An official slide deck for introducing Amazon DynamoDB to your team. My notes about DynamoDB could be a nice addition.

Will Amazon DynamoDB Be a Game Changer?

A question asked by many, but so far only a few have shared their thoughts on Quora. The truth is there are many ways to define a game-changing technology: disruptive, innovative, impacting existing solution providers in the same market or in related markets, etc. Amazon DynamoDB could be all, none, or a bit of each of these. But if the question implies a “winner-takes-it-all” answer, Sid Anand already answered it:

In the NoSQL world, it is by no means a winner-take-all battle. Distributed Systems are about compromises.

Leaving this type of question aside, what I think is more relevant is learning who will be using Amazon DynamoDB and for what.


Basho: Congratulations, Amazon!

A dynamo-as-a-service offered by Amazon on their ecosystem will appeal to some. For others, the benefits of a Dynamo-inspired product that can be deployed on other public clouds, behind-the-firewall, or not on the cloud at all, will be critical.

Objective. Clear. To the point.



Amazon DynamoDB: NoSQL in the Cloud

James Hamilton:

In a past blog entry, One Size Does Not Fit All, I offered a taxonomy of 4 different types of structured storage system, argued that Relational Database Management Systems are not sufficient, and walked through some of the reasons why NoSQL databases have emerged and continue to grow market share quickly. The four database categories I introduced were: 1) features-first, 2) scale-first, 3) simple structure storage, and 4) purpose-optimized stores. RDBMS own the first category.

DynamoDB targets workloads fitting into the Scale-First and Simple Structured storage categories where NoSQL database systems have been so popular over the last few years.

A great post focusing on the challenges faced in implementing the features that make DynamoDB, the Amazon cloud-based NoSQL database, unique.



Notes About Amazon DynamoDB

It’s been only a couple of hours since the news about Amazon DynamoDB got out. Here are my notes gathered from the Amazon DynamoDB documentation. If you find other interesting bits, please leave a comment and I’ll add them to the list (with attribution):

  • it is not the first managed/hosted NoSQL
  • it is the first managed NoSQL database that auto-shards

    Update: As pointed out in the comments, Microsoft Azure Table supports auto-sharding

  • it is the first managed auto-sharding NoSQL database that automatically reshards based on SLA (request capacity can be specified by the user)
  • DynamoDB says that average service-side latencies are typically single-digit milliseconds
  • DynamoDB stores data on Solid State Drives (SSDs)
  • DynamoDB replicates data synchronously across multiple AWS Availability Zones in an AWS Region to provide built-in high availability and data durability
  • The documentation for the write operation is confusing:

    When Amazon DynamoDB returns an operation successful response to your write request, Amazon DynamoDB ensures the write is durable on multiple servers. However, it takes time for the update to propagate to all copies. That is, the data is eventually consistent, meaning that your read request immediately after a write might not show the change.

  • DynamoDB caps throughput at both the table level and the account level (docs)
    • Jeff Barr says that this limit can be changed and DynamoDB can definitely deliver more (link)
    • Werner Vogels clarified that, as with other Amazon web services, these limitations (tables, throughput, etc.) can be lifted by filling out a request form. (link)
  • DynamoDB departs (a bit) from the original Dynamo model by allowing a type of non-opaque keys (which supports querying).

    There’s also a scan operation that allows filtering of results based on attributes’ values

  • DynamoDB limits the size of an item (record) to 64KB. An item size is the sum of lengths of its attribute names and values (binary and UTF-8 lengths).
  • DynamoDB supports two types of primary keys:
    • Hash Type Primary Key: in this case the primary key is made of one attribute, a hash value. Amazon DynamoDB builds an unordered hash index on this primary key attribute.
    • Hash and Range Type Primary Key: in this case, the primary key is made of two attributes. The first attribute is the hash attribute and the second one is the range attribute. Amazon DynamoDB builds an unordered hash index on the hash primary key attribute and a sorted range index on the range primary key attribute.
  • There are two kinds of data types:

    • scalar: number and string
    • multi-value: string set and number set

    Note that the multi-value data types are sets (elements are unique) and not lists

  • DynamoDB supports both eventually consistent and consistent reads

    • the price of a consistent read is double the price of an eventually consistent read
  • Conditional writes are supported: a write is performed if and only if a pre-condition is met
  • DynamoDB supports atomic counters
  • Pricing is based on actual write/read operations and not API calls (e.g. a query returning 100 results accounts for 100 ops and not 1 op)
  • when defining tables (or updating), you also specify the capacity to be reserved in terms of reads and writes
    • Units of Capacity required = Number of item ops per second x item size (rounded up to the nearest KB)
    • DynamoDB divides a table’s items into multiple partitions, and distributes the data primarily based on the hash key element. The provisioned throughput associated with a table is also divided evenly among the partitions, with no sharing of provisioned throughput across partitions.
      • Total provisioned throughput/partitions = throughput per partition.
  • supported operations:
    • table level: create, describe, list, update
    • data level: put (create or update), get, batch get, update, delete, query, scan
      • A query operation searches only primary key attribute values and supports a subset of comparison operators on key attribute values to refine the search process
      • The BatchGetItem operation returns the attributes for multiple items from multiple tables using their primary keys. The maximum number of item attributes that can be retrieved for a single operation is 100. Also, the number of items retrieved is constrained by a 1 MB size limit
      • BatchGetItem is eventually consistent only
      • a Scan operation scans the entire table. You can specify filters to apply to the results to refine the values returned to you, after the complete scan. Amazon DynamoDB puts a 1 MB limit on the scan (the limit applies before the results are filtered).
  • JSON is used for sending data and for responses, but it is not used as the native storage schema
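To make the key model and the query/scan distinction from the notes above more concrete, here is a minimal in-memory sketch in Python. It is a toy under stated assumptions, not the DynamoDB API: the class name `ToyTable`, its method names, and the `expected` parameter are all hypothetical, and nothing here reflects how the service is implemented internally.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """In-memory toy with a hash-and-range primary key (not the real service)."""

    def __init__(self):
        # Unordered lookup by hash key, sorted range keys within each hash key.
        self._ranges = defaultdict(list)   # hash key -> sorted range keys
        self._items = {}                   # (hash key, range key) -> item

    def put(self, hash_key, range_key, item, expected=None):
        """Write an item. If `expected` is given, write only when the stored
        item currently equals it (a conditional write). Returns True on write."""
        key = (hash_key, range_key)
        if expected is not None and self._items.get(key) != expected:
            return False
        if key not in self._items:
            bisect.insort(self._ranges[hash_key], range_key)
        self._items[key] = dict(item)
        return True

    def query(self, hash_key, low, high):
        """Return items under one hash key with range key in [low, high].
        Only primary key values are consulted, as in the notes above."""
        ranges = self._ranges.get(hash_key, [])
        start = bisect.bisect_left(ranges, low)
        end = bisect.bisect_right(ranges, high)
        return [self._items[(hash_key, r)] for r in ranges[start:end]]

    def scan(self, predicate):
        """Visit every item in the table, then keep those matching the filter."""
        return [item for item in self._items.values() if predicate(item)]


if __name__ == "__main__":
    table = ToyTable()
    table.put("user-1", 20120118, {"event": "signup"})
    table.put("user-1", 20120119, {"event": "login"})
    table.put("user-2", 20120119, {"event": "login"})
    print(table.query("user-1", 20120119, 20120131))
    print(table.scan(lambda item: item["event"] == "login"))
```

The point of the toy is the cost difference: `query` touches only the items under one hash key, while `scan` reads everything before filtering, which mirrors the note that the scan limit applies before the results are filtered.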


  • for backup/restore, one could use the EMR integration to back up a table into S3 and restore from it into a new table
  • there’s no mention of an SLA. Also, keeping in mind the Amazon RDS scheduled maintenance windows, it would be good to clarify whether DynamoDB will require anything similar (I doubt it, but it should be clarified). Update: Werner Vogels confirms in the comments that there are indeed no maintenance windows (always-on)
  • Some interesting data shared by a DynamoDB beta tester
    • loaded multiple terabytes
    • 250k writes/s
    • this throughput was maintained continuously for more than 3 days
    • average read latency close to 2ms and 99th percentile 6-8ms
    • no impact on other customers
  • CloudWatch alarms can notify you when a table reaches a specified throughput threshold, signaling that it is time to add read or write capacity units
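The capacity formula and the per-partition split from the notes, together with the threshold idea in the last bullet, can be sketched in a few lines of Python. The function names are hypothetical, the 80% default threshold is an arbitrary illustration, and treating an eventually consistent read as costing half the units is an assumption derived from the note that a consistent read is priced at double. None of this calls CloudWatch or DynamoDB.

```python
import math


def capacity_units(ops_per_second, item_size_bytes, eventually_consistent=False):
    """Units = item ops per second x item size rounded up to the nearest KB."""
    size_kb = max(1, math.ceil(item_size_bytes / 1024))
    units = ops_per_second * size_kb
    # Assumption: an eventually consistent read costs half a consistent one.
    if eventually_consistent:
        units = units / 2
    return math.ceil(units)


def throughput_per_partition(provisioned_units, partitions):
    """Provisioned throughput is divided evenly, with no sharing across partitions."""
    return provisioned_units / partitions


def should_alarm(consumed_units, provisioned_units, threshold=0.8):
    """True once consumption reaches the given fraction of provisioned capacity."""
    return consumed_units >= threshold * provisioned_units


if __name__ == "__main__":
    needed = capacity_units(ops_per_second=100, item_size_bytes=1500)
    print("units needed:", needed)
    print("per partition:", throughput_per_partition(needed, partitions=4))
    print("alarm:", should_alarm(consumed_units=170, provisioned_units=needed))
```

Because throughput is not shared across partitions, a workload concentrated on a few hash keys can hit the per-partition figure long before the table-level total is reached.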

Any other interesting bits to be emphasized?
