Notes About Amazon DynamoDB

It’s been only a couple of hours since the news about Amazon DynamoDB got out. Here are my notes, gathered from the Amazon DynamoDB documentation. If you find other interesting bits, please leave a comment and I’ll add them to the list (with attribution):

  • it is not the first managed/hosted NoSQL database
  • it is the first managed NoSQL database that auto-shards

    Update: As pointed out in the comments, Microsoft Azure Table supports auto-sharding

  • it is the first managed auto-sharding NoSQL database that automatically reshards based on an SLA (the request capacity can be specified by the user)
  • DynamoDB says that average service-side latencies are typically single-digit milliseconds
  • DynamoDB stores data on Solid State Drives (SSDs)
  • DynamoDB replicates data synchronously across multiple AWS Availability Zones in an AWS Region to provide built-in high availability and data durability
  • The documentation for the write operation is confusing:

    When Amazon DynamoDB returns an operation successful response to your write request, Amazon DynamoDB ensures the write is durable on multiple servers. However, it takes time for the update to propagate to all copies. That is, the data is eventually consistent, meaning that your read request immediately after a write might not show the change.

  • DynamoDB caps the throughput at both the table level and the account level (docs)
    • Jeff Barr says this limit can be changed and DynamoDB can definitely deliver more (link)
    • Werner Vogels clarified that, similarly to other Amazon web services, these limitations (tables, throughput, etc.) can be lifted by filling out a request form (link)
  • DynamoDB departs (a bit) from the original Dynamo model by allowing a type of non-opaque key, which supports querying.

    There’s also a scan operation that allows filtering of results based on attributes’ values

  • DynamoDB limits the size of an item (record) to 64 KB. An item’s size is the sum of the lengths of its attribute names and values (binary and UTF-8 lengths).
  • DynamoDB supports two types of primary keys (a table-creation sketch follows this list):
    • Hash Type Primary Key: the primary key is made of one attribute, a hash value. Amazon DynamoDB builds an unordered hash index on this primary key attribute.
    • Hash and Range Type Primary Key: the primary key is made of two attributes. The first is the hash attribute and the second is the range attribute. Amazon DynamoDB builds an unordered hash index on the hash attribute and a sorted range index on the range attribute.
  • There are two categories of data types:

    • scalar: number and string
    • multi-value: string set and number set

    Note that the multi-value data types are sets (elements are unique) and not lists

  • DynamoDB supports both eventually consistent and consistent reads (a boto3 sketch follows this list)

    • the price of a consistent read is double the price of an eventually consistent read
  • Conditional writes are supported: a write is performed if and only if a pre-condition is met
  • DynamoDB supports atomic counters
  • Pricing is based on actual read/write operations and not API calls (e.g. a query returning 100 results counts as 100 ops, not 1 op)
  • when creating (or updating) tables, you also specify the capacity to be reserved in terms of reads and writes (a worked capacity example follows this list)
    • Units of capacity required = number of item operations per second × item size (rounded up to the nearest KB)
    • DynamoDB divides a table’s items into multiple partitions, and distributes the data primarily based on the hash key element. The provisioned throughput associated with a table is also divided evenly among the partitions, with no sharing of provisioned throughput across partitions.
      • Total provisioned throughput/partitions = throughput per partition.
  • supported operations (a query/scan sketch follows this list):
    • table level: create, describe, list, update
    • data level: put (create or update), get, batch get, update, delete, query, scan
      • A query operation searches only primary key attribute values and supports a subset of comparison operators on key attribute values to refine the search
      • The BatchGetItem operation returns the attributes of multiple items from multiple tables using their primary keys. At most 100 item attributes can be retrieved in a single operation, and the number of items retrieved is also constrained by a 1 MB size limit
      • BatchGetItem reads are eventually consistent only
      • a Scan operation scans the entire table. You can specify filters to refine the returned values, but they are applied after the complete scan. Amazon DynamoDB puts a 1 MB limit on the scan (the limit applies before the results are filtered)
  • JSON is used for sending data and for responses, but it is not used as the native storage schema
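
Putting the key types and provisioned capacity together, here is a minimal table-creation sketch using Python’s boto3 SDK (a modern SDK, not the API from the launch; the table name, attribute names, region, and capacity figures are illustrative assumptions):

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.create_table(
    TableName="GameScores",  # hypothetical table
    AttributeDefinitions=[
        {"AttributeName": "UserId", "AttributeType": "S"},     # hash attribute
        {"AttributeName": "GameTitle", "AttributeType": "S"},  # range attribute
    ],
    KeySchema=[
        {"AttributeName": "UserId", "KeyType": "HASH"},     # unordered hash index
        {"AttributeName": "GameTitle", "KeyType": "RANGE"},  # sorted range index
    ],
    # The read/write capacity reserved for the table, specified up front.
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
)
```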
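
A tiny worked example of the capacity formula above (the numbers are made up):

```python
import math

def capacity_units(ops_per_second: int, item_size_bytes: int) -> int:
    # Units of capacity = item ops per second x item size,
    # rounded up to the nearest KB.
    return ops_per_second * math.ceil(item_size_bytes / 1024)

# 500 writes/s of 1.5 KB items -> 500 x 2 KB = 1000 write capacity units
print(capacity_units(500, 1536))  # 1000
```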
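
Consistent reads, conditional writes, and atomic counters look roughly like this in boto3 (a sketch using today’s expression syntax rather than the original API; keys and values are illustrative):

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Strongly consistent read: costs double an eventually consistent read.
item = dynamodb.get_item(
    TableName="GameScores",
    Key={"UserId": {"S": "alice"}, "GameTitle": {"S": "tetris"}},
    ConsistentRead=True,
)

# Conditional write: succeeds only if no item with this key exists;
# otherwise a ConditionalCheckFailedException is raised.
dynamodb.put_item(
    TableName="GameScores",
    Item={
        "UserId": {"S": "alice"},
        "GameTitle": {"S": "tetris"},
        "TopScore": {"N": "0"},
    },
    ConditionExpression="attribute_not_exists(UserId)",
)

# Atomic counter: ADD increments TopScore without a read-modify-write cycle.
dynamodb.update_item(
    TableName="GameScores",
    Key={"UserId": {"S": "alice"}, "GameTitle": {"S": "tetris"}},
    UpdateExpression="ADD TopScore :inc",
    ExpressionAttributeValues={":inc": {"N": "1"}},
)
```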
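
And a sketch of the query, scan, and batch-get data operations (same caveats: modern boto3 syntax, illustrative names):

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Query: searches only primary key attribute values; comparison operators
# (e.g. begins_with, BETWEEN) apply to the range key.
dynamodb.query(
    TableName="GameScores",
    KeyConditionExpression="UserId = :u AND begins_with(GameTitle, :p)",
    ExpressionAttributeValues={":u": {"S": "alice"}, ":p": {"S": "te"}},
)

# Scan: reads the whole table; the filter is applied after the scan,
# and the 1 MB limit applies before filtering.
dynamodb.scan(
    TableName="GameScores",
    FilterExpression="TopScore > :min",
    ExpressionAttributeValues={":min": {"N": "1000"}},
)

# BatchGetItem: multiple items (up to 100) across tables by primary key;
# at launch, eventually consistent only.
dynamodb.batch_get_item(
    RequestItems={
        "GameScores": {
            "Keys": [
                {"UserId": {"S": "alice"}, "GameTitle": {"S": "tetris"}},
                {"UserId": {"S": "bob"}, "GameTitle": {"S": "pacman"}},
            ]
        }
    }
)
```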

Update:

  • for backup/restore, one could use the EMR integration to back up a table into S3 and restore from it into a new table
  • there’s no mention of an SLA. Also, keeping in mind the Amazon RDS scheduled maintenance windows, it would be good to clarify whether DynamoDB will require anything similar (I doubt it, but it should be clarified). Update: Werner Vogels confirms in the comments that there are indeed no maintenance windows (always-on)
  • Some interesting data shared by a DynamoDB beta tester
    • loaded multiple terabytes
    • 250k writes/s
    • this throughput was maintained continuously for more than 3 days
    • average read latency close to 2ms and 99th percentile 6-8ms
    • no impact on other customers
  • CloudWatch alarms can be used to signal that a table has reached a specific throughput threshold and that it is time to add read or write capacity units (a sketch follows this list)
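
A sketch of such an alarm with boto3, using CloudWatch’s AWS/DynamoDB namespace (the threshold, period, and table name are illustrative assumptions):

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="GameScores-read-capacity",
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedReadCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": "GameScores"}],
    Statistic="Sum",
    Period=300,                # consumed units summed over 5-minute windows
    EvaluationPeriods=1,
    Threshold=2400.0,          # ~80% of 10 provisioned RCU x 300 seconds
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=[],           # add an SNS topic ARN here to be notified
)
```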

Any other interesting bits to be emphasized?

Original title and link: Notes About Amazon DynamoDB (NoSQL database © myNoSQL)