It’s been only a couple of hours since the news about Amazon DynamoDB got out. Here are my notes gathered from the Amazon DynamoDB documentation. If you find other interesting bits, please leave a comment and I’ll add them to the list (with attribution):
- it is not the first managed/hosted NoSQL database
- it is the first managed NoSQL database that auto-shards
Update: As pointed out in the comments, Microsoft Azure Table also supports auto-sharding
- it is the first managed auto-sharding NoSQL database that automatically reshards based on SLA (request capacity can be specified by the user)
- the DynamoDB documentation says that average service-side latencies are typically in the single-digit milliseconds
- DynamoDB stores data on Solid State Drives (SSDs)
- DynamoDB replicates data synchronously across multiple AWS Availability Zones in an AWS Region to provide built-in high availability and data durability
The documentation for the write operation is confusing:
When Amazon DynamoDB returns an operation successful response to your write request, Amazon DynamoDB ensures the write is durable on multiple servers. However, it takes time for the update to propagate to all copies. That is, the data is eventually consistent, meaning that your read request immediately after a write might not show the change.
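The two sentences in that quote can both be true. A write can be durable on several servers and still be invisible to a read that lands on a copy the write has not reached yet. The toy Python model below shows how that happens. It is not DynamoDB's actual replication scheme, which the documentation does not describe. It assumes 3 replicas, a write that is acknowledged once 2 of them hold it, and a third copy that catches up later. `ToyReplicatedStore` and its methods are hypothetical names.

```python
# Toy model of eventually consistent reads. NOT DynamoDB's real implementation.
# Assumptions: 3 replicas; a write is acknowledged once `write_quorum` replicas
# hold it ("durable on multiple servers"); the rest catch up asynchronously.

class ToyReplicatedStore:
    def __init__(self, replicas=3, write_quorum=2):
        self.replicas = [dict() for _ in range(replicas)]
        self.write_quorum = write_quorum
        self.pending = []  # (replica_index, key, value) not yet propagated

    def put(self, key, value):
        # Apply synchronously to the first `write_quorum` replicas...
        for i in range(self.write_quorum):
            self.replicas[i][key] = value
        # ...and queue the update for the remaining replicas.
        for i in range(self.write_quorum, len(self.replicas)):
            self.pending.append((i, key, value))
        return "ok"  # acknowledged: durable on multiple servers, not yet on all

    def eventually_consistent_get(self, key, replica):
        # May hit a replica that has not received the latest write yet.
        return self.replicas[replica].get(key)

    def consistent_get(self, key):
        # In this toy, replica 0 is always in the write quorum, so it holds
        # every acknowledged write.
        return self.replicas[0].get(key)

    def propagate(self):
        # Background replication brings the lagging copies up to date, in order.
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()


store = ToyReplicatedStore()
store.put("user:1", "v1")
store.propagate()
store.put("user:1", "v2")  # acknowledged, but replica 2 still holds "v1"
```

Right after the second `put`, an eventually consistent read from replica 2 still sees `"v1"`, while `consistent_get` sees `"v2"`. Once `propagate()` runs, every copy agrees.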
- DynamoDB caps throughput at both the table level and the account level (docs)
- DynamoDB departs (a bit) from the original Dynamo model by allowing a type of non-opaque keys (which support querying). There’s also a scan operation that allows filtering of results based on attribute values
- DynamoDB limits the size of an item (record) to 64KB. An item’s size is the sum of the lengths of its attribute names and values (binary and UTF-8 lengths).
- DynamoDB supports two types of primary keys:
- Hash Type Primary Key: in this case the primary key is made of one attribute, a hash value. Amazon DynamoDB builds an unordered hash index on this primary key attribute.
- Hash and Range Type Primary Key: in this case, the primary key is made of two attributes. The first attribute is the hash attribute and the second one is the range attribute. Amazon DynamoDB builds an unordered hash index on the hash primary key attribute and a sorted range index on the range primary key attribute.
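A hash-and-range primary key supports one kind of lookup: match the hash key exactly, then apply a condition on the sorted range key. The in-memory Python sketch below shows that access pattern. `ToyHashRangeTable` and `query_between` are hypothetical names and not the DynamoDB API. A plain dict stands in for the unordered hash index, and a sorted list per hash key stands in for the range index.

```python
import bisect
from collections import defaultdict


class ToyHashRangeTable:
    """Toy model of a hash-and-range primary key (not the DynamoDB API)."""

    def __init__(self):
        # hash key -> list of (range key, item), kept sorted by range key
        self._by_hash = defaultdict(list)

    def put(self, hash_key, range_key, item):
        rows = self._by_hash[hash_key]
        keys = [rk for rk, _ in rows]
        i = bisect.bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)  # same primary key: replace the item
        else:
            rows.insert(i, (range_key, item))  # keep range keys sorted

    def query_between(self, hash_key, low, high):
        # Exact match on the hash key, then an inclusive range condition
        # evaluated with binary search over the sorted range keys.
        rows = self._by_hash.get(hash_key, [])
        keys = [rk for rk, _ in rows]
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return [item for _, item in rows[start:end]]


table = ToyHashRangeTable()
table.put("forum#1", "2012-01-10", {"title": "a"})
table.put("forum#1", "2012-01-18", {"title": "b"})
table.put("forum#2", "2012-01-15", {"title": "c"})
recent = table.query_between("forum#1", "2012-01-01", "2012-01-15")
```

Here `recent` holds only the `"a"` item. Items under other hash keys are never examined, because the hash index only supports equality.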
There are two kinds of data types:
- scalar: number and string
- multi-value: string set and number set
Note that the multi-value data types are sets (elements are unique), not lists.
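The 64KB item limit counts attribute names as well as values. The Python helper below estimates an item's size using the rule quoted in these notes, across the scalar and set types listed above. It is a hypothetical sketch. In particular, measuring a number by the UTF-8 length of its decimal string is an assumption made only for illustration, since the notes do not say how numbers are measured.

```python
MAX_ITEM_SIZE = 64 * 1024  # the 64KB item limit from the notes


def value_size(value):
    """Size of one attribute value in bytes (hypothetical estimate)."""
    if isinstance(value, str):
        return len(value.encode("utf-8"))  # strings: UTF-8 length
    if isinstance(value, (int, float)):
        # ASSUMPTION: numbers measured by their decimal string's length
        return len(str(value).encode("utf-8"))
    if isinstance(value, (set, frozenset)):
        # string set / number set: sum of the element sizes
        return sum(value_size(v) for v in value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def item_size(item):
    """Sum of attribute-name lengths and attribute-value lengths."""
    return sum(len(name.encode("utf-8")) + value_size(v) for name, v in item.items())


def fits(item):
    return item_size(item) <= MAX_ITEM_SIZE


example = {"id": "u1", "tags": {"a", "bc"}, "age": 42}
# "id"+"u1" = 4, "tags"+{"a","bc"} = 7, "age"+"42" = 5, total 16 bytes
```

Long attribute names are paid for on every item, which is why short names help when items approach the limit.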
DynamoDB supports both eventually consistent and consistent reads
- the price of a consistent read is double the price of an eventually consistent read
- Conditional writes are supported: a write is performed if and only if a precondition is met
- DynamoDB supports atomic counters
- Pricing is based on actual write/read operations and not API calls (e.g. a query returning 100 results accounts for 100 ops and not 1 op)
- when creating (or updating) a table, you also specify the capacity to be reserved in terms of reads and writes
- Units of Capacity required = Number of item ops per second x item size (rounded up to the nearest KB)
- DynamoDB divides a table’s items into multiple partitions, and distributes the data primarily based on the hash key element. The provisioned throughput associated with a table is also divided evenly among the partitions, with no sharing of provisioned throughput across partitions.
- Throughput per partition = total provisioned throughput / number of partitions.
- supported operations:
- table level: create, describe, list, update
- data level: put (create or update), get, batch get, update, delete, query, scan
- A query operation searches only primary key attribute values and supports a subset of comparison operators on key attribute values to refine the search process
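The capacity formula and the per-partition split above reduce to simple arithmetic, shown in the Python sketch below. The function names are hypothetical. The partition count is treated as a known input purely for illustration, because the notes do not say how DynamoDB chooses it.

```python
import math


def capacity_units(item_ops_per_second, item_size_bytes):
    """Units of capacity = item ops/second x item size rounded up to the nearest KB."""
    size_kb = math.ceil(item_size_bytes / 1024)
    return item_ops_per_second * size_kb


def throughput_per_partition(total_units, partitions):
    """Provisioned throughput is split evenly, with no sharing across partitions."""
    return total_units / partitions


# 100 writes/s of 1.5KB items: each item rounds up to 2KB, so 200 units
units = capacity_units(100, 1536)
# ASSUMED 4 partitions: each partition gets 50 units
per_partition = throughput_per_partition(units, 4)
```

The second function is why a "hot" hash key matters. All requests for that key land on one partition, so they are capped by that partition's share of the capacity, not by the table's total.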
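Conditional writes and atomic counters both depend on the check and the update happening as one step. The in-memory Python sketch below shows that semantics, using a lock to make each operation atomic. `ToyTable`, `put(..., expected=...)`, `add` and `ConditionalCheckFailed` are hypothetical names, not the DynamoDB API.

```python
import threading


class ConditionalCheckFailed(Exception):
    """Raised when a conditional write's precondition is not met."""


class ToyTable:
    """In-memory sketch of conditional writes and atomic counters."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()  # makes each check-and-write atomic

    def put(self, key, item, expected=None):
        # expected: attribute -> value that must currently hold (None = unconditional)
        with self._lock:
            current = self._items.get(key, {})
            for attr, value in (expected or {}).items():
                if current.get(attr) != value:
                    raise ConditionalCheckFailed(f"{attr} != {value!r}")
            self._items[key] = dict(item)

    def add(self, key, attr, delta):
        # Atomic counter: the increment happens inside the store under the lock,
        # so concurrent callers never lose an update (no client read-modify-write).
        with self._lock:
            item = self._items.setdefault(key, {})
            item[attr] = item.get(attr, 0) + delta
            return item[attr]

    def get(self, key):
        with self._lock:
            return dict(self._items.get(key, {}))


docs = ToyTable()
docs.put("doc", {"body": "v1", "version": 1})
# Optimistic locking: succeeds only if nobody bumped the version in between
docs.put("doc", {"body": "v2", "version": 2}, expected={"version": 1})
```

The same `expected` check is a common way to build optimistic concurrency: a second writer still expecting `version == 1` gets `ConditionalCheckFailed` instead of silently overwriting `"v2"`.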
The BatchGetItem operation returns the attributes for multiple items from multiple tables using their primary keys. The maximum number of item attributes that can be retrieved for a single operation is 100. Also, the number of items retrieved is constrained by a 1 MB size limit
BatchGetItem is eventually consistent only
The Scan operation scans the entire table. You can specify filters to refine the values returned to you; they are applied after the scan. Amazon DynamoDB puts a 1MB limit on the scan (the limit applies before the results are filtered).
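Because the 1MB limit is applied before filtering, a single scan page can return no matching items and still have more of the table left to read. The toy Python model below shows that effect. `scan_page` and `scan_all` are hypothetical names. Item sizes are passed in explicitly, and the page is cut as soon as the next item would exceed the limit.

```python
def scan_page(items, predicate, start=0, limit_bytes=1024 * 1024):
    """items: list of (item, size_bytes) in table order.

    Reads items until the size limit is reached, THEN filters them.
    Returns (matching_items, next_start); next_start is None when done.
    """
    scanned = 0
    i = start
    while i < len(items):
        _, size = items[i]
        # Always read at least one item per page so the scan makes progress.
        if scanned + size > limit_bytes and i > start:
            break
        scanned += size
        i += 1
    page = [item for item, _ in items[start:i] if predicate(item)]
    return page, (i if i < len(items) else None)


def scan_all(items, predicate, limit_bytes=1024 * 1024):
    # Keep requesting pages until the whole table has been read.
    results, start = [], 0
    while start is not None:
        page, start = scan_page(items, predicate, start, limit_bytes)
        results.extend(page)
    return results


# 40 items of 60KB each: a 1MB page holds 17 of them
table = [({"id": n}, 60 * 1024) for n in range(40)]
wanted = lambda item: item["id"] == 35
first_page, next_start = scan_page(table, wanted)  # no matches, but more to scan
```

A client that stopped at the first empty page would wrongly conclude that nothing matches. The loop in `scan_all` keeps going until `next_start` is `None`.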
- JSON is used for sending data and for responses, but it is not used as the native storage schema
- for backup/restore, one could use the EMR integration to back up a table into S3 and restore from there into a new table
- there’s no mention of an SLA. Also, keeping in mind the Amazon RDS scheduled maintenance windows, it would be good to clarify whether DynamoDB will require anything similar (I doubt it, but it should be clarified). Update: Werner Vogels confirms in the comments that there are indeed no maintenance windows (always-on)
Some interesting data shared by a DynamoDB beta tester:
- loaded multiple terabytes
- 250k writes/s
- this throughput was maintained continuously for more than 3 days
- average read latency close to 2ms and 99th-percentile latency of 6-8ms
- no impact on other customers
- CloudWatch alarms can be used to notify that a specific threshold for throughput has been reached for a table and when it is time to add additional read or write capacity units
Any other interesting bits to be emphasized?
Original title and link: Notes About Amazon DynamoDB (©myNoSQL)