Amazon: All content tagged as Amazon in NoSQL databases and polyglot persistence
Thursday, 2 February 2012
DataStax's CEO thoughts on the NoSQL Market and Competition
Billy Bosworth[1]:
Personally, I have never believed that other post-relational (aka NoSQL/Hadoop) database companies were our primary competition. The brute fact of the matter is that if you put us all together, we are still not statistically relevant compared to the overall DBMS market.
I had only one real personal fear coming into this market: That I would sink a big portion of my life into something that would never take hold in the mainstream. I suspect that would be a truly awful ending for all of us in this space. But thanks to companies like Amazon and Oracle, that feels highly unlikely now, and that is a great thing.
Just to play the devil’s advocate for a second: Oracle won’t lose much in the NoSQL market if things don’t work out well, and Amazon’s DynamoDB is part of a larger plan. But for all the NoSQL database companies it is an all-or-nothing game[2].
[1] Billy Bosworth: CEO DataStax
[2] An all-or-nothing game is not the same as a winner-takes-all game
Original title and link: DataStax’s CEO thoughts on the NoSQL Market and Competition (©myNoSQL)
via: http://www.datastax.com/2012/01/my-thoughts-on-amazons-dynamodb
Get them by the data
Gavin Clarke and Chris Mellor about AWS Storage Gateway:
Once you’ve got them by the data, of course, their hearts and minds will follow, and Amazon’s using the AWS Storage Gateway beta as a sampler for the rest of its compute cloud.
The Storage Gateway is another piece, together with S3, DynamoDB, SimpleDB, and Elastic MapReduce, in Amazon’s great strategic puzzle of a complete polyglot platform.
Original title and link: Get them by the data (©myNoSQL)
via: http://www.theregister.co.uk/2012/01/25/amazon_cloud_enterprise_storage/
Thursday, 26 January 2012
Using Amazon Elastic MapReduce With DynamoDB: NoSQL Tutorials
Adam Gray[1]:
In this article, I’ll demonstrate how EMR can be used to efficiently export DynamoDB tables to S3, import S3 data into DynamoDB, and perform sophisticated queries across tables stored in both DynamoDB and other storage services such as S3.
If you put together Amazon S3, Amazon DynamoDB, Amazon RDS, and Amazon Elastic MapReduce, you have a complete polyglot persistence solution in the cloud[2].
Original title and link: Using Amazon Elastic MapReduce With DynamoDB: NoSQL Tutorials (©myNoSQL)
via: http://aws.typepad.com/aws/2012/01/aws-howto-using-amazon-elastic-mapreduce-with-dynamodb.html
Wednesday, 25 January 2012
12 Hadoop Vendors to Watch in 2012
My list of the 8 most interesting companies for the future of Hadoop didn’t try to include every vendor with a product that has the word Hadoop in its name. The list from InformationWeek does. To save you 15 clicks, here’s their list:
- Amazon Elastic MapReduce
- Cloudera
- Datameer
- EMC (with EMC Greenplum Unified Analytics Platform and EMC Data Computing Appliance)
- Hadapt
- Hortonworks
- IBM (InfoSphere BigInsights)
- Informatica (for HParser)
- Karmasphere
- MapR
- Microsoft
- Oracle
Original title and link: 12 Hadoop Vendors to Watch in 2012 (©myNoSQL)
Tuesday, 24 January 2012
A Cost Analysis of DynamoDB for Tarsnap
Tarsnap is a service offering secure online backups. Colin Percival details the costs Tarsnap would have for using Amazon DynamoDB:
For each TB of data stored, this gives me 30,000,000 blocks requiring 60,000,000 key-value pairs; these occupy 2.31 GB, but for DynamoDB pricing purposes, they count as 8.31 GB, or $8.31 per month. That’s about 2.7% of Tarsnap’s gross revenues (30 cents per GB per month); significant, but manageable. However, each of those 30,000,000 blocks need to go through log cleaning every 14 days, a process which requires a read (to check that the block hasn’t been marked as deleted) and a write (to update the map to point at the new location in S3). That’s an average rate of 25 reads and 25 writes per second, so I’d need to reserve 50 reads and 50 writes per second of DynamoDB capacity. The reads cost $0.01 per hour while the writes cost $0.05 per hour, for a total cost of $0.06 per hour — or $44 per month. That’s 14.6% of Tarsnap’s gross revenues; together with the storage cost, DynamoDB would eat up 17.3% of Tarsnap’s revenue — slightly over $0.05 from every $0.30/GB I take in.
To put it differently, an 82.7% margin after DynamoDB costs sounds like a good deal, but without knowing the costs of the other components (S3, EC2, data transfer) it’s difficult to conclude whether this solution would remain profitable at a good margin. Anyway, an interesting aspect of this solution is that the costs of some major components of the platform (S3, DynamoDB) would scale linearly with the revenue.
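The arithmetic in the quote can be retraced in a few lines. The sketch below is not from Colin Percival's post; it only reuses the figures quoted above and assumes 1 TB = 1000 GB, a storage price of $1 per GB-month (implied by "8.31 GB, or $8.31 per month"), and 730 hours per month. The function and constant names are made up for illustration.

```python
# Figures quoted in the post (per TB of data stored).
BLOCKS_PER_TB = 30_000_000
BILLED_GB_PER_TB = 8.31
REVENUE_USD_PER_GB_MONTH = 0.30
CLEANING_PERIOD_DAYS = 14
RESERVED_READS_USD_PER_HOUR = 0.01   # quoted cost of the reserved read capacity
RESERVED_WRITES_USD_PER_HOUR = 0.05  # quoted cost of the reserved write capacity

# Assumptions that are NOT stated in the post.
STORAGE_USD_PER_GB_MONTH = 1.00  # implied by "8.31 GB, or $8.31 per month"
GB_PER_TB = 1000
HOURS_PER_MONTH = 730            # 365 * 24 / 12


def tarsnap_costs_per_tb():
    """Retrace the monthly DynamoDB cost per TB and its share of revenue."""
    revenue = REVENUE_USD_PER_GB_MONTH * GB_PER_TB
    storage = BILLED_GB_PER_TB * STORAGE_USD_PER_GB_MONTH
    # Every block is cleaned once per period: one read and one write each.
    ops_per_second = BLOCKS_PER_TB / (CLEANING_PERIOD_DAYS * 24 * 3600)
    throughput = (RESERVED_READS_USD_PER_HOUR
                  + RESERVED_WRITES_USD_PER_HOUR) * HOURS_PER_MONTH
    return {
        "revenue_usd": revenue,
        "storage_usd": storage,
        "ops_per_second": ops_per_second,
        "throughput_usd": throughput,
        "dynamodb_share": (storage + throughput) / revenue,
    }


for name, value in tarsnap_costs_per_tb().items():
    print(f"{name}: {value:.4f}")
```

With these assumptions the cleaning rate comes out just under 25 operations per second, and DynamoDB's share of revenue lands a little above 17%, in line with the quoted figures up to rounding.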
Original title and link: A Cost Analysis of DynamoDB for Tarsnap (©myNoSQL)
via: http://www.daemonology.net/blog/2012-01-23-why-tarsnap-wont-use-dynamodb.html
Monday, 23 January 2012
Introducing Amazon DynamoDB Slide Deck
An official slide deck to introduce Amazon DynamoDB to your team. My notes about DynamoDB could be a nice addition.
Thursday, 19 January 2012
Basho: Congratulations, Amazon!
A dynamo-as-a-service offered by Amazon on their ecosystem will appeal to some. For others, the benefits of a Dynamo-inspired product that can be deployed on other public clouds, behind-the-firewall, or not on the cloud at all, will be critical.
Objective. Clear. To the point.
Original title and link: Basho: Congratulations, Amazon! (©myNoSQL)
via: http://basho.com/blog/technical/2012/01/18/Congratulations-Amazon/
Wednesday, 18 January 2012
Amazon DynamoDB: NoSQL in the Cloud
James Hamilton:
In a past blog entry, One Size Does Not Fit All, I offered a taxonomy of 4 different types of structured storage system, argued that Relational Database Management Systems are not sufficient, and walked through some of the reasons why NoSQL databases have emerged and continue to grow market share quickly. The four database categories I introduced were: 1) features-first, 2) scale-first, 3) simple structure storage, and 4) purpose-optimized stores. RDBMS own the first category.
DynamoDB targets workloads fitting into the Scale-First and Simple Structured storage categories where NoSQL database systems have been so popular over the last few years
A great post focusing on the challenges of implementing the features that make DynamoDB, the Amazon cloud-based NoSQL database, unique.
Original title and link: Amazon DynamoDB: NoSQL in the Cloud (©myNoSQL)
via: http://perspectives.mvdirona.com/2012/01/18/AmazonDynamoDBNoSQLInTheCloud.aspx
Notes About Amazon DynamoDB
It’s been only a couple of hours since the news about Amazon DynamoDB got out. Here are my notes gathered from the Amazon DynamoDB documentation. If you find other interesting bits, please leave a comment and I’ll add them to the list (with attribution):
- it is not the first managed/hosted NoSQL
- it is the first managed NoSQL database that auto-shards
Update: As pointed out in the comments, Microsoft Azure Table supports auto-sharding
- it is the first managed auto-sharding NoSQL database that automatically reshards based on SLA (request capacity can be specified by user)
- DynamoDB says that average service-side latencies are typically single-digit milliseconds
- DynamoDB stores data on Solid State Drives (SSDs)
- DynamoDB replicates data synchronously across multiple AWS Availability Zones in an AWS Region to provide built-in high availability and data durability
- The documentation for the write operation is confusing:
When Amazon DynamoDB returns an operation successful response to your write request, Amazon DynamoDB ensures the write is durable on multiple servers. However, it takes time for the update to propagate to all copies. That is, the data is eventually consistent, meaning that your read request immediately after a write might not show the change.
- DynamoDB is capping the throughput at both table level and account level (docs)
- DynamoDB departs (a bit) from the original Dynamo model by allowing a type of non-opaque key (which supports querying). There’s also a scan operation that allows filtering of results based on attributes’ values
- DynamoDB limits the size of an item (record) to 64KB. An item size is the sum of lengths of its attribute names and values (binary and UTF-8 lengths).
- DynamoDB supports two types of primary keys:
- Hash Type Primary Key: in this case the primary key is made of one attribute, a hash value. Amazon DynamoDB builds an unordered hash index on this primary key attribute.
- Hash and Range Type Primary Key: in this case, the primary key is made of two attributes. The first attribute is the hash attribute and the second one is the range attribute. Amazon DynamoDB builds an unordered hash index on the hash primary key attribute and a sorted range index on the range primary key attribute.
- There are two kinds of data types:
- scalar: number and string
- multi-value: string set and number set
Note that the multi-value data types are sets (elements are unique) and not lists
- DynamoDB supports both eventually consistent and consistent reads
- the price of a consistent read is double the price of an eventually consistent read
- Conditional writes are supported: a write is performed if and only if a pre-condition is met
- DynamoDB supports atomic counters
- Pricing is based on actual write/read operations and not API calls (e.g. a query returning 100 results accounts for 100 ops and not 1 op)
- when defining tables (or updating), you also specify the capacity to be reserved in terms of reads and writes
- Units of Capacity required = Number of item ops per second x item size (rounded up to the nearest KB)
- DynamoDB divides a table’s items into multiple partitions, and distributes the data primarily based on the hash key element. The provisioned throughput associated with a table is also divided evenly among the partitions, with no sharing of provisioned throughput across partitions.
- Total provisioned throughput/partitions = throughput per partition.
- supported operations:
- table level: create, describe, list, update
- data level: put (create or update), get, batch get, update, delete, query, scan
- A query operation searches only primary key attribute values and supports a subset of comparison operators on key attribute values to refine the search process
- The BatchGetItem operation returns the attributes for multiple items from multiple tables using their primary keys. The maximum number of item attributes that can be retrieved for a single operation is 100. Also, the number of items retrieved is constrained by a 1 MB size limit
- the BatchGetItem is eventually consistent, only
- a Scan operation scans the entire table. You can specify filters to apply to the results to refine the values returned to you, after the complete scan. Amazon DynamoDB puts a 1MB limit on the scan (the limit applies before the results are filtered).
- JSON is used for sending data and for responses, but it is not used as the native storage schema
Update:
- for backup/restore, one could use the EMR integration to back up a table into S3 and restore from that into a new table
- there’s no mention of SLA. Also having in mind the Amazon RDS scheduled maintenance windows, it would be good to clarify if DynamoDB will require anything similar (I doubt that, but it should be clarified). Update: Werner Vogels confirms in the comments that indeed there are no maintenance windows (always-on)
- Some interesting data shared by a DynamoDB beta tester:
- loaded multiple terabytes
- 250k writes/s
- this throughput was maintained continuously for more than 3 days
- average read latency close to 2ms and 99th percentile 6-8ms
- no impact on other customers
- CloudWatch alarms can be used to notify that a specific threshold for throughput has been reached for a table and when it is time to add additional read or write capacity units
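The item size, capacity unit, and per-partition notes above can be turned into a small calculator. This is only a sketch of the formulas as written in these notes, not Amazon's billing logic: it assumes string-only attributes, 1 KB = 1024 bytes, and that even a tiny item rounds up to 1 KB. All function names are hypothetical.

```python
import math

MAX_ITEM_BYTES = 64 * 1024  # 64KB item size limit from the notes


def item_size_bytes(item):
    """Sum of the UTF-8 lengths of attribute names and (string) values."""
    size = sum(len(name.encode("utf-8")) + len(value.encode("utf-8"))
               for name, value in item.items())
    if size > MAX_ITEM_BYTES:
        raise ValueError(f"item is {size} bytes, above the 64KB limit")
    return size


def capacity_units(ops_per_second, size_bytes):
    """Units = item ops per second x item size rounded up to the nearest KB."""
    size_kb = max(1, math.ceil(size_bytes / 1024))  # assumption: minimum of 1 KB
    return ops_per_second * size_kb


def throughput_per_partition(total_units, partitions):
    """Provisioned throughput is split evenly, with no sharing across partitions."""
    return total_units / partitions


# Example: 100 ops/s on a small item, table spread over 4 partitions.
item = {"id": "user#42", "name": "Zoë"}
size = item_size_bytes(item)
units = capacity_units(100, size)
print(size, units, throughput_per_partition(units, 4))
```

Because throughput is not shared across partitions, the per-partition figure is the one to watch when a few hash keys receive most of the traffic.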
Any other interesting bits to be emphasized?
Original title and link: Notes About Amazon DynamoDB (©myNoSQL)
Tuesday, 10 January 2012
Partnerships in the Hadoop Market
Just a quick recap:
- Cloudera: Oracle, Dell, NetApp
- Hortonworks: Microsoft
- MapR: EMC (integration with Greenplum HD)
Amazon doesn’t partner with anyone for their Amazon Elastic MapReduce. And IBM is walking alone with the software-only InfoSphere BigInsights.
Original title and link: Partnerships in the Hadoop Market (©myNoSQL)
Thursday, 8 September 2011
Amazon Is More Interesting Than Google
Google has been doing these sort of blog posts for years. Some engineer writes up an entry about how they are doing research using terabytes or petabytes of data. And then they end by saying you should work at Google. So nowadays, I don’t care about any of what Google does. […] MapReduce? Great, they’ve been sitting on this technology for a decade. Good for them. It doesn’t matter to me.
But the world has changed, and Google can’t seem to keep up. Amazon has become the polar opposite of Google, empowering every developer on the planet to make incredible technology. Want MapReduce? Amazon has you covered. Want to play with terabytes of data like it ain’t no thing? Check. Want to launch thousands of servers to handle a tough computation? Check, check, and check. Want to launch thousands of human brains to solve otherwise unassailable problems? No problem. Heck, want to simply send email to your users? They have that too.
I read this just hours after expressing my concerns about the awesome future of Big Data and data analytics. For now we’re lucky there’s still an Amazon out there.
Original title and link: Amazon Is More Interesting Than Google (©myNoSQL)
via: http://www.abtinforouzandeh.com/2011/09/07/Amazon-is-More-Interesting-than-Google.html
Tuesday, 23 August 2011
Memcached in the Cloud: Amazon ElastiCache
Amazon announced today a new service: Amazon ElastiCache, or Memcached in the cloud. The new service is still in beta and available only in the US East (Virginia) Region.
While many will find this new service useful, it is a bit of a disappointment that Amazon took the safe route and went with pure Memcached. The only notable feature of Amazon ElastiCache is automatic failure detection and recovery. But compared with Membase (and the soon to be released Couchbase 2.0), it is missing clustering, replication, support for virtual nodes, etc. Even though it advertises push-button scaling, ElastiCache will lose cached data when instances are added or removed.
The pace at which Amazon is launching new services is indeed impressive. I’m wondering what will be the first NoSQL database that will get official Amazon support.
Original title and link: Memcached in the Cloud: Amazon ElastiCache (©myNoSQL)