aws: All content tagged as aws in NoSQL databases and polyglot persistence
Thursday, 2 May 2013
MySQL in the Cloud: Discontinuing of Xeround Cloud Database Public Service
Cloud and MySQL related:
We are deeply sorry to announce that Xeround’s public cloud offering will be discontinued soon. All Xeround FREE database instances will be terminated on May 8th, and the paid plans terminated on May 15th.
This was announced on May 1st.
✚ This only means more for Amazon RDS.
Original title and link: MySQL in the Cloud: Discontinuing of Xeround Cloud Database Public Service (©myNoSQL)
via: http://xeround.com/blog/2013/05/discontinuing-of-xeround-cloud-database-public-service
Microsoft Azure Sales Top $1 Billion Challenging Amazon
Last week I saw some guesstimates of Amazon Web Services’ revenue. Bloomberg posted an article putting the revenue of Microsoft Azure and related programs (?) at $1 billion.
Interesting numbers:
- market share: Amazon Web Services 71%, Microsoft Azure 20%
- Azure grew 48% in the last 6 months
- Gartner estimates the infrastructure segment of the cloud market at $6.17 billion in 2012, growing to $30.6 billion in 2017
- Gartner estimates the total cloud market at $108.9 billion in 2012, growing to $237.2 billion in 2017. (nb: I find this one weird as it includes online advertising and other less-cloudy-services-imo).
Amazon hasn’t given many details about the AWS platform, except 3 numbers:
- number of objects stored in S3. This has been doubling every year for the last 4 years
- Q4 2012: 1.3 trillion
- Q3 2011: 566b
- Q4 2010: 262b
- Q4 2009: 102b
- Q4 2008: 40b
- Q4 2007: 14b
- Q4 2006: 2.9b
- number of requests per second handled by AWS
- number of EMR clusters (?) spun up
According to some slides from last October/November:
- S3 stored over 1.3 trillion objects
- AWS handles over 830k requests/s
- 3.7 million EMR clusters spun up since 2010
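The "doubling every year" claim can be checked against the S3 object counts listed above. What follows is a rough sketch, not anything Amazon published: it assumes each quarter can be written as a fractional year (Q4 as year + 0.75, Q3 as year + 0.5) so that the uneven gaps around Q3 2011 are annualised, and the names `S3_OBJECTS` and `annualised_growth` are made up for the illustration.

```python
# S3 object counts in billions, as listed above.
# Quarters are encoded as fractional years (an assumption of this sketch).
S3_OBJECTS = [
    (2006.75, 2.9),    # Q4 2006
    (2007.75, 14),     # Q4 2007
    (2008.75, 40),     # Q4 2008
    (2009.75, 102),    # Q4 2009
    (2010.75, 262),    # Q4 2010
    (2011.50, 566),    # Q3 2011
    (2012.75, 1300),   # Q4 2012
]


def annualised_growth(points):
    """Return (start, end, factor) tuples; factor is the per-year growth multiple."""
    out = []
    for (t0, n0), (t1, n1) in zip(points, points[1:]):
        # Normalise each interval to one year so uneven gaps are comparable.
        factor = (n1 / n0) ** (1.0 / (t1 - t0))
        out.append((t0, t1, factor))
    return out


for t0, t1, factor in annualised_growth(S3_OBJECTS):
    print(f"{t0:.2f} -> {t1:.2f}: x{factor:.2f} per year")
```

Under these assumptions every interval comes out at roughly 1.9x per year or more, so "doubling every year" looks like a fair, if slightly rounded, summary of the numbers.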
While I don’t have any data about RDS and Dynamo, it would be great if Microsoft released some details about Azure.
✚ If AWS has a market share of 71% and Azure 20%, that leaves Google plus others with 9%. Makes me wonder how accurate this data is.
Original title and link: Microsoft Azure Sales Top $1 Billion Challenging Amazon (©myNoSQL)
Monday, 29 April 2013
Amazon Web Services Annual Revenue Estimation
Over the weekend, Christopher Mims published an article in which he derives a figure for Amazon Web Services’ annual revenue, $2.4 billion:
Amazon is famously reticent about sales figures, dribbling out clues without revealing actual numbers. But it appears the company has left enough hints to, finally, discern how much revenue it makes on its cloud computing business, known as Amazon Web Services, which provides the backbone for a growing portion of the internet: about $2.4 billion a year.
There’s no way to decompose this number into the revenue of each AWS solution. For the data space I’d be interested in:
- S3 revenues. This is the space Basho’s Riak CS competes in.
  After writing my first post about Riak CS, I learned that in Japan, where Yahoo!’s new cloud storage runs on Riak CS, Gemini Mobile Technologies has been offering local ISPs a similar S3-like service built on top of Cassandra.
- Redshift is pretty new and, while I’m not aware of immediate competitors (what am I missing?), I don’t think it accounts for a significant part of this revenue, even if some of the early users, like Airbnb, report getting very good performance and costs from it.
  Redshift is powered by ParAccel, which was acquired by Actian over the weekend.
- Amazon Elastic MapReduce. This is another interesting space, one Microsoft wants a share of with its Azure HDInsight, developed in collaboration with Hortonworks.
  In this space there’s also the combination of MapR and Google Compute, which seems to be extremely performant.
- Interestingly, Amazon also makes money from some of the competitors of its Amazon Dynamo and RDS services. The advantage of owning the infrastructure.
Original title and link: Amazon Web Services Annual Revenue Estimation (©myNoSQL)
Thursday, 14 March 2013
Your Hadoop in Amazon's Cloud
Adam Horwich of metabroadcast shares their experience of running a Hadoop cluster on Amazon, taking advantage of availability zones, spot instances and other tricks:
Oh Hadoop, how you infuriate me with your spurious failures and endless bugs, but how fantastic you can actually be when it comes down to it. I’ve been fighting with Hadoop a lot this past year, from a Region Server domino apocalypse, to the seemingly impossible job of duplicating a cluster. […] But to make the most of what you’ve got, I’ve been researching better ways of using resources available. There’s, of course, always been the option of using Amazon’s EMR service, but we originally built our cluster before that existed as a product, and have built our services around a standardised Hadoop cluster, with local DataNodes. This blog post will be about adding in some nice EMR style features to your dedicated Hadoop cluster running in AWS.
Original title and link: Your Hadoop in Amazon’s Cloud (©myNoSQL)
Tuesday, 12 March 2013
DynamoDB One Year Later: 85% Cheaper: How Is Amazon Doing It
Werner Vogels writes about the recent price reduction of DynamoDB:
DynamoDB runs on a fleet of SSD-backed storage servers that are specifically designed to support DynamoDB. This allows us to tune both our hardware and our software to ensure that the end-to-end service is both cost-efficient and highly performant. We’ve been working hard over the past year to improve storage density and bring down the costs of our underlying hardware platform. We have also made significant improvements to our software by optimizing our storage engine, replication system and various other internal components. The DynamoDB team has a mandate to keep finding ways to reduce the cost and I am glad to see them delivering in a big way. DynamoDB has also benefited from its rapid growth, which allows us to take advantage of economies of scale. As with our other services, as we’ve made advancements that allow us to reduce our costs, we are happy to pass the savings along to you.
One thought: this could be, if it isn’t already, a great sales pitch for data appliance vendors.
You can find more details about DynamoDB’s price reduction and the new reserved capacity model on the Amazon Web Services Blog.
Original title and link: DynamoDB One Year Later: 85% Cheaper: How Is Amazon Doing It (©myNoSQL)
via: http://www.allthingsdistributed.com/2013/03/dynamodb-one-year-later.html
Wednesday, 20 February 2013
Amazon Preparing 'Disruptive' Big Data AWS Service?
Interesting speculation by The Register:
AWS already has the AWS Data Pipeline, which helps administrators schedule and shuttle data among various services, AWS Redshift for data warehousing which lets people store large quantities of data in the cloud and run queries on it, its NoSQL SSD-backed DynamoDB, and its Relational Database Service (RDS). So where does MADS fit?
The Reg’s take is that MADS will allow Amazon to build services that can net together the above components and help automate the passing of data among them. It may also become a standalone product in its own right, based on its similarities to the TransLattice and Google Spanner tech.
I almost never bet, but I’d say this could be Amazon’s Spanner.
Original title and link: Amazon Preparing ‘Disruptive’ Big Data AWS Service? (©myNoSQL)
via: http://www.theregister.co.uk/2013/02/19/amazon_new_big_data_aws_service/
Tuesday, 19 February 2013
Amazon Redshift - Now Broadly Available
Jeff Barr:
We announced Amazon Redshift, our fast and powerful, fully managed, petabyte-scale data warehouse service, late last year (see my earlier blog post for more info).
[…]
We’ve designed Amazon Redshift to be cost-effective, easy to use, and flexible.
Questions:
- who is the ideal Redshift user? I assume it should be AWS users that already have data in the Amazon cloud. Otherwise I have a bit of a hard time imagining trucks carrying tons of hard drives into Amazon data centers.
- what happens if for some reason you decide to move your data out of Redshift? How would that work?
- what is the next move and counter-argument of Greenplum, Netezza, Vertica, etc. to Redshift?
Original title and link: Amazon Redshift - Now Broadly Available (©myNoSQL)
via: http://aws.typepad.com/aws/2013/02/amazon-redshift-now-broadly-available.html
Friday, 8 February 2013
Deploying Riak on EC2 - What to Pick?
Deepak Bala shares his recommendations for running Riak on EC2 based on his own experience:
There are a couple of problems to field when deploying Riak.
The EC2 instances that are provisioned by default change the following on restart.
- Private IP address
- Public IP address
- Private DNS
- Public DNS
- EBS instances provide stable durable storage while Ephemeral storage provides for better predictable performance at the cost of losing data on restarts.
Performance.
Original title and link: Deploying Riak on EC2 - What to Pick? (©myNoSQL)
Friday, 18 January 2013
Deep Dive Into Amazon ElastiCache
Harish Ganesan published an in-depth article about Amazon ElastiCache covering:
- Overhead of the per-TCP-connection buffer approach used by ElastiCache
- Possible solutions for dealing with an elastic Amazon ElastiCache cluster (nb: memcached nodes are not cluster aware)
- Auto discovery (just recently added by the AWS team as a patch to the spymemcached Java client)
- ElastiCache node types
- Memory allocation and eviction policies
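Since memcached nodes are not cluster aware, it is the client that decides which node holds a key. One common way to cope with a cluster that grows and shrinks is client-side consistent hashing. The sketch below illustrates that general technique only; I am not claiming it is what the article recommends or what any ElastiCache client does internally. The `HashRing` class, the `cache-N` node names, the MD5 hash and the 100 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Client-side consistent hashing over a set of cache node names (hypothetical)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual points per node, smooths the distribution
        self._ring = []        # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key):
        h = self._hash(key)
        # First ring point at or after h, wrapping around at the end.
        idx = bisect.bisect(self._ring, (h, ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["cache-1", "cache-2", "cache-3"])
before = {k: ring.node_for(k) for k in keys}

ring.add("cache-4")  # the cluster grows by one node
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved to the new node")
```

The point of the design is that growing the cluster only moves keys onto the new node, and the rest stay where they were. A plain `hash(key) % node_count` scheme would reshuffle most keys on every resize.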
Original title and link: Deep Dive Into Amazon ElastiCache (©myNoSQL)
via: http://harish11g.blogspot.in/2013/01/amazon-elasticache-memcached-internals_8.html
Wednesday, 16 January 2013
The Architecture of a Credit Card Analysis Platform: Using Project Voldemort, Elastic MapReduce, Pangool
Ivan de Prado and Pere Ferrera on HighScalability.com:
The solution we developed has an infrastructure cost of just a few thousands of dollars per month thanks to the use of the cloud (AWS), Hadoop and Voldemort.
This is one of the few projects outside LinkedIn that I know of that uses Project Voldemort. Plus, the Voldemort storage backend is configured to use BerkeleyDB.
Original title and link: The Architecture of a Credit Card Analysis Platform: Using Project Voldemort, Elastic MapReduce, Pangool (©myNoSQL)
Friday, 21 December 2012
The New EC2 High Storage Instance Family
The High Storage Eight Extra Large (hs1.8xlarge) instances are a great fit for applications that require high storage depth and high sequential I/O performance. Each instance includes 117 GiB of RAM, 16 virtual cores (providing 35 ECU of compute performance), and 48 TB of instance storage across 24 hard disk drives capable of delivering up to 2.4 GB per second of I/O performance.
This is local (ephemeral) storage, so as far as data stores go it should be used only with redundant, highly available databases (e.g. Riak).
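A bit of back-of-the-envelope arithmetic on the quoted specs shows why redundancy matters here: losing an instance means re-copying a lot of data. The sketch assumes decimal units (1 TB = 1000 GB) and takes the quoted 2.4 GB per second as a best-case sequential rate; a real rebuild over the network would likely be slower. The variable names are mine.

```python
# Figures quoted in the AWS announcement above.
DRIVES = 24
TOTAL_STORAGE_TB = 48
MAX_THROUGHPUT_GB_S = 2.4

# Assumption: decimal units, 1 TB = 1000 GB and 1 GB = 1000 MB.
per_drive_tb = TOTAL_STORAGE_TB / DRIVES
per_drive_mb_s = MAX_THROUGHPUT_GB_S * 1000 / DRIVES

# Best case: time to stream the full instance storage once at the quoted rate.
full_scan_hours = TOTAL_STORAGE_TB * 1000 / MAX_THROUGHPUT_GB_S / 3600

print(f"{per_drive_tb:.1f} TB and {per_drive_mb_s:.0f} MB/s per drive")
print(f"{full_scan_hours:.1f} hours to read or rewrite all 48 TB")
```

Even at the advertised peak rate, refilling a single replaced instance takes several hours.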
P.S.: I get the feeling Jeff Darcy will be happy reading this post.
Original title and link: The New EC2 High Storage Instance Family (©myNoSQL)
via: http://aws.typepad.com/aws/2012/12/the-new-ec2-high-storage-instance-family.html
Sunday, 2 December 2012
RiakCS Multi-Datacentre Redundancy
RiakCS, the Riak-based multi-tenant, distributed S3-compatible cloud storage solution from Basho, now supports multi-datacenter replication:
RiakCS has two data replication options for cloud administrators: full sync and real-time sync. Full sync copies data from a primary RiakCS store to a secondary site at a frequency of administrators’ choosing, though the default is six hours. The secondary data stores regularly ask the primary datastore whether anything has changed and, if it has, they will update their own data to bring it in line.
Real-time sync, meanwhile, triggers when a person requests information from a RiakCS pile of data. If they are requesting from a secondary site, the database will check with the primary to see if anything has changed and update accordingly, while if they are requesting data from the primary, there’s no wait.
The naming of the second sync solution as real-time sounds strange [1]. I’d probably call it sync on-read.
[1] My first reaction was “there’s no way the Basho guys implemented 2PC or even a Paxos algorithm for syncing in real time, so what is this???”.
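To make the difference between the two modes concrete, here is a toy in-memory model of the behaviour described in the quote above: a periodic full sync that pulls changes from the primary, and a read path on the secondary that checks the primary before answering. It is only an illustration of that description, not Basho's implementation. The `Site` and `Secondary` classes and the per-key version counter are invented for the sketch.

```python
class Site:
    """Toy object store: key -> (version, value). Hypothetical, for illustration."""

    def __init__(self):
        self.objects = {}

    def put(self, key, value):
        version = self.objects.get(key, (0, None))[0] + 1
        self.objects[key] = (version, value)


class Secondary(Site):
    def __init__(self, primary):
        super().__init__()
        self.primary = primary

    def full_sync(self):
        """Periodic mode: pull everything that changed on the primary."""
        for key, (version, value) in self.primary.objects.items():
            if self.objects.get(key, (0, None))[0] < version:
                self.objects[key] = (version, value)

    def get(self, key):
        """Sync on-read: check the primary for a newer copy before answering."""
        remote = self.primary.objects.get(key)
        local = self.objects.get(key)
        if remote is not None and (local is None or local[0] < remote[0]):
            self.objects[key] = remote
        stored = self.objects.get(key)
        return stored[1] if stored else None


primary = Site()
secondary = Secondary(primary)
primary.put("a", "v1")
primary.put("b", "v1")
secondary.full_sync()              # secondary now holds both objects

primary.put("a", "v2")             # a change made after the last full sync
print(secondary.objects["a"][1])   # v1: the stored copy is stale
print(secondary.get("a"))          # v2: the read triggers the update
```

In this model nothing is pushed to the secondary when the primary changes. The update only happens when someone reads from the secondary, which is why "sync on-read" seems the more accurate name.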
Original title and link: RiakCS Multi-Datacentre Redundancy (©myNoSQL)
via: http://www.zdnet.com/riak-upgrades-cloud-database-with-multi-datacentre-redundancy-7000008153/

