mysql: All content tagged as mysql in NoSQL databases and polyglot persistence
Over the weekend, Christopher Mims has published an article in which he derives a figure for Amazon Web Services’s annual revenue: $2.4 billions:
Amazon is famously reticent about sales figures, dribbling out clues without revealing actual numbers. But it appears the company has left enough hints to, finally, discern how much revenue it makes on its cloud computing business, known as Amazon Web Services, which provides the backbone for a growing portion of the internet: about $2.4 billion a year.
There’s no way to decompose this number into the revenue of each AWS solution. For the data space I’d be interested into:
S3 revenues. This is the space Basho’s Riak CS competes into.
After writing my first post about Riak CS, I’ve learned that in Japan, the same place where Riak CS is run by Yahoo! new cloud storage, Gemini Mobile Technologies has been offering to local ISPs a similar S3-service built on top of Cassandra.
Redshift is pretty new and while I’m not aware of immediate competitors (what am I missing?), I don’t think it accounts for a significant part of this revenue. Even if some of the early users, like AirBnb, report getting very good performance and costs from it.
Redshift is powered by ParAccell, which, over the weekend, has been acquired by Actian.
Amazon Elastic MapReduce. This is another interesting space from which Microsoft wants a share with its Azure HDInsight developed in collaboration with Hortonworks.
Interestingly Amazon is making money also from some of the competitors of its Amazon Dynamo and RDS services. The advantage of owning the infrastructure.
Original title and link: Amazon Web Services Annual Revenue Estimation ( ©myNoSQL)
I’m almost always enjoying the lessons learned-style presentations from Twitter’s people. The slides below, by Jimmy Lin and Dmitriy Ryaboy, have been used at HadoopSummit. Besides the technical and practical details, there are two things that I really like:
DJ Patil: “It’s impossible to overstress this: 80% of the work in any data project is in cleaning the data”
and then the reality check:
- Your boss says something vague
- You think very hard on how to move the needle
- Where’s the data?
- What’s in this dataset?
- What’s all the f#$#$ crap in the data?
- Clean the data
- Run some off-the-shelf data mining algorithm
- Productionize, act on the insight
- Rinse, repeat