


aws: All content tagged as aws in NoSQL databases and polyglot persistence

Using Elastic MapReduce as a generic Hadoop cluster manager

Steve McPherson for the AWS Blog:

Despite the name Elastic MapReduce, the service goes far beyond batch-oriented processing. Clusters in EMR have a flexible and rich cluster-management framework that users can customize to run any Hadoop ecosystem application such as low-latency query engines like Hbase (with Phoenix), Impala, Spark/Shark and machine learning frameworks like Mahout. These additional components can be installed using Bootstrap Actions or Steps.
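To make the Bootstrap Actions and Steps mentioned in the quote a bit more concrete, here is a minimal sketch of how the two might be declared. It assumes the request shape used by boto3's EMR `run_job_flow` call. The helper name, the cluster name and the S3 paths are hypothetical, and the other parameters a real cluster needs (instances, roles, release) are deliberately left out.

```python
def build_emr_customization(cluster_name, bootstrap_script, step_jar, step_args):
    """Return only the part of an EMR request that installs and runs extra components.

    Assumption: keys follow the boto3 `run_job_flow` request shape. This is
    not a complete request; instance and role settings are omitted.
    """
    return {
        "Name": cluster_name,
        # Bootstrap actions run a script on every node before Hadoop starts.
        "BootstrapActions": [
            {
                "Name": "install-extra-components",
                "ScriptBootstrapAction": {"Path": bootstrap_script, "Args": []},
            }
        ],
        # Steps are units of work submitted to the cluster once it is running.
        "Steps": [
            {
                "Name": "run-application",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {"Jar": step_jar, "Args": list(step_args)},
            }
        ],
    }


# Hypothetical bucket and file names, for illustration only.
request = build_emr_customization(
    "generic-hadoop-cluster",
    "s3://example-bucket/install-components.sh",
    "s3://example-bucket/app.jar",
    ["--input", "s3://example-bucket/in"],
)
print(sorted(request))
```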

Operational simplicity is critical in the early days of many companies, when large hardware investments and time are hard to afford. Amazon is building a huge data ecosystem to convince its users to stay even afterwards (the more data you put in, the more difficult it is to move it out later).

Original title and link: Using Elastic MapReduce as a generic Hadoop cluster manager (NoSQL database©myNoSQL)


DynamoDB Local for Desktop Development

Would you like to be able to write and test code that uses the Amazon DynamoDB API even if you have no network connection and without incurring any usage charges (AWS Free Usage Tier notwithstanding)?

Amazon is impressive in its capacity to listen and to push out new features and tools. The Google AppEngine local SDK has been one of the friendliest tools for developing apps for the cloud. Now DynamoDB users seem to have something similar.
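As a rough sketch of what developing against DynamoDB Local could look like, the snippet below builds the settings for a client that talks to a local endpoint instead of the AWS one. It assumes DynamoDB Local is listening on its default port 8000 and that boto3 is the client library. The helper names and the dummy credentials are illustrative, not part of the announcement.

```python
def local_dynamodb_client_kwargs(port=8000):
    """Keyword arguments that point a DynamoDB client at a local endpoint.

    Assumption: DynamoDB Local is already running on localhost at `port`.
    The credentials are placeholders; no real AWS account is involved.
    """
    return {
        "endpoint_url": f"http://localhost:{port}",
        "region_name": "us-east-1",
        "aws_access_key_id": "local",
        "aws_secret_access_key": "local",
    }


def make_local_client(port=8000):
    """Create the client; boto3 is imported lazily so the helper above has no dependencies."""
    import boto3

    return boto3.client("dynamodb", **local_dynamodb_client_kwargs(port))


kwargs = local_dynamodb_client_kwargs()
print(kwargs["endpoint_url"])
```

Switching back to the hosted service would then only mean dropping the `endpoint_url` override and supplying real credentials.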



Amazon Web Services support for Redis

Amazon decided to expand their ElastiCache service with support for Redis:

Today, we added two important choices for customers running high performance apps in the cloud: support for Redis in Amazon ElastiCache and a new high memory database instance (db.cr1.8xlarge) for Amazon RDS.

Some more details can be found in this blog post by Jeff Barr. I couldn’t find any mention of whether this new AWS service offers features like those provided by GarantiaData: auto-scaling and auto-failover.



How Safari Books Online uses Google BigQuery for BI

Looking for alternative solutions to build our dashboards and enable interactive ad-hoc querying, we played with several technologies, including Hadoop. In the end, we decided to use Google BigQuery.

Compare the original processing flow:

BigQuery processing flow

with these 2 possible alternatives and tell me if you notice any significant differences.

Alternatives to BigQuery



Amazon Redshift Update

A couple of interesting points from Werner Vogels’s post about Amazon Redshift’s security:

  1. Amazon Redshift has over 1000 customers and is adding new ones at a rate of 100 per week. I’m not familiar with customer acquisition numbers in the data warehouse space, but it doesn’t look like ParAccel, at least in its Redshift incarnation, is failing.
  2. Amazon Redshift positioning: “price, performance and simplicity”. I cannot see many companies being able to compete against this triplet.
  3. Amazon has reduced the cost of read operations from DynamoDB to a quarter of the previous price to make that data more accessible to Redshift.



10 questions to ask when hosting your database on AWS

Dharshan Rangegowda, founder of Scalegrid, posted a list of 10 questions that should be answered before hosting your MongoDB on AWS. But these are generic enough to extend to any database-on-AWS solution. They cover aspects like HA, backup and restore, monitoring, and basic security. If you haven’t done this before, save them as a quick checklist.

✚ Just because you set up HA and backups doesn’t mean they’ll actually work when you need them. Test them over and over again. Make it part of your regular procedures.



MySQL in the Cloud: Discontinuing of Xeround Cloud Database Public Service

Cloud and MySQL related:

We are deeply sorry to announce that Xeround’s public cloud offering will be discontinued soon. All Xeround FREE database instances will be terminated on May 8th, and the paid plans terminated on May 15th.

This was announced on May 1st.

✚ This only means more business for Amazon RDS.



Microsoft Azure Sales Top $1 Billion Challenging Amazon

Last week I saw some guesstimates of Amazon Web Services’ revenue. Bloomberg posted an article putting the revenue of Microsoft Azure and related programs (?) at $1 billion.

Interesting numbers:

  • market share: Amazon Web Services 71%, Microsoft Azure 20%
  • Azure grew 48% in the last 6 months
  • Gartner estimates the infrastructure segment of the cloud market at $6.17 billion in 2012, growing to $30.6 billion in 2017
  • Gartner estimates the total cloud market at $108.9 billion in 2012, growing to $237.2 billion in 2017. (nb: I find this one weird as it includes online advertising and other less-cloudy-services-imo).
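To put the two Gartner estimates on a comparable footing, the implied compound annual growth rates can be worked out from the 2012 and 2017 figures. This is only arithmetic on the numbers quoted above, not something Gartner or Bloomberg published; the function name is mine. It comes to roughly 38% per year for the infrastructure segment against roughly 17% for the total market.

```python
def cagr(start, end, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (end / start) ** (1 / years) - 1


# Gartner estimates quoted above, in billions of dollars, 2012 to 2017.
YEARS = 2017 - 2012
infrastructure = cagr(6.17, 30.6, YEARS)
total_cloud = cagr(108.9, 237.2, YEARS)

print(f"infrastructure segment: {infrastructure:.1%} per year")
print(f"total cloud market:     {total_cloud:.1%} per year")
```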

Amazon hasn’t given many details about the AWS platform, except 3 numbers:

  1. number of objects stored in S3. This has been doubling every year for the last 4 years
    1. Q4 2012: 1.3 trillion
    2. Q3 2011: 566b
    3. Q4 2010: 262b
    4. Q4 2009: 102b
    5. Q4 2008: 40b
    6. Q4 2007: 14b
    7. Q4 2006: 2.9b
  2. number of requests per second AWS handles
  3. number of EMR clusters (?) spun up
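The "doubling every year" claim can be checked against the S3 object counts listed above with a few lines of arithmetic. The sketch below only uses those figures (1.3 trillion written as 1300 billion); the names are mine. Note that the intervals are not all twelve months: Q4 2010 to Q3 2011 is three quarters and Q3 2011 to Q4 2012 is five.

```python
# (quarter, objects stored in billions), oldest first, as listed above.
S3_OBJECTS = [
    ("Q4 2006", 2.9),
    ("Q4 2007", 14),
    ("Q4 2008", 40),
    ("Q4 2009", 102),
    ("Q4 2010", 262),
    ("Q3 2011", 566),
    ("Q4 2012", 1300),
]


def growth_factors(series):
    """Ratio between each data point and the one before it."""
    return [
        (f"{prev_label} -> {label}", count / prev_count)
        for (prev_label, prev_count), (label, count) in zip(series, series[1:])
    ]


factors = growth_factors(S3_OBJECTS)
for interval, factor in factors:
    print(f"{interval}: x{factor:.2f}")
```

Every consecutive pair in the list grows by at least a factor of two, which is consistent with the claim, if anything on the conservative side for the earlier years.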

According to some slides from last October/November:

  1. S3 stored over 1.3 trillion objects
  2. AWS handles over 830k requests/s
  3. 3.7 million EMR clusters spun up since 2010

While I don’t have any data about RDS and Dynamo, it would be great if Microsoft released some details about Azure.

✚ If AWS has a market share of 71% and Azure 20%, that leaves Google plus others with 9%. Makes me wonder how accurate this data is.



Amazon Web Services Annual Revenue Estimation

Over the weekend, Christopher Mims published an article in which he derives a figure for Amazon Web Services’ annual revenue: $2.4 billion.

Amazon is famously reticent about sales figures, dribbling out clues without revealing actual numbers. But it appears the company has left enough hints to, finally, discern how much revenue it makes on its cloud computing business, known as Amazon Web Services, which provides the backbone for a growing portion of the internet: about $2.4 billion a year.

There’s no way to decompose this number into the revenue of each AWS solution. For the data space I’d be interested in:

  1. S3 revenues. This is the space Basho’s Riak CS competes in.

    After writing my first post about Riak CS, I’ve learned that in Japan, where Riak CS powers Yahoo!’s new cloud storage, Gemini Mobile Technologies has been offering local ISPs a similar S3-like service built on top of Cassandra.

  2. Redshift is pretty new and while I’m not aware of immediate competitors (what am I missing?), I don’t think it accounts for a significant part of this revenue, even if some of the early users, like Airbnb, report getting very good performance and costs from it.

    Redshift is powered by ParAccel, which, over the weekend, was acquired by Actian.

  3. Amazon Elastic MapReduce. This is another interesting space from which Microsoft wants a share with its Azure HDInsight developed in collaboration with Hortonworks.

    In this space there’s also the combination of MapR and Google Compute, which seems to be extremely performant.

  4. Interestingly, Amazon also makes money from some of the competitors of its Amazon Dynamo and RDS services: the advantage of owning the infrastructure.


Your Hadoop in Amazon's Cloud

Adam Horwich of metabroadcast shares their experience of running a Hadoop cluster on Amazon, taking advantage of availability zones, spot instances, and other tricks:

Oh Hadoop, how you infuriate me with your spurious failures and endless bugs, but how fantastic you can actually be when it comes down to it. I’ve been fighting with Hadoop a lot this past year, from a Region Server domino apocalypse, to the seemingly impossible job of duplicating a cluster. […] But to make the most of what you’ve got, I’ve been researching better ways of using resources available. There’s, of course, always been the option of using Amazon’s EMR service, but we originally built our cluster before that existed as a product, and have built our services around a standardised Hadoop cluster, with local DataNodes. This blog post will be about adding in some nice EMR style features to your dedicated Hadoop cluster running in AWS.



DynamoDB One Year Later: 85% Cheaper: How Is Amazon Doing It

Werner Vogels writes about the recent price reduction of DynamoDB:

DynamoDB runs on a fleet of SSD-backed storage servers that are specifically designed to support DynamoDB. This allows us to tune both our hardware and our software to ensure that the end-to-end service is both cost-efficient and highly performant. We’ve been working hard over the past year to improve storage density and bring down the costs of our underlying hardware platform. We have also made significant improvements to our software by optimizing our storage engine, replication system and various other internal components. The DynamoDB team has a mandate to keep finding ways to reduce the cost and I am glad to see them delivering in a big way. DynamoDB has also benefited from its rapid growth, which allows us to take advantage of economies of scale. As with our other services, as we’ve made advancements that allow us to reduce our costs, we are happy to pass the savings along to you.

One thought: this could be, if it isn’t already, a great sales pitch for data appliance vendors.

You can find more details about DynamoDB’s price reduction and the new reserved capacity model on the Amazon Web Services Blog.

Amazon DynamoDB Price Reduction



Amazon Preparing 'Disruptive' Big Data AWS Service?

Interesting speculation by The Register:

AWS already has the AWS Data Pipeline, which helps administrators schedule and shuttle data among various services, AWS Redshift for data warehousing which lets people store large quantities of data in the cloud and run queries on it, its NoSQL SSD-backed DynamoDB, and its Relational Database Service (RDS). So where does MADS fit?

The Reg’s take is that MADS will allow Amazon to build services that can net together the above components and help automate the passing of data among them. It may also become a standalone product in its own right, based on its similarities to the TransLattice and Google Spanner tech.

I almost never bet, but I’d say this could be Amazon’s Spanner.
