
aws: All content tagged as aws in NoSQL databases and polyglot persistence

MongoDB vs MySQL: A DevOps point of view

Pierre Bailet and Mathieu Poumeyrol of fotopedia (a French photo site) share their experience of operating a small MongoDB cluster since September 2009, compared to operating a MySQL cluster.

Some details about fotopedia:

  • fotopedia is 100% on AWS
  • Amazon RDS for MySQL
  • 4-node MongoDB cluster
  • 150 million photo views

MongoDB advantages:

  • no alter table
  • background index creation
  • data backup & restoration
    • note: as far as I can tell MySQL is able to do the same
  • replica sets
  • hardware migration
    • note: the same procedure can be used for MySQL

Before leaving you with the slides, here is an interesting accepted trade-off:

Quietly losing seconds of writes is preferable to:

  • weekly minutes-long maintenance periods
  • minutes-long unscheduled downtime and manual failover in case of hardware failures


Thoughts on SimpleDB, DynamoDB and Cassandra

Adrian Cockcroft:

So the lesson here is that for a first step into NoSQL, we went with a hosted solution so that we didn’t have to build a team of experts to run it, and we didn’t have to decide in advance how much scale we needed. Starting again from scratch today, I would probably go with DynamoDB. It’s a low “friction” and developer friendly solution.

You can look at this in two ways: 1) the biased opinion of someone who has already bet on Amazon for the infrastructure of a multi-billion business; 2) the opinion of someone who has accumulated a ton of experience in the NoSQL space and who is successfully[1] running the infrastructure of a multi-billion business on NoSQL solutions. I’d strongly suggest you think of it as the latter.


  1. Netflix was one of the few companies that continued to operate during Amazon’s EBS major failure. 

Original title and link: Thoughts on SimpleDB, DynamoDB and Cassandra (NoSQL database©myNoSQL)

via: http://perfcap.blogspot.com/2012/01/thoughts-on-simpledb-dynamodb-and.html


NoSQL tutorials: Storing User Preference in Amazon DynamoDB using the Mobile SDKs

Just a CRUD tutorial for DynamoDB but based on a scenario that makes sense and demoing the API with two languages (Objective-C and Java):

The sample mobile application described here demonstrates how to store user preferences in Amazon DynamoDB. Because more and more people are using multiple mobile devices, connecting these devices to the cloud and storing user preferences in the cloud enables developers to provide a more uniform cross-device experience for their users.

This article shows sample code for both the iOS and Android platforms.
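To make the scenario concrete, here is a minimal sketch of what storing and reading preferences looks like at the request level. It is in Python rather than the article's Objective-C and Java, and it is not the article's code: the table name, the `UserId` key, and the choice to store every preference as a string are my assumptions for illustration. The only thing taken from DynamoDB itself is the typed attribute format (`{"S": ...}` for strings).

```python
def put_preferences_request(table, user_id, preferences):
    """Build PutItem arguments that store all preferences in one item per user."""
    # "UserId" as the hash key is a hypothetical schema choice.
    item = {"UserId": {"S": user_id}}
    for name, value in preferences.items():
        # Simplification: every preference is stored as a string attribute.
        item[name] = {"S": str(value)}
    return {"TableName": table, "Item": item}


def get_preferences_request(table, user_id):
    """Build GetItem arguments that fetch the item for one user."""
    return {"TableName": table, "Key": {"UserId": {"S": user_id}}}


if __name__ == "__main__":
    # "UserPreferences" and the attribute names below are made up.
    print(put_preferences_request("UserPreferences", "user-42",
                                  {"Theme": "dark", "FontSize": 14}))
    print(get_preferences_request("UserPreferences", "user-42"))
```

With boto3 installed and credentials configured, these dictionaries could be passed as `client.put_item(**request)` and `client.get_item(**request)` on a `boto3.client("dynamodb")`. Keeping one item per user means any device can load all of a user's preferences with a single key lookup.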

Original title and link: NoSQL tutorials: Storing User Preference in Amazon DynamoDB using the Mobile SDKs (NoSQL database©myNoSQL)

via: http://aws.amazon.com/articles/7439603059327617


Amazon Elastic MapReduce New Features: Metrics, Updates, VPC, and Cluster Compute Support

Starting today customers can view graphs of 23 job flow metrics within the EMR Console by selecting the Monitoring tab in the Job Flow Details page. These metrics are pushed to CloudWatch every five minutes at no cost to you and include information on:

  • Job flow progress including metrics on the number of map and reduce tasks running and remaining in your job flow and the number of bytes read and written to S3 and HDFS.
  • Job flow contention including metrics on HDFS utilization, map and reduce slots open, jobs running, and the ratio between map tasks remaining and map slots.
  • Job flow health including metrics on whether your job flow is idle, if there are missing data blocks, and if there are any dead nodes.

That’s like free pr0n for operations teams.
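Since the metrics land in CloudWatch, they can also be pulled programmatically. The sketch below only builds the arguments for a CloudWatch GetMetricStatistics request; it makes no network call. It assumes the `AWS/ElasticMapReduce` namespace, the `JobFlowId` dimension, and a metric name such as `IsIdle`; check these against the current CloudWatch documentation before relying on them. The job flow id is a placeholder.

```python
from datetime import datetime, timedelta, timezone


def emr_metric_query(job_flow_id, metric_name, hours=1, now=None):
    """Build keyword arguments for a CloudWatch GetMetricStatistics request."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ElasticMapReduce",   # assumed namespace for EMR metrics
        "MetricName": metric_name,
        "Dimensions": [{"Name": "JobFlowId", "Value": job_flow_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,                         # metrics arrive every five minutes
        "Statistics": ["Average"],
    }


if __name__ == "__main__":
    # "j-EXAMPLE" is a placeholder job flow id.
    print(emr_metric_query("j-EXAMPLE", "IsIdle", hours=2))
```

With boto3, the resulting dictionary could be passed as `boto3.client("cloudwatch").get_metric_statistics(**query)`. Matching the period to the five-minute push interval avoids asking for data points that don't exist.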

On a different note, I’ve noticed that the Hadoop stack (Hadoop, Hive, Pig) on Amazon Elastic MapReduce is based on second-to-last versions, which suggests that extensive testing is performed on Amazon’s side before new versions are rolled out.

Original title and link: Amazon Elastic MapReduce New Features: Metrics, Updates, VPC, and Cluster Compute Support (NoSQL database©myNoSQL)

via: http://aws.typepad.com/aws/2012/01/new-elastic-mapreduce-features-metrics-updates-vpc-and-cluster-compute-support-guest-post.html


NoSQL Tutorial: Setting Up a Hadoop Cluster with MongoDB Support on EC2

A complete and detailed guide by Artem Yankov for setting up a Hadoop cluster that uses MongoDB. It relies on the MongoDB Hadoop adapter, mongo-hadoop, which provides input and output adapters, support for InputSplits, and write-only Pig.

What is covered in the tutorial:

  • Creating an AMI with the custom settings (installed hadoop and mongo-hadoop)
  • Launching a hadoop cluster on EC2
  • Adding more nodes to the cluster
  • Running some sample jobs

Original title and link: NoSQL Tutorial: Setting Up a Hadoop Cluster with MongoDB Support on EC2 (NoSQL database©myNoSQL)

via: http://artemyankov.com/post/16717104998/how-to-set-up-a-hadoop-cluster-with-mongo-support-on


Using Amazon Elastic MapReduce With DynamoDB: NoSQL Tutorials

Adam Gray[1]:

In this article, I’ll demonstrate how EMR can be used to efficiently export DynamoDB tables to S3, import S3 data into DynamoDB, and perform sophisticated queries across tables stored in both DynamoDB and other storage services such as S3.

If you put together Amazon S3, Amazon DynamoDB, Amazon RDS, and Amazon Elastic MapReduce, you have a complete polyglot persistence solution in the cloud[2].


  1. Adam Gray is Product Manager on the Elastic MapReduce Team  

  2. Complete in the sense of core building blocks.  

Original title and link: Using Amazon Elastic MapReduce With DynamoDB: NoSQL Tutorials (NoSQL database©myNoSQL)

via: http://aws.typepad.com/aws/2012/01/aws-howto-using-amazon-elastic-mapreduce-with-dynamodb.html


A Cost Analysis of DynamoDB for Tarsnap

Tarsnap is a service offering secure online backups. Colin Percival details the costs Tarsnap would have for using Amazon DynamoDB:

For each TB of data stored, this gives me 30,000,000 blocks requiring 60,000,000 key-value pairs; these occupy 2.31 GB, but for DynamoDB pricing purposes, they count as 8.31 GB, or $8.31 per month. That’s about 2.7% of Tarsnap’s gross revenues (30 cents per GB per month); significant, but manageable. However, each of those 30,000,000 blocks need to go through log cleaning every 14 days, a process which requires a read (to check that the block hasn’t been marked as deleted) and a write (to update the map to point at the new location in S3). That’s an average rate of 25 reads and 25 writes per second, so I’d need to reserve 50 reads and 50 writes per second of DynamoDB capacity. The reads cost $0.01 per hour while the writes cost $0.05 per hour, for a total cost of $0.06 per hour — or $44 per month. That’s 14.6% of Tarsnap’s gross revenues; together with the storage cost, DynamoDB would eat up 17.3% of Tarsnap’s revenue — slightly over $0.05 from every $0.30/GB I take in.

To put it differently, keeping 82.7% of revenue after DynamoDB costs sounds like a good deal, but without knowing the costs of the other components (S3, EC2, data transfer) it’s difficult to conclude whether this solution would remain profitable at a good margin. Anyway, an interesting aspect of this solution is that the costs of some major components of the platform (S3, DynamoDB) would scale linearly with the revenue.
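The arithmetic in the quote is easy to reproduce. The script below is only a sketch: the prices and block counts come from the quoted text, while the conversions (1 TB = 1,000 GB, 730 hours per month) are my assumptions, so the results differ from Colin's by rounding.

```python
# Figures from the quoted analysis (per TB of stored data).
BLOCKS_PER_TB = 30_000_000
CLEANING_PERIOD_DAYS = 14
BILLED_STORAGE_GB = 8.31            # what DynamoDB bills for, not the raw 2.31 GB
STORAGE_PRICE_PER_GB_MONTH = 1.00   # implied by "8.31 GB, or $8.31 per month"
READ_COST_PER_HOUR = 0.01           # 50 reads/s of reserved capacity
WRITE_COST_PER_HOUR = 0.05          # 50 writes/s of reserved capacity
REVENUE_PER_GB_MONTH = 0.30

# Assumptions, not from the source.
GB_PER_TB = 1000
HOURS_PER_MONTH = 730


def dynamodb_cost_per_tb():
    """Return the monthly DynamoDB cost breakdown for one TB of Tarsnap data."""
    # Every block is read once and written once per cleaning period.
    ops_per_second = BLOCKS_PER_TB / (CLEANING_PERIOD_DAYS * 24 * 3600)
    storage = BILLED_STORAGE_GB * STORAGE_PRICE_PER_GB_MONTH
    throughput = (READ_COST_PER_HOUR + WRITE_COST_PER_HOUR) * HOURS_PER_MONTH
    revenue = REVENUE_PER_GB_MONTH * GB_PER_TB
    return {
        "ops_per_second": ops_per_second,
        "storage": storage,
        "throughput": throughput,
        "revenue": revenue,
        "share_of_revenue": (storage + throughput) / revenue,
    }


if __name__ == "__main__":
    for name, value in dynamodb_cost_per_tb().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions the script gives roughly 25 operations per second and a DynamoDB bill of a bit over 17% of revenue, in line with the quote. Reserved throughput, not storage, dominates the cost.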

Original title and link: A Cost Analysis of DynamoDB for Tarsnap (NoSQL database©myNoSQL)

via: http://www.daemonology.net/blog/2012-01-23-why-tarsnap-wont-use-dynamodb.html


Auto Scaling in the Amazon Cloud: Netflix's Approach and Lessons Learned

Another great post for today from the engineering team at Netflix:

Auto scaling is a very powerful tool, but it can also be a double-edged sword. Without the proper configuration and testing it can do more harm than good. A number of edge cases may occur when attempting to optimize or make the configuration more complex. As seen above, when configured carefully and correctly, auto scaling can increase availability while simultaneously decreasing overall costs.

Original title and link: Auto Scaling in the Amazon Cloud: Netflix’s Approach and Lessons Learned (NoSQL database©myNoSQL)

via: http://techblog.netflix.com/2012/01/auto-scaling-in-amazon-cloud.html


Mahout as a Service in Apache Whirr 0.7.0

What’s included with Whirr 0.7.0 will definitely cut down the 2-3 hours required to get Mahout up and running on Amazon. At least that’s what Frank Scholten’s post made me believe.

Original title and link: Mahout as a Service in Apache Whirr 0.7.0 (NoSQL database©myNoSQL)

via: http://www.searchworkings.org/blog/-/blogs/apache-whirr-includes-mahout-support


MongoDB and Amazon Elastic Block Storage (EBS)

The topic of running MongoDB on Amazon Web Services using Elastic Block Storage came up again among the 10 tips for running MongoDB from Engine Yard:

you should know that the performance of Amazon’s Elastic Block Storage (EBS) can be inconsistent.

Following up on that Mahesh P-Subramanya aptly added:

Indeed! I’d actually take it a step further and say: do not use EBS in any environment where reliability and/or performance characteristics of your disk-access are important. Or, to put it differently, asynchronous backups - OK, disk-based databases - Not So Much.

Interestingly though, some presentations earlier this year (MongoDB in the Amazon Cloud and Running MongoDB on the Cloud) left me, and others, with the impression that EBS should not be dismissed so fast.

Original title and link: MongoDB and Amazon Elastic Block Storage (EBS) (NoSQL database©myNoSQL)


Hadoop: Amazon Elastic MapReduce and Microsoft Project Isotope

This is how things are rolling these days. Microsoft talks about offering Hadoop integration with Project Isotope in 2012, while Amazon is announcing immediate availability of new beefed-up instances (Cluster Compute Eight Extra Large (cc2.8xlarge)) and reduced prices for some of the existing instances.

Original title and link: Hadoop: Amazon Elastic MapReduce and Microsoft Project Isotope (NoSQL database©myNoSQL)


How to Run a MapReduce Job Against Common Crawl Data Using Amazon Elastic MapReduce

Steve Salevan’s 7-step guide to setting up, compiling, deploying, and running a basic MapReduce job.

When Google unveiled its MapReduce algorithm to the world in an academic paper in 2004, it shook the very foundations of data analysis. By establishing a basic pattern for writing data analysis code that can run in parallel against huge datasets, speedy analysis of data at massive scale finally became a reality, turning many orthodox notions of data analysis on their head.

Google published the paper. Yahoo open sourced an implementation (Hadoop). And Amazon is offering (unlimited) resources.
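For readers who haven't seen the "basic pattern" the quote refers to, here is a toy, single-process word count written in the map/shuffle/reduce shape. It is a sketch of the programming model only, not Steve's job and not Hadoop code: a real framework runs the map and reduce calls in parallel across machines and does the grouping step for you.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The fox"]))
```

Because each `map_phase` call looks at one record and each `reduce_phase` call looks at one key, both can be spread over as many machines as the data requires, which is the whole point of the pattern.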

Update: In the Hacker News thread, the main question answered is which other corporations (besides the Internet companies) are using MapReduce. The answer is unfortunately extremely short: too many to enumerate them all.

Original title and link: How to Run a MapReduce Job Against Common Crawl Data Using Amazon Elastic MapReduce (NoSQL database©myNoSQL)

via: http://www.commoncrawl.org/mapreduce-for-the-masses/