aws: All content tagged as aws in NoSQL databases and polyglot persistence
Monday, 6 February 2012
MongoDB vs MySQL: A DevOps point of view
Pierre Bailet and Mathieu Poumeyrol of fotopedia (a French photo site) share their experience of operating a small MongoDB cluster since September 2009, compared with operating a MySQL cluster.
Some details about fotopedia:
- fotopedia is 100% on AWS
- Amazon RDS for MySQL
- 4-node MongoDB cluster
- 150 million photo views
MongoDB advantages:
- no alter table
- background index creation
- data backup & restoration
  - note: as far as I can tell MySQL is able to do the same
- replica sets
- hardware migration
  - note: the same procedure can be used for MySQL
Before leaving you with the slides, here is an interesting accepted trade-off:
Quietly losing seconds of writes is preferable to:
- weekly minutes-long maintenance periods
- minutes-long unscheduled downtime and manual failover in case of hardware failures
Thursday, 2 February 2012
Thoughts on SimpleDB, DynamoDB and Cassandra
Adrian Cockcroft:
So the lesson here is that for a first step into NoSQL, we went with a hosted solution so that we didn’t have to build a team of experts to run it, and we didn’t have to decide in advance how much scale we needed. Starting again from scratch today, I would probably go with DynamoDB. It’s a low “friction” and developer friendly solution.
You can look at this in two ways: 1) a biased opinion of someone that has already bet on Amazon with the infrastructure of a multi-billion business; 2) the opinion of someone that has accumulated a ton of experience in the NoSQL space and that is successfully[1] running the infrastructure of a multi-billion business on NoSQL solutions. I’d strongly suggest you think of it as the latter.
[1] Netflix was one of the few companies that continued to operate during Amazon’s major EBS failure.
Original title and link: Thoughts on SimpleDB, DynamoDB and Cassandra (©myNoSQL)
via: http://perfcap.blogspot.com/2012/01/thoughts-on-simpledb-dynamodb-and.html
NoSQL tutorials: Storing User Preference in Amazon DynamoDB using the Mobile SDKs
Just a CRUD tutorial for DynamoDB, but one based on a scenario that makes sense and demoing the API in two languages (Objective-C and Java):
The sample mobile application described here demonstrates how to store user preferences in Amazon DynamoDB. Because more and more people are using multiple mobile devices, connecting these devices to the cloud and storing user preferences in the cloud enables developers to provide a more uniform cross-device experience for their users.
This article shows sample code for both the iOS and Android platforms.
Original title and link: NoSQL tutorials: Storing User Preference in Amazon DynamoDB using the Mobile SDKs (©myNoSQL)
Wednesday, 1 February 2012
Amazon Elastic MapReduce New Features: Metrics, Updates, VPC, and Cluster Compute Support
Starting today customers can view graphs of 23 job flow metrics within the EMR Console by selecting the Monitoring tab in the Job Flow Details page. These metrics are pushed to CloudWatch every five minutes at no cost to you and include information on:
- Job flow progress including metrics on the number of map and reduce tasks running and remaining in your job flow and the number of bytes read and written to S3 and HDFS.
- Job flow contention including metrics on HDFS utilization, map and reduce slots open, jobs running, and the ratio between map tasks remaining and map slots.
- Job flow health including metrics on whether your job flow is idle, if there are missing data blocks, and if there are any dead nodes.
That’s like free pr0n for operations teams.
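The contention metric mentioned in the list above, the ratio between map tasks remaining and map slots, is simple enough to sketch. The function names, the sample numbers, and the alert threshold below are all hypothetical; they are not EMR or CloudWatch identifiers, and a real threshold would have to be tuned per job flow.

```python
def remaining_map_tasks_per_slot(map_tasks_remaining, map_slots_total):
    """Ratio of map tasks still waiting to the map slots available to run them."""
    if map_slots_total <= 0:
        raise ValueError("a job flow needs at least one map slot")
    return map_tasks_remaining / map_slots_total


def looks_contended(ratio, threshold=2.0):
    """Hypothetical rule of thumb: flag the job flow when each slot still
    has more than `threshold` map tasks queued behind it."""
    return ratio > threshold


# Made-up sample values: 120 map tasks remaining on a cluster with 40 map slots.
ratio = remaining_map_tasks_per_slot(120, 40)
print(ratio, looks_contended(ratio))
```

A ratio that stays high across several five-minute samples would be one possible signal that adding task nodes could shorten the job flow.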
On a different note, I’ve noticed that the Hadoop stack (Hadoop, Hive, Pig) on Amazon Elastic MapReduce is based on second-to-last versions, which suggests that extensive testing is performed on Amazon’s side before new versions are rolled out:
- Hadoop: 0.20.205 (precursor of Hadoop 1.0.0): supports append and security, but doesn’t have RAID, symlinks or MR2
- Hive: 0.7.1 (precursor of latest 0.8.0)
- Pig: 0.9.1 (precursor of latest 0.9.2)
Original title and link: Amazon Elastic MapReduce New Features: Metrics, Updates, VPC, and Cluster Compute Support (©myNoSQL)
Monday, 30 January 2012
NoSQL Tutorial: Setting Up a Hadoop Cluster with MongoDB Support on EC2
A complete and detailed guide for setting up a Hadoop cluster using MongoDB by Artem Yankov. It uses the MongoDB Hadoop adapter mongo-hadoop, which provides input and output adapters, support for InputSplits, and write-only Pig.
What is covered in the tutorial:
- Creating an AMI with the custom settings (installed hadoop and mongo-hadoop)
- Launching a hadoop cluster on EC2
- Adding more nodes to the cluster
- Running some sample jobs
Original title and link: NoSQL Tutorial: Setting Up a Hadoop Cluster with MongoDB Support on EC2 (©myNoSQL)
via: http://artemyankov.com/post/16717104998/how-to-set-up-a-hadoop-cluster-with-mongo-support-on
Thursday, 26 January 2012
Using Amazon Elastic MapReduce With DynamoDB: NoSQL Tutorials
Adam Gray[1]:
In this article, I’ll demonstrate how EMR can be used to efficiently export DynamoDB tables to S3, import S3 data into DynamoDB, and perform sophisticated queries across tables stored in both DynamoDB and other storage services such as S3.
If you put together Amazon S3, Amazon DynamoDB, Amazon RDS, and Amazon Elastic MapReduce, you have a complete polyglot persistence solution in the cloud[2].
Original title and link: Using Amazon Elastic MapReduce With DynamoDB: NoSQL Tutorials (©myNoSQL)
via: http://aws.typepad.com/aws/2012/01/aws-howto-using-amazon-elastic-mapreduce-with-dynamodb.html
Tuesday, 24 January 2012
A Cost Analysis of DynamoDB for Tarsnap
Tarsnap is a service offering secure online backups. Colin Percival details the costs Tarsnap would have for using Amazon DynamoDB:
For each TB of data stored, this gives me 30,000,000 blocks requiring 60,000,000 key-value pairs; these occupy 2.31 GB, but for DynamoDB pricing purposes, they count as 8.31 GB, or $8.31 per month. That’s about 2.7% of Tarsnap’s gross revenues (30 cents per GB per month); significant, but manageable. However, each of those 30,000,000 blocks need to go through log cleaning every 14 days, a process which requires a read (to check that the block hasn’t been marked as deleted) and a write (to update the map to point at the new location in S3). That’s an average rate of 25 reads and 25 writes per second, so I’d need to reserve 50 reads and 50 writes per second of DynamoDB capacity. The reads cost $0.01 per hour while the writes cost $0.05 per hour, for a total cost of $0.06 per hour — or $44 per month. That’s 14.6% of Tarsnap’s gross revenues; together with the storage cost, DynamoDB would eat up 17.3% of Tarsnap’s revenue — slightly over $0.05 from every $0.30/GB I take in.
To put it differently, keeping 82.7% of the revenue after DynamoDB costs sounds like a good deal, but without knowing the costs of the other components (S3, EC2, data transfer) it’s difficult to conclude whether this solution would remain profitable at a good margin. Anyway, an interesting aspect of this solution is that the costs of some major components of the platform (S3, DynamoDB) would scale linearly with the revenue.
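The arithmetic in the quoted estimate can be reproduced with a short sketch. The prices and block counts are the ones stated in the quote; the 730-hour month, the 1 TB = 1000 GB conversion, and all variable names are assumptions made here for illustration.

```python
# Figures from the quoted estimate (per TB of data stored).
BLOCKS_PER_TB = 30_000_000
CLEANING_PERIOD_DAYS = 14
BILLED_STORAGE_GB = 8.31       # what the 2.31 GB of key-value pairs count as
STORAGE_PRICE_PER_GB = 1.00    # USD per month, implied by "8.31 GB, or $8.31"
READ_COST_PER_HOUR = 0.01      # USD for 50 reserved reads per second
WRITE_COST_PER_HOUR = 0.05     # USD for 50 reserved writes per second

# Assumptions of this sketch, not stated in the quote.
HOURS_PER_MONTH = 730          # average month length
REVENUE_PER_TB = 0.30 * 1000   # 30 cents per GB per month, 1 TB = 1000 GB

# Every block is read once and written once per cleaning period.
ops_per_second = BLOCKS_PER_TB / (CLEANING_PERIOD_DAYS * 24 * 3600)

storage_cost = BILLED_STORAGE_GB * STORAGE_PRICE_PER_GB
throughput_cost = (READ_COST_PER_HOUR + WRITE_COST_PER_HOUR) * HOURS_PER_MONTH
dynamodb_share = (storage_cost + throughput_cost) / REVENUE_PER_TB

print(f"average rate: {ops_per_second:.1f} reads and writes per second")
print(f"storage: ${storage_cost:.2f}/month, throughput: ${throughput_cost:.2f}/month")
print(f"DynamoDB share of revenue: {dynamodb_share:.1%}")
```

The result lands within a tenth of a percentage point of the quoted 17.3%; the small difference comes from how the intermediate percentages are rounded before being added.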
Original title and link: A Cost Analysis of DynamoDB for Tarsnap (©myNoSQL)
via: http://www.daemonology.net/blog/2012-01-23-why-tarsnap-wont-use-dynamodb.html
Thursday, 19 January 2012
Auto Scaling in the Amazon Cloud: Netflix's Approach and Lessons Learned
Another great post for today from the engineering team at Netflix:
Auto scaling is a very powerful tool, but it can also be a double-edged sword. Without the proper configuration and testing it can do more harm than good. A number of edge cases may occur when attempting to optimize or make the configuration more complex. As seen above, when configured carefully and correctly, auto scaling can increase availability while simultaneously decreasing overall costs.
Original title and link: Auto Scaling in the Amazon Cloud: Netflix’s Approach and Lessons Learned (©myNoSQL)
via: http://techblog.netflix.com/2012/01/auto-scaling-in-amazon-cloud.html
Tuesday, 3 January 2012
Mahout as a Service in Apache Whirr 0.7.0
What’s included with Whirr 0.7.0 will definitely cut down the 2-3 hours required to get Mahout up and running on Amazon. At least that’s what Frank Scholten’s post made me believe.
Original title and link: Mahout as a Service in Apache Whirr 0.7.0 (©myNoSQL)
via: http://www.searchworkings.org/blog/-/blogs/apache-whirr-includes-mahout-support
Thursday, 22 December 2011
MongoDB and Amazon Elastic Block Storage (EBS)
The topic of running MongoDB on Amazon Web Services using Elastic Block Storage came up again among the 10 tips for running MongoDB from Engine Yard:
you should know that the performance of Amazon’s Elastic Block Storage (EBS) can be inconsistent.
Following up on that, Mahesh P-Subramanya aptly added:
Indeed! I’d actually take it a step further and say Do not use EBS in any environment where reliability and/or performance characteristics of your disk-access are important. Or, to put it differently, asynchronous backups - OK, disk-based databases - Not So Much.
Interestingly though, some presentations earlier this year (MongoDB in the Amazon Cloud and Running MongoDB on the Cloud) left me, and others, with the impression that EBS should not be dismissed so fast.
Original title and link: MongoDB and Amazon Elastic Block Storage (EBS) (©myNoSQL)
Wednesday, 21 December 2011
Hadoop: Amazon Elastic MapReduce and Microsoft Project Isotop
This is how things are rolling these days. Microsoft talks about offering Hadoop integration with Project Isotop in 2012, while Amazon is announcing immediate availability of new beefed-up instances (Cluster Compute Eight Extra Large (cc2.8xlarge)) and reduced prices for some of the existing instances.
Original title and link: Hadoop: Amazon Elastic MapReduce and Microsoft Project Isotop (©myNoSQL)
Monday, 19 December 2011
How to Run a MapReduce Job Against Common Crawl Data Using Amazon Elastic MapReduce
Steve Salevan’s 7-step guide to setting up, compiling, deploying, and running a basic MapReduce job.
When Google unveiled its MapReduce algorithm to the world in an academic paper in 2004, it shook the very foundations of data analysis. By establishing a basic pattern for writing data analysis code that can run in parallel against huge datasets, speedy analysis of data at massive scale finally became a reality, turning many orthodox notions of data analysis on their head.
Google published the paper. Yahoo open sourced this. And Amazon is offering (unlimited) resources.
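The "basic pattern" the quote refers to can be illustrated with a minimal single-process word count. This is not the code from the guide and it does not use Hadoop or Elastic MapReduce; the function names are made up for this sketch, and the shuffle step that a real framework performs across machines is simulated here with a dictionary.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values emitted for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    # Each document can be mapped independently, which is what makes the
    # pattern easy to run in parallel on a real cluster.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["the quick brown fox", "the lazy dog", "The end"])
print(counts["the"])  # 3
```

Because the map calls share no state and each reduce call only sees the values for its own key, both phases can be spread over many machines without changing the logic.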
Update: In the Hacker News thread, the main question answered is which other corporations, besides the Internet companies, are using MapReduce. The answer is unfortunately extremely short: too many to be able to enumerate them all.
Original title and link: How to Run a MapReduce Job Against Common Crawl Data Using Amazon Elastic MapReduce (©myNoSQL)