ALL COVERED TOPICS

NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter

NAVIGATE MAIN CATEGORIES

Close

dynamo: All content tagged as dynamo in NoSQL databases and polyglot persistence

Amazon Web Services Annual Revenue Estimation

Over the weekend, Christopher Mims has published an article in which he derives a figure for Amazon Web Services’s annual revenue: $2.4 billions:

Amazon is famously reticent about sales figures, dribbling out clues without revealing actual numbers. But it appears the company has left enough hints to, finally, discern how much revenue it makes on its cloud computing business, known as Amazon Web Services, which provides the backbone for a growing portion of the internet: about $2.4 billion a year.

There’s no way to decompose this number into the revenue of each AWS solution. For the data space I’d be interested into:

  1. S3 revenues. This is the space Basho’s Riak CS competes into.

    After writing my first post about Riak CS, I’ve learned that in Japan, the same place where Riak CS is run by Yahoo! new cloud storage, Gemini Mobile Technologies has been offering to local ISPs a similar S3-service built on top of Cassandra.

  2. Redshift is pretty new and while I’m not aware of immediate competitors (what am I missing?), I don’t think it accounts for a significant part of this revenue. Even if some of the early users, like AirBnb, report getting very good performance and costs from it.

    Redshift is powered by ParAccell, which, over the weekend, has been acquired by Actian.

  3. Amazon Elastic MapReduce. This is another interesting space from which Microsoft wants a share with its Azure HDInsight developed in collaboration with Hortonworks.

    In this space there’s also MapR and Google Compute combination which seem to be extremely performant.

  4. Interestingly Amazon is making money also from some of the competitors of its Amazon Dynamo and RDS services. The advantage of owning the infrastructure.

Original title and link: Amazon Web Services Annual Revenue Estimation (NoSQL database©myNoSQL)


Kairosdb - Fast Scalable Time Series Database

kairosdb is introduced as a rewrite of the OpenTSDB written primarily for Cassandra (nb: OpenTSDB was based on HBase). In terms of what it brings new, this page lists:

  • Uses Guice to load modules.
  • Incorporates Jetty for Rest API and serving up UI.
  • Pure Java build tool (Tablesaw)
  • UI uses Flot and is client side rendered.
  • Ability to customize UI.
  • Relative time now includes month and supports leap years.
  • Modular data store interface supports:
    • HBase
    • Cassandra
    • H2 (For development)
  • Milliseconds data support when using Cassandra.
  • Rest API for querying and submitting data.
  • Build produces deployable tar, rpm and deb packages.
  • Linux start/stop service scripts.
  • Faster.
  • Made aggregations optional (easier to get raw data).
  • Added abilities to import and export data.
  • Aggregators can aggregate data for a specified period.
  • Aggregators can be stacked or “piped” together.

Source code lives on GitHub. Let’s see where it goes.

Original title and link: Kairosdb - Fast Scalable Time Series Database (NoSQL database©myNoSQL)


5 Steps to Benchmarking Managed NoSQL - DynamoDB Vs Cassandra

Ben Bromhead (instaclustr) for High Scalability:

To determine the suitability of a provider, your first port of call is to benchmark. Choosing a service provider is often done in a number of stages. First is to shortlist providers based on capabilities and claimed performance, ruling out those that do not meet your application requirements. Second is to look for benchmarks conducted by third parties, if any. The final stage is to benchmark the service yourself.

Peter Bailis asks a very valid question: if it’s the default YCSB and it’s a benchmark, where are the results?”

✚ instaclustr offers a totally managed hosting solution for Cassandra. (Disclaimer: they’ve sponsored myNoSQL in the past)

Original title and link: 5 Steps to Benchmarking Managed NoSQL - DynamoDB Vs Cassandra (NoSQL database©myNoSQL)

via: http://highscalability.com/blog/2013/4/3/5-steps-to-benchmarking-managed-nosql-dynamodb-vs-cassandra.html


Improving Secondary Index Write Performance in Cassandra 1.2

Sam Tunnicliffe’s describes the old and new, optimized behavior of secondary indexes writes in Cassandra 1.2:

While secondary indexes can add a lot of flexibility to the way data is modelled and accessed, they do add complexity on the server side as the indexes need to be kept in sync with the primary data. Until recently, this has led to some significant trade offs in write throughput and IO utilisation as we always had to perform a read before the write in order to update any relevant secondary indexes. In Cassandra 1.2, this area has been substantially reworked to remove the need for read-before-write. New index entries are now written at the same time as the primary data is updated and old entries removed lazily at query time. Overall, this has lead to some decent performance improvements.

Original title and link: Improving Secondary Index Write Performance in Cassandra 1.2 (NoSQL database©myNoSQL)

via: http://www.datastax.com/dev/blog/improving-secondary-index-write-performance-in-1-2


Graph Based Recommendation Systems at eBay

Slidedeck from eBay explaining how they have implemented a graph based recommendation system based on,—surprise! not a graph database—Cassandra.

Original title and link: Graph Based Recommendation Systems at eBay (NoSQL database©myNoSQL)


RSS Reader With Cassandra and Netflix OSS Tools

This RSS reader app from Netflix can be a very good excuse to use Cassandra, some of the open source projects from Netflix and why not create an alternative to Google’s Reader which is declared defunct or alive every couple of months:

Recipes_RSS_arch

Projects you’ll use: Cassandra with Astyanax, Archaius, Blitz4j, Eurka, Governator, Hystrix, Karyon, Ribbon, Servo. As for myself, I’ve already checked out the code.

Original title and link: RSS Reader With Cassandra and Netflix OSS Tools (NoSQL database©myNoSQL)

via: http://techblog.netflix.com/2013/03/introducing-first-netflixoss-recipe-rss.html


Cassandra at Adobe: The Profile Cache Servers

The team I know at Adobe has invested a lot into HBase and they are offering their services globally. But according to this PDF, in a true polyglot database manner, it looks like other parts of the Adobe business have opted for a different solution: Cassandra. The size of the cluster mentioned in the whitepaper is pretty small, 16 nodes, but what is interesting is that these are beafy servers using solid state drives:

The PCS is comprised of large servers using solid state drives (SSDs) for storage […] The PCS is basically Cassandra with a set of custom APIs built on top of it.

Original title and link: Cassandra at Adobe: The Profile Cache Servers (NoSQL database©myNoSQL)


Adding Value Through Graph Analysis Using Titan and Faunus

Interesting slidedeck by Matthias Broecheler introducing 3 graph-related tools developed by Vadas Gintautas, Marko Rodriguez, Stephen Mallette and Daniel LaRocque:

  1. Titan: a massive scale property graph allowing real-time traversals and updates
  2. Faunus: for batch processing of large graphs using Hadoop
  3. Fulgora: for global running graph algorithms on large, compressed, in-memory graphs

The first couple of slides are also showing some possible use cases where these tools would prove their usefulness:

Original title and link: Adding Value Through Graph Analysis Using Titan and Faunus (NoSQL database©myNoSQL)


Brief Intro to Cassandra in 27 Slides

If you never looked into Apache Cassandra, Michaël Figuière’s slidedeck will give you a quick into Cassandra’s main concepts.

Apache Cassandra 1.2 introduces some new features such as a Binary Protocol and Collections datatype that together with the now finalized CQL3 query language provide a new interface to communicate with Cassandra that dramatically shrink its learning curve and simplify its daily use while still relying on its highly scalable architecture and storage engine. This presentation will iterate over all these new features including an overview of CQL3 query language, a look at the new client architecture, and an update on data modeling best practices. Then we’ll see how to implement an enterprise application using this new interface so that the audience can realize that a number of design principles are inspired from those commonly used with relational databases while some other entirely different, due to Cassandra partitioning approach.

Original title and link: Brief Intro to Cassandra in 27 Slides (NoSQL database©myNoSQL)


A Quick Tour of Internal Authentication and Authorization Security in DataStax Enterprise and Apache Cassandra

Robin Schumacher describes the new security features added to Apache Cassandra and DataStax Enterprise:

This article will concentrate on the new internal authentication and authorization (or permission management) features that are part of both open source Cassandra as well as DataStax Enterprise. Authentication deals with validating incoming user connections to a database cluster, whereas authorization concerns itself with what a logged in user can do inside a database.

I’m happy to see NoSQL databases entering the space of security as this would ease their way inside enterprises. But I fear a bit the moment when the marketing message will change from “it’s too early to provide security features” to “the first enterprise grade NoSQL database”.

Original title and link: A Quick Tour of Internal Authentication and Authorization Security in DataStax Enterprise and Apache Cassandra (NoSQL database©myNoSQL)

via: http://www.planetcassandra.org/blog/post/a-quick-tour-of-internal-authentication-and-authorization-security-in-datastax-enterprise-and-apache-cassandra


From SimpleDB to Cassandra: Data Migration for a High Volume Web Application at Netflix

Prasanna Padmanabhan and Shashi Madapp posted an article on the Netflix blog describing the process used to migrate data from Amazon SimpleDB to Cassandra:

There will come a time in the life of most systems serving data, when there is a need to migrate data to a more reliable, scalable and high performance data store while maintaining or improving data consistency, latency and efficiency. This document explains the data migration technique we used at Netflix to migrate the user’s queue data between two different distributed NoSQL storage systems.

The steps involved are what you’d expect for a large data set migration:

  1. forklift
  2. incremental replication
  3. consistency checking
  4. shadow writes
  5. shadow writes and shadow reads for validation
  6. end of life of the original data store (SimpleDB)

If you think of it, this is how a distributed, eventually consistent storage works (at least in big lines) when replicating data across the cluster. The main difference is that inside a storage engine you deal with a homogeneous system with a single set of constraints, while data migration has to deal with heterogenous systems most often characterized by different limitations and behavior.

In 2009, Netflix performed a similar massive data migration operation. At that time it involved moving data from its own hosted Oracle and MySQL databases to SimpleDB. The challenges of operating this hybrid solution were described in a the paper Netflix’s Transition to High-Availability Storage Systems authored by Sid Anand.

Sid Anand is now working at LinkedIn where they use Databus for low latency data transfer. But Databus’s approach is very similar.

Original title and link: From SimpleDB to Cassandra: Data Migration for a High Volume Web Application at Netflix (NoSQL database©myNoSQL)

via: http://techblog.netflix.com/2013/02/netflix-queue-data-migration-for-high.html?m=1


DataStax's Reaction to MySQL 5.6: Oracle’s MySQL Misses the NoSQL Mark

Jonathan Ellis in a post about MySQL 5.6 and how Oracle got the whole NoSQL wrong, considering NoSQL is, in this exact order, about scaling, continuous availability, flexibility, performance, and queryability:

The big news for MySQL 5.6 was the inclusion of “NoSQL” features in the form of a memcached api for get and put operations.

In cases like this, it’s tough to tell whether Oracle got this so wrong deliberately to sow confusion in the market, or because they really think that’s what NoSQL is about.

I know Jonathan Ellis has always had very strong opinions about the technical superiority of Cassandra and Cassandra is indeed a very solid solution, but I’m always reluctant to calling a competitor stupid and using the myopic argument “if I’m good at X and suck at Y, then what everyone is looking for is only X”.

Original title and link: DataStax’s Reaction to MySQL 5.6: Oracle’s MySQL Misses the NoSQL Mark (NoSQL database©myNoSQL)

via: http://www.datastax.com/dev/blog/oracles-mysql-misses-the-nosql-mark