dynamo: All content tagged as dynamo in NoSQL databases and polyglot persistence
Over the weekend, Christopher Mims has published an article in which he derives a figure for Amazon Web Services’s annual revenue: $2.4 billions:
Amazon is famously reticent about sales figures, dribbling out clues without revealing actual numbers. But it appears the company has left enough hints to, finally, discern how much revenue it makes on its cloud computing business, known as Amazon Web Services, which provides the backbone for a growing portion of the internet: about $2.4 billion a year.
There’s no way to decompose this number into the revenue of each AWS solution. For the data space I’d be interested into:
S3 revenues. This is the space Basho’s Riak CS competes into.
After writing my first post about Riak CS, I’ve learned that in Japan, the same place where Riak CS is run by Yahoo! new cloud storage, Gemini Mobile Technologies has been offering to local ISPs a similar S3-service built on top of Cassandra.
Redshift is pretty new and while I’m not aware of immediate competitors (what am I missing?), I don’t think it accounts for a significant part of this revenue. Even if some of the early users, like AirBnb, report getting very good performance and costs from it.
Redshift is powered by ParAccell, which, over the weekend, has been acquired by Actian.
Amazon Elastic MapReduce. This is another interesting space from which Microsoft wants a share with its Azure HDInsight developed in collaboration with Hortonworks.
Interestingly Amazon is making money also from some of the competitors of its Amazon Dynamo and RDS services. The advantage of owning the infrastructure.
Original title and link: Amazon Web Services Annual Revenue Estimation ( ©myNoSQL)
- Uses Guice to load modules.
- Incorporates Jetty for Rest API and serving up UI.
- Pure Java build tool (Tablesaw)
- UI uses Flot and is client side rendered.
- Ability to customize UI.
- Relative time now includes month and supports leap years.
- Modular data store interface supports:
- H2 (For development)
- Milliseconds data support when using Cassandra.
- Rest API for querying and submitting data.
- Build produces deployable tar, rpm and deb packages.
- Linux start/stop service scripts.
- Made aggregations optional (easier to get raw data).
- Added abilities to import and export data.
- Aggregators can aggregate data for a specified period.
- Aggregators can be stacked or “piped” together.
Source code lives on GitHub. Let’s see where it goes.
Original title and link: Kairosdb - Fast Scalable Time Series Database ( ©myNoSQL)
Slidedeck from eBay explaining how they have implemented a graph based recommendation system based on,—surprise! not a graph database—Cassandra.
Original title and link: Graph Based Recommendation Systems at eBay ( ©myNoSQL)
The team I know at Adobe has invested a lot into HBase and they are offering their services globally. But according to this PDF, in a true polyglot database manner, it looks like other parts of the Adobe business have opted for a different solution: Cassandra. The size of the cluster mentioned in the whitepaper is pretty small, 16 nodes, but what is interesting is that these are beafy servers using solid state drives:
The PCS is comprised of large servers using solid state drives (SSDs) for storage […] The PCS is basically Cassandra with a set of custom APIs built on top of it.
Original title and link: Cassandra at Adobe: The Profile Cache Servers ( ©myNoSQL)
Interesting slidedeck by Matthias Broecheler introducing 3 graph-related tools developed by Vadas Gintautas, Marko Rodriguez, Stephen Mallette and Daniel LaRocque:
- Titan: a massive scale property graph allowing real-time traversals and updates
- Faunus: for batch processing of large graphs using Hadoop
- Fulgora: for global running graph algorithms on large, compressed, in-memory graphs
The first couple of slides are also showing some possible use cases where these tools would prove their usefulness:
Original title and link: Adding Value Through Graph Analysis Using Titan and Faunus ( ©myNoSQL)
If you never looked into Apache Cassandra, Michaël Figuière’s slidedeck will give you a quick into Cassandra’s main concepts.
Apache Cassandra 1.2 introduces some new features such as a Binary Protocol and Collections datatype that together with the now finalized CQL3 query language provide a new interface to communicate with Cassandra that dramatically shrink its learning curve and simplify its daily use while still relying on its highly scalable architecture and storage engine. This presentation will iterate over all these new features including an overview of CQL3 query language, a look at the new client architecture, and an update on data modeling best practices. Then we’ll see how to implement an enterprise application using this new interface so that the audience can realize that a number of design principles are inspired from those commonly used with relational databases while some other entirely different, due to Cassandra partitioning approach.
Original title and link: Brief Intro to Cassandra in 27 Slides ( ©myNoSQL)