cassandra: All content tagged as cassandra in NoSQL databases and polyglot persistence
Thursday, 23 February 2012
A Tour of Amazon DynamoDB Features and API
Mathias Meyer’s walk through the DynamoDB features and API with commentary:
Sorted range keys, conditional updates, atomic counters, structured data and multi-valued data types, fetching and updating single attributes, strong consistency, and no explicit way to handle and resolve conflicts other than conditions. A lot of features DynamoDB has to offer remind me of everything that’s great about wide column stores like Cassandra, but even more so of HBase. This is great in my opinion, as Dynamo would probably not be well-suited for a customer-facing system. And indeed, Werner Vogel’s post on DynamoDB seems to suggest DynamoDB is a bastard child of Dynamo and SimpleDB, though with lots of sugar sprinkled on top.
Think of it as an extended, better articulated, closer-to-the-API version of my notes about Amazon DynamoDB.
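To make two of the features from the quote more concrete (conditional updates and atomic counters), here is a toy in-memory sketch. It is not the DynamoDB API: the `ToyTable` class and its `put_if`, `add`, and `get` methods are hypothetical names invented for illustration, and a simple lock stands in for whatever the real service does internally.

```python
import threading


class ToyTable:
    """In-memory stand-in for a key-value table. Hypothetical, NOT the DynamoDB API."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def put_if(self, key, new_attrs, expected):
        """Apply new_attrs only if every attribute in `expected` still matches.

        Returns True when the write was applied, False when the condition failed.
        """
        with self._lock:
            current = self._items.get(key, {})
            if any(current.get(name) != value for name, value in expected.items()):
                return False
            self._items[key] = {**current, **new_attrs}
            return True

    def add(self, key, attr, delta):
        """Atomically increment a numeric attribute and return its new value."""
        with self._lock:
            item = self._items.setdefault(key, {})
            item[attr] = item.get(attr, 0) + delta
            return item[attr]

    def get(self, key):
        """Return a copy of the stored item (empty dict if the key is missing)."""
        with self._lock:
            return dict(self._items.get(key, {}))


table = ToyTable()
table.put_if("user:1", {"email": "a@example.com", "version": 1}, expected={})

# Two writers both read version 1; only the first conditional write is applied.
first = table.put_if("user:1", {"email": "b@example.com", "version": 2}, expected={"version": 1})
second = table.put_if("user:1", {"email": "c@example.com", "version": 2}, expected={"version": 1})

# An atomic counter: each call returns the value after its own increment.
views = [table.add("page:home", "views", 1) for _ in range(3)]
```

The second writer gets `False` back and has to re-read and retry, which is what "no explicit way to handle and resolve conflicts other than conditions" amounts to in practice.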
Original title and link: A Tour of Amazon DynamoDB Features and API (©myNoSQL)
via: http://www.paperplanes.de/2012/1/30/a-tour-of-amazons-dynamodb.html
Wednesday, 22 February 2012
Automating Cassandra Operations and Management With Netflix's Priam Tool
Priam is a new open source tool from Netflix (back in November, Netflix released Curator, a ZooKeeper library), used to simplify and automate the operations and management of a Cassandra cluster:
Priam is a co-process that runs alongside Cassandra on every node to provide the following functionality:
- Backup and recovery
  - snapshot and incremental backups
  - compression and multipart off-site uploading
  - data recovery and data testing
- Bootstrapping and automated token assignment: Priam automates the assignment of tokens to Cassandra nodes as they are added, removed or replaced in the ring. Priam relies on centralized external storage (SimpleDB/Cassandra) for storing token and membership information, which is used to bootstrap nodes into the cluster. It allows us to automate replacing nodes without any manual intervention, since we assume failure of nodes, and create failures using Chaos Monkey. The external Priam storage also provides us valuable information for the backup and recovery process.
- Centralized configuration management: All our clusters are centrally configured via properties stored in SimpleDB, which includes setup of critical JVM settings and Cassandra YAML properties.
- RESTful monitoring and metrics: provides hooks that support external monitoring and automation scripts. They provide the ability to backup, restore a set of nodes manually and provide insights into Cassandra’s ring information. They also expose key Cassandra JMX commands such as repair and refresh.
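The token assignment idea can be sketched in a few lines. This is not Priam's actual algorithm, only a minimal illustration that assumes Cassandra's RandomPartitioner (token space 0 to 2^127) and evenly spaced tokens. The `membership` dict is a hypothetical stand-in for the external token and membership store, and `evenly_spaced_tokens` and `replace_node` are names made up for this example.

```python
RING_SIZE = 2 ** 127  # token space of Cassandra's RandomPartitioner


def evenly_spaced_tokens(node_count):
    """Return one token per node, spread evenly around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [slot * RING_SIZE // node_count for slot in range(node_count)]


def replace_node(membership, dead_instance, new_instance):
    """Hand the dead node's token to its replacement and return that token.

    `membership` maps a ring slot to {"instance": ..., "token": ...}.
    """
    for entry in membership.values():
        if entry["instance"] == dead_instance:
            entry["instance"] = new_instance
            return entry["token"]
    raise KeyError(dead_instance)


tokens = evenly_spaced_tokens(4)
membership = {
    slot: {"instance": f"node-{slot}", "token": token}
    for slot, token in enumerate(tokens)
}

# node-2 dies; its replacement bootstraps with the same token,
# so the rest of the ring does not have to move.
inherited = replace_node(membership, "node-2", "node-2b")
```

Keeping the slot-to-token mapping outside the cluster is what makes the replacement step mechanical: the new node only needs to look up which token is orphaned.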
Original title and link: Automating Cassandra Operations and Management With Netflix’s Priam Tool (©myNoSQL)
via: http://techblog.netflix.com/2012/02/announcing-priam.html
Monday, 20 February 2012
Dealing With JVM Limitations in Apache Cassandra
Some of the most notable NoSQL databases targeting large scalable systems are written in Java, Cassandra and HBase among them. Then there’s also Hadoop. Plus a series of caching and data grid solutions like Terracotta and Gigaspaces. They all face the same challenge: tuning the JVM garbage collector for predictable latency and throughput.
Jonathan Ellis’s slides from FOSDEM 2012 cover some of the problems with GC and the way Cassandra tackles them. While this is one of those presentations where the slides are not enough to understand the full picture, going through them will still give you a couple of good hints.
For those saying that Java and the JVM are not the platform for writing large concurrent systems, here’s the quote Ellis finishes his slides with:
Cliff Click: Many concurrent algorithms are very easy to write with a GC and totally hard (to down right impossible) using explicit free.
Enjoy the slides after the break.
A Question About NoSQL Managed Hosting
It’s impossible to always have the right answers to all the questions. So this time I’ll have to ask you all: why are only some NoSQL databases present in managed hosting offers?
The first wave of NoSQL managed hosting services brought MongoDB, CouchDB, and some Redis. The second wave brought some more MongoDB, CouchDB, and just a bit more of Redis. It was only the third wave that brought some managed services for graph databases: Neo4j and OrientDB. Plus the first proposal for Cassandra managed hosting.
The first answer that comes to mind when thinking about NoSQL managed services is adoption. If a product is not in wide use then the chances for a company to run a profitable hosting business are very low. But I have the feeling that this is not the only or the complete answer.
Please chime in and share your thoughts.
Original title and link: A Question About NoSQL Managed Hosting (©myNoSQL)
Sunday, 19 February 2012
Cassandra at Clearspring with Chris Burroughs - Powered by NoSQL
For today’s Powered by Cassandra video from the Cassandra NYC 2011 event organized by DataStax, I chose Chris Burroughs’s presentation about Clearspring’s usage of Cassandra. Just in case you wonder what Clearspring is doing, the sharing buttons you see here on myNoSQL are powered by the AddThis product from Clearspring.
Saturday, 18 February 2012
Cassandra 101 for System Administrators with Nathan Milford - Powered by NoSQL
While today was supposed to bring a new educational video from the Cassandra NYC 2011 video series, I thought that the lessons learned from operating Cassandra at Outbrain to serve over 30 billion impressions monthly would be quite educational too.
Thursday, 16 February 2012
The Future of Big Data with Cassandra
One of the best presentations I’ve seen: concise, covering the topic from different angles, providing useful information, pitching a product and company in non-obtrusive ways.
The slide deck by Matthew F. Dennis talks about realtime data and analytics from the perspective of Cassandra and DataStax. It starts by presenting the most important features of Cassandra:
- true multi DC support
- no SPOF
- linear scalability
- great read and write performance
- tunable consistency access
- durable
- integrated caching
and a series of use cases for Cassandra:
- time series
- sensor data
- messaging
- ad tracking
- financial market data
- user activity streams
- fraud detection
- risk analysis
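As an aside on the "tunable consistency" item in the feature list: the usual rule of thumb is that a read is guaranteed to see the latest write when the replicas contacted by the read and by the write must overlap, that is when R + W > RF. The sketch below covers only three of Cassandra's consistency levels (ONE, QUORUM, ALL); the function names are illustrative and not part of any client library.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must acknowledge an operation at a given level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_sees_latest_write(write_level, read_level, replication_factor):
    """True when read and write replica sets are forced to overlap (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


# With a replication factor of 3:
quorum_both = read_sees_latest_write("QUORUM", "QUORUM", 3)  # 2 + 2 > 3
one_both = read_sees_latest_write("ONE", "ONE", 3)           # 1 + 1 > 3 is false
write_one_read_all = read_sees_latest_write("ONE", "ALL", 3)  # 1 + 3 > 3
```

The tuning part is the trade-off: lower levels give faster, more available operations, while overlapping levels buy consistency at the cost of waiting for more replicas.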
It then summarizes three major Cassandra case studies with quotes emphasizing why Cassandra plays a critical role in each of them:
- Netflix
- Backupify
- ooyala
Enjoy it after the break.
Wednesday, 15 February 2012
Lessons in Data Visualization: How to create a visualization
Pete Warden:
Pick a question. Now that I had a rough idea for what I wanted to visualize, I really needed to focus on what I would be doing. The best way to do that is to chose the exact title you want to give your visualization.
Oftentimes, you might be tempted to start with an answer in the form of a hypothesis or preconception. The results you get might be valid but radically different.
As for the technologies used for data crunching, it’s Pig on Hadoop over a Cassandra cluster:
In my case, we have a Cassandra cluster with information on more than 350 million photos shared on Facebook. I’ve been running Pig analytics jobs regularly to get a view of what we have in there. […] In this case I already had some Pig scripts asking similar questions, so I was able to adapt one of those. The biggest surprise was when I ran into issues with some of the joins. The hard part was running the Hadoop job to gather the raw data from our Cassandra cluster, and that worked. I was able to output smaller files containing the gathered data, and then run a local Pig job to do the joins I needed.
Original title and link: Lessons in Data Visualization: How to create a visualization (©myNoSQL)
via: http://radar.oreilly.com/2012/02/how-to-create-visualization-facebook-vacation.html
Cassandra and MongoDB with Gigaspaces Cloudify
There are two reasons I’m writing about Gigaspaces’s Cloudify (PR announcement):
- Besides MySQL, Cloudify recipes include Cassandra and MongoDB.
  Also a bit of vintage claim chowder: if you remember Mike Gualtieri’s (Forrester) “NoSQL wants to be elastic caching when it grows up”, this should be clear proof he was wrong.
- Gigaspaces is starting to realize that it’s not really necessary to claim a NoSQL affiliation to benefit from the NoSQL buzz. Clear market positioning and smartly showcasing it is much more useful for the potential customers. The other company showing it has learned this lesson is Terracotta [1].

[1] I’m probably biased on this as I was responsible for talking to Terracotta folks about this better route.
Original title and link: Cassandra and MongoDB with Gigaspaces Cloudify (©myNoSQL)
Hosted and Managed NoSQL: Cassandra, Redis, OrientDB
In the last few days I’ve read about some new NoSQL hosting solutions:
- Cassandra: managed hardware & software hosting
  Per node:
  - Intel dual quad-core (8 CPUs), 16 GB of memory, 2 TB primary storage + 500 GB commitlog drive
  - 5 public IP addresses, 1000 Mbps private network port
  - Debian, CentOS, RedHat or FreeBSD
  - Cassandra setup, configuration and ongoing maintenance (repairs, cleanups, troubleshooting)
  - Cassandra upgrades (rolling restart)
  - 24x7 real-time monitoring (load, TCP, JMX and Cassandra logs)
  - Multi-datacenter environment (we’ll spread your cluster across two or three geographic locations, based on your needs)
  - 30 days test drive
  Cost: $850 per node per month (5 TB bandwidth, includes backups & monitoring)
- OrientDB: NuvolaBase
  - Real-time replicated deployment
  - Managed
  - JSON over HTTP access
  - can offer VPN connections to the cluster
- Redis: Cloudnode
  - Cloudeno.de is still in beta
  - “one Redis instance free with every Cloudnode account”, but no further details about the characteristics of the instance
Hosting for NoSQL databases has been available in some form or another for a while, but only for the most popular ones (MongoDB, CouchDB, Redis). Things are changing fast. Neo4j is heavily advertising its Heroku add-on, OrientDB got NuvolaBase, and so on.
This is the market that Amazon is targeting with Amazon RDS, SimpleDB, and DynamoDB: managed data services, as part of a bigger strategy. What should be clear is that Amazon is not after NoSQL database companies.
Anyone considering a business in the managed data services market should realize that Amazon will not get into supporting all the NoSQL databases out there. They’d also better take a deep look and learn from what Amazon is offering with SimpleDB and DynamoDB.
Original title and link: Hosted and Managed NoSQL: Cassandra, Redis, OrientDB (©myNoSQL)
Sunday, 12 February 2012
Scaling Video Analytics with Cassandra by Ilya Maykov - Powered by NoSQL
In keeping with last week’s model (an educational video about Cassandra, followed by a Cassandra case study), today’s video in the Cassandra NYC 2011 video series from DataStax has Ilya Maykov describing how Cassandra is used at Ooyala for computing multi-dimensional video analytics reports for 100M+ monthly unique users in near real time.
Saturday, 11 February 2012
Cassandra Data Modeling Examples with Matthew F. Dennis - NoSQL videos
Continuing the Cassandra NYC 2011 video series, made available by the folks from DataStax, this week we have Matthew F. Dennis, who covers a couple of different Cassandra data modeling use cases.