cassandra: All content tagged as cassandra in NoSQL databases and polyglot persistence
A couple of the most notable NoSQL databases targeting large scalable systems are written in Java: Cassandra, HBase, BigCouch. Then there’s also Hadoop. Add to that a series of caching and data grid solutions like Terracotta and Gigaspaces. They all face the same challenge: tuning the JVM garbage collector for predictable latency and throughput.
Jonathan Ellis’s slides, presented at Fosdem 2012, cover some of the problems with GC and the way Cassandra tackles them. While this is one of those presentations where the slides are not enough to understand the full picture, going through them will still give you a couple of good hints.
For those saying that Java and the JVM are not the platform for writing large concurrent systems, here’s the quote Ellis ends his slides with:
Cliff Click: Many concurrent algorithms are very easy to write with a GC and totally hard (to down right impossible) using explicit free.
Enjoy the slides after the break.
It’s impossible to always have the right answers to all the questions. So this time I’ll have to ask you all: why are only some NoSQL databases present in managed hosting offers?
The first wave of NoSQL managed hosting services brought MongoDB, CouchDB, and some Redis. The second wave brought some more MongoDB, CouchDB, and just a bit more of Redis. It was only the third wave that brought some managed services for graph databases: Neo4j and OrientDB. Plus the first proposal for Cassandra managed hosting.
The first answer that comes to mind when thinking about NoSQL managed services is adoption. If a product is not in wide use then the chances for a company to run a profitable hosting business are very low. But I have the feeling that this is not the only or the complete answer.
Please chime in and share your thoughts.
Original title and link: A Question About NoSQL Managed Hosting ( ©myNoSQL)
For today’s Powered by Cassandra video from the Cassandra NYC 2011 event organized by DataStax, I chose Chris Burroughs’s presentation about Clearspring’s usage of Cassandra. Just in case you wonder what Clearspring is doing, the sharing buttons you see here on myNoSQL are powered by the AddThis product from Clearspring.
While today was supposed to bring a new educational video from the Cassandra NYC 2011 video series, I thought that the lessons learned from operating Cassandra at Outbrain to serve over 30 billion impressions monthly would be quite educational.
One of the best presentations I’ve seen: concise, covering the topic from different angles, providing useful information, pitching a product and company in non-obtrusive ways.
The slide deck by Matthew F. Dennis talks about real-time data and analytics from the perspective of Cassandra and DataStax. It starts by presenting the most important features of Cassandra:
- true multi DC support
- no SPOF
- linear scalability
- great read and write performance
- tunable consistency access
- integrated caching
and a series of use cases for Cassandra:
- time series
- sensor data
- ad tracking
- financial market data
- user activity streams
- fraud detection
- risk analysis
It then summarizes three major Cassandra case studies with quotes emphasizing why Cassandra plays a critical role in each of them:
Enjoy it after the break.
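As a side note on the “tunable consistency” item in the feature list above, here is a minimal sketch of the usual rule of thumb: with N replicas, a read of R replicas is guaranteed to overlap a write to W replicas when R + W > N. The function names are made up for illustration and are not part of any Cassandra API; the slides themselves may present this differently.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority of N."""
    return replication_factor // 2 + 1


def reads_overlap_writes(replication_factor: int,
                         write_replicas: int,
                         read_replicas: int) -> bool:
    """True when every read set must share a replica with every write set,
    i.e. when R + W > N (hypothetical helper, for illustration only)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    n = 3                                   # assumed replication factor
    q = quorum(n)                           # majority of 3 replicas
    print(q)
    print(reads_overlap_writes(n, q, q))    # quorum writes + quorum reads
    print(reads_overlap_writes(n, 1, 1))    # single-replica writes and reads
```

Lowering R or W trades that overlap guarantee for lower latency, which is the “tunable” part.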
Besides MySQL, Cloudify recipes include Cassandra and MongoDB.
Also a bit of vintage claim chowder: if you remember Mike Gualtieri’s (Forrester) NoSQL wants to be elastic caching when it grows up, this should be clear proof he was wrong.
Gigaspaces is starting to realize that it’s not really necessary to claim a NoSQL affiliation to benefit from the NoSQL buzz. Clear market positioning, smartly showcased, is much more useful for potential customers. The other company showing it has learned this lesson is Terracotta [1].
[1] I’m probably biased on this as I was responsible for talking to the Terracotta folks about this better route.
Original title and link: Cassandra and MongoDB with Gigaspaces Cloudify ( ©myNoSQL)
In the last few days I’ve read about some new NoSQL hosting solutions:
Cassandra: managed hardware & software hosting:
- Intel Dual Quad-core (8 CPUs), 16 GB of memory, 2 TB primary storage + 500 GB commitlog drive
- 5 public ip addresses, 1000Mbps private network port.
- Debian, CentOS, RedHat or FreeBSD
- Cassandra setup, configuration and ongoing maintenance (repairs, cleanups, troubleshooting)
- Cassandra upgrades (rolling restart)
- 24x7 real-time monitoring (load, tcp, jmx and cassandra logs)
- Multi-datacenter environment (we’ll spread your cluster across two or three geographic locations, based on your needs)
- 30 days test drive
Cost: $850 per node per month (5 TB bandwidth, includes backups & monitoring)
- Real-time replicated deployment
- JSON over HTTP access
- can offer VPN connections to the cluster
- Cloudeno.de is still in beta
- “one Redis instance free with every Cloudnode account”, but no further details about the characteristics of the instance
Hosting for NoSQL databases has been available in some form or another for a while, but only for the most popular ones (MongoDB, CouchDB, Redis). Things are changing fast. Neo4j is heavily advertising its Heroku add-on, OrientDB got NuvolaBase, and so on.
This is the market that Amazon is targeting with Amazon RDS, SimpleDB, and DynamoDB: managed data services, as part of a bigger strategy. What should be clear is that Amazon is not after NoSQL database companies.
Anyone considering a business in the managed data services market should realize that Amazon will not get into supporting all the NoSQL databases out there. They’d also better take a deep look and learn from what Amazon is offering with SimpleDB and DynamoDB.
Original title and link: Hosted and Managed NoSQL: Cassandra, Redis, OrientDB ( ©myNoSQL)
In keeping with last week’s model (an educational video about Cassandra, followed by a Cassandra case study), today’s video in the Cassandra NYC 2011 video series from DataStax has Ilya Maykov describing how Cassandra is used at Ooyala to compute multi-dimensional video analytics reports for 100M+ monthly unique users in near real time.
Continuing the Cassandra NYC 2011 video series, made available by the folks from DataStax, this week we have Matthew F. Dennis, who covers a couple of different Cassandra data modeling use cases.