Cassandra: All content tagged as Cassandra in NoSQL databases and polyglot persistence
Wednesday, 29 May 2013
Best argument for official drivers
Jonathan Ellis:
More qualitatively but perhaps even more important, this addresses the paradox of choice we’ve had in the Cassandra Java world: multiple driver choices provide another barrier to newcomers, where each must evaluate the options for applicability to his project. Having just done such an evaluation to settle on Cassandra itself, this is the last thing they want to spend time on.
And that’s the best-case scenario. More often, a fragmented landscape leads to many solutions, each of which solve a different 80% of the problem. Better to have a single, well-thought-out solution, that lets people get started writing their application immediately.
This is the best argument ever for having official drivers.
✚ In the early days, and often for a long time afterwards, it’s quite difficult for a company to offer only official drivers. But there’s a solution for that too: recommend one, and support its maintainers.
Original title and link: Best argument for official drivers (©myNoSQL)
via: http://www.datastax.com/dev/blog/the-native-cql-java-driver-goes-ga
Titan: Data Loading and Transactional Benchmark
The Aurelius team describes an advanced benchmark of Titan, a massive-scale property graph database allowing real-time traversals and updates. The benchmark was sponsored by Pearson and was developed and run over 5 months:
The 10 terabyte, 121 billion edge graph was loaded into the cluster in 1.48 days at a rate of approximately 1.2 million edges a second with 0 failed transactions. These numbers were possible due to new developments in Titan 0.3.0 whereby graph partitioning is achieved using a domain-based byte order partitioner.
✚ The answer to why Titan is built on Cassandra can be found in this interview between Aurelius CTO Matthias Broecheler and DataStax co-founder Matt Pfeil:
[…] we don’t have to worry about things like replication, backup, and snapshots because all of that stuff is handled by Cassandra. We really just focus on: “How do you distribute a graph?”, “How do you represent a graph efficiently in a big table model?”, “How do you do things like edge compression and other things that are very graph specific in order to make the database fast?” And, lastly, “How do you build intelligent index structures so that graph traversals, which are the core of any graph database, are as fast as possible?”
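To make the “graph in a big table model” question a bit more concrete, here is a minimal toy sketch, assuming an adjacency-list layout where each vertex is one wide row and each edge is a column named (edge label, neighbor id), kept sorted so that all edges with the same label form one contiguous slice. This illustrates the general idea only; it is not Titan’s actual storage format, and the class and method names are hypothetical.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class WideRowGraph:
    """Toy adjacency-list layout: one sorted 'wide row' per vertex.

    Each column name is an (edge_label, neighbor_id) tuple, so all edges that
    share a label sit next to each other and can be read as a single slice.
    """

    def __init__(self):
        self.rows = defaultdict(list)  # vertex id -> sorted list of column names

    def add_edge(self, src, label, dst):
        # Insert the out-edge into src's row while keeping the row sorted.
        insort(self.rows[src], (label, dst))

    def neighbors(self, vertex, label):
        # Jump to the first column for `label`, then read until the label changes.
        row = self.rows.get(vertex, [])
        start = bisect_left(row, (label,))
        result = []
        for col_label, dst in row[start:]:
            if col_label != label:
                break
            result.append(dst)
        return result


g = WideRowGraph()
g.add_edge("alice", "follows", "carol")
g.add_edge("alice", "follows", "bob")
g.add_edge("alice", "likes", "post-1")
print(g.neighbors("alice", "follows"))  # ['bob', 'carol']
```

Because the columns are sorted, a traversal step like “who does alice follow?” becomes a single range read within one row rather than a scan of the whole row.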
Original title and link: Titan: Data Loading and Transactional Benchmark (©myNoSQL)
via: http://www.planetcassandra.org/blog/post/educating-the-planet-with-pearson
Thursday, 23 May 2013
Cassandra anti-patterns: Queues and queue-like datasets or when Deletes can bite
Aleksey Yeschenko has an interesting post about the impact deletes can have on Cassandra and about possible workarounds:
Specifically, tombstones will bite you if you do lots of deletes (especially column-level deletes) and later perform slice queries on rows with a lot of tombstones.
I wouldn’t call this a case of “you got your data model wrong”, but rather a known implementation limitation that affects some scenarios in which a different data model should be used. The difference may be only semantic, but it means the error is not the user’s.
In other words, if you use column-level deletes (or expiring columns) heavily and also need to perform slice queries over that data, try grouping columns with close expiration dates together and getting rid of them in a single move.
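The effect can be sketched with a toy in-memory model, assuming (as a simplification) that a column-level delete leaves a tombstone that every later slice must step over, while dropping a whole row at once removes it from the read path entirely. In real Cassandra a row-level delete also writes a tombstone, but a single one for the whole row, and tombstones are only purged by compaction after `gc_grace_seconds`. All names below (`Row`, `queue_in_one_row`, `queue_in_buckets`, the bucket size) are hypothetical.

```python
from dataclasses import dataclass, field

TOMBSTONE = object()  # marker left behind by a column-level delete


@dataclass
class Row:
    cells: dict = field(default_factory=dict)  # column name -> value or TOMBSTONE

    def insert(self, col, value):
        self.cells[col] = value

    def delete(self, col):
        # Column-level delete: the cell is shadowed by a tombstone, not removed.
        self.cells[col] = TOMBSTONE

    def slice(self, start, limit):
        """Return up to `limit` live columns >= start, plus how many cells were scanned."""
        live, scanned = [], 0
        for col in sorted(self.cells):
            if col < start:
                continue
            scanned += 1
            if self.cells[col] is not TOMBSTONE:
                live.append(col)
                if len(live) == limit:
                    break
        return live, scanned


def queue_in_one_row(n_items, n_consumed, limit=5):
    """Queue-like usage: enqueue everything into one row, dequeue via column deletes."""
    row = Row()
    for i in range(n_items):
        row.insert(i, f"msg-{i}")
    for i in range(n_consumed):
        row.delete(i)
    return row.slice(0, limit)


def queue_in_buckets(n_items, n_consumed, bucket_size=100, limit=5):
    """Same workload, but items with close 'expiration' share a row (bucket);
    a fully consumed bucket is dropped in one move instead of column by column."""
    buckets = {}
    for i in range(n_items):
        buckets.setdefault(i // bucket_size, Row()).insert(i, f"msg-{i}")
    for i in range(n_consumed):
        b = i // bucket_size
        if (i + 1) % bucket_size == 0:
            del buckets[b]  # stand-in for a single row-level delete
        else:
            buckets[b].delete(i)
    return buckets[min(buckets)].slice(0, limit)


print(queue_in_one_row(1000, 990))  # ([990, 991, 992, 993, 994], 995)
print(queue_in_buckets(1000, 990))  # ([990, 991, 992, 993, 994], 95)
```

Both layouts return the same five live items, but the single-row queue has to step over 990 tombstones to find them, while the bucketed layout only scans the tombstones of the bucket that is still in use.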
Original title and link: Cassandra anti-patterns: Queues and queue-like datasets or when Deletes can bite (©myNoSQL)
via: http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
Thursday, 4 April 2013
Kairosdb - Fast Scalable Time Series Database
KairosDB is introduced as a rewrite of OpenTSDB written primarily for Cassandra (nb: OpenTSDB is based on HBase). As for what’s new, this page lists:
- Uses Guice to load modules.
- Incorporates Jetty for the REST API and for serving the UI.
- Pure Java build tool (Tablesaw)
- UI uses Flot and is client side rendered.
- Ability to customize UI.
- Relative time now includes month and supports leap years.
- Modular data store interface supports:
  - HBase
  - Cassandra
  - H2 (for development)
- Millisecond data support when using Cassandra.
- REST API for querying and submitting data.
- Build produces deployable tar, rpm and deb packages.
- Linux start/stop service scripts.
- Faster.
- Made aggregations optional (easier to get raw data).
- Added abilities to import and export data.
- Aggregators can aggregate data for a specified period.
- Aggregators can be stacked or “piped” together.
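The last two items (aggregating over a specified period and piping aggregators together) can be illustrated with a small sketch. This is a hypothetical Python model of the concept, assuming data points are `(timestamp_ms, value)` pairs and each aggregator groups points into fixed, aligned periods; it is not KairosDB’s actual API, which is driven through its REST interface.

```python
from typing import Callable, Iterable, List, Tuple

Point = Tuple[int, float]  # (timestamp in milliseconds, value)


def make_aggregator(fn: Callable[[List[float]], float], period_ms: int):
    """Return an aggregator that groups points into fixed periods and reduces each group with fn."""
    def aggregate(points: Iterable[Point]) -> List[Point]:
        groups = {}
        for ts, value in points:
            bucket_start = ts - ts % period_ms  # align to the start of the period
            groups.setdefault(bucket_start, []).append(value)
        return [(start, fn(values)) for start, values in sorted(groups.items())]
    return aggregate


def pipe(points, *aggregators):
    # Each aggregator consumes the output of the previous one.
    for agg in aggregators:
        points = agg(points)
    return points


def mean(values):
    return sum(values) / len(values)


raw = [(0, 1.0), (30_000, 3.0), (60_000, 5.0), (90_000, 7.0)]
per_minute_avg = make_aggregator(mean, 60_000)
two_minute_sum = make_aggregator(sum, 120_000)
print(pipe(raw, per_minute_avg))                  # [(0, 2.0), (60000, 6.0)]
print(pipe(raw, per_minute_avg, two_minute_sum))  # [(0, 8.0)]
```

Passing no aggregators to `pipe` returns the raw points unchanged, which mirrors the “aggregations optional” item in the list.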
Source code lives on GitHub. Let’s see where it goes.
Original title and link: Kairosdb - Fast Scalable Time Series Database (©myNoSQL)
Wednesday, 3 April 2013
5 Steps to Benchmarking Managed NoSQL - DynamoDB Vs Cassandra
Ben Bromhead (instaclustr) for High Scalability:
To determine the suitability of a provider, your first port of call is to benchmark. Choosing a service provider is often done in a number of stages. First is to shortlist providers based on capabilities and claimed performance, ruling out those that do not meet your application requirements. Second is to look for benchmarks conducted by third parties, if any. The final stage is to benchmark the service yourself.
✚ Peter Bailis asks a very valid question: “if it’s the default YCSB and it’s a benchmark, where are the results?”
✚ instaclustr offers a fully managed hosting solution for Cassandra. (Disclaimer: they’ve sponsored myNoSQL in the past)
Original title and link: 5 Steps to Benchmarking Managed NoSQL - DynamoDB Vs Cassandra (©myNoSQL)
Tuesday, 2 April 2013
Improving Secondary Index Write Performance in Cassandra 1.2
Sam Tunnicliffe describes the old and the new, optimized behavior of secondary index writes in Cassandra 1.2:
While secondary indexes can add a lot of flexibility to the way data is modelled and accessed, they do add complexity on the server side as the indexes need to be kept in sync with the primary data. Until recently, this has led to some significant trade-offs in write throughput and IO utilisation as we always had to perform a read before the write in order to update any relevant secondary indexes. In Cassandra 1.2, this area has been substantially reworked to remove the need for read-before-write. New index entries are now written at the same time as the primary data is updated and old entries removed lazily at query time. Overall, this has led to some decent performance improvements.
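The “write new entries now, remove old ones lazily at query time” idea can be sketched with a toy in-memory table. This is a simplified model, not Cassandra’s code: the class name and methods are hypothetical, and Cassandra keeps index entries in a hidden index table on disk rather than in a Python `dict`.

```python
from collections import defaultdict


class LazyIndexedTable:
    """Toy table with a secondary index on one column, maintained without read-before-write."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                 # row key -> {column: value}
        self.index = defaultdict(set)  # indexed value -> candidate row keys (may be stale)

    def write(self, key, **columns):
        # Upsert the primary data and add an index entry for the new value.
        # The old indexed value is never read, so its entry is left behind.
        self.rows.setdefault(key, {}).update(columns)
        if self.indexed_column in columns:
            self.index[columns[self.indexed_column]].add(key)

    def query(self, value):
        # Check every candidate against the primary data; drop stale entries lazily.
        matches = []
        for key in sorted(self.index.get(value, set())):
            if self.rows.get(key, {}).get(self.indexed_column) == value:
                matches.append(key)
            else:
                self.index[value].discard(key)
        return matches


t = LazyIndexedTable("city")
t.write("alice", city="Paris")
t.write("bob", city="Paris")
t.write("alice", city="London")  # leaves a stale ("Paris", "alice") entry behind
print(t.query("Paris"))          # ['bob']
```

The trade-off is visible in the model: writes get cheaper because they skip the read, while queries pay a small validation cost and clean up stale entries as they find them.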
Original title and link: Improving Secondary Index Write Performance in Cassandra 1.2 (©myNoSQL)
via: http://www.datastax.com/dev/blog/improving-secondary-index-write-performance-in-1-2
Thursday, 28 March 2013
Graph Based Recommendation Systems at eBay
Slidedeck from eBay explaining how they implemented a graph-based recommendation system on top of (surprise! not a graph database) Cassandra.
Original title and link: Graph Based Recommendation Systems at eBay (©myNoSQL)
Wednesday, 13 March 2013
RSS Reader With Cassandra and Netflix OSS Tools
This RSS reader app from Netflix can be a very good excuse to play with Cassandra and some of Netflix’s open source projects and, why not, to create an alternative to Google Reader, which gets declared defunct or alive every couple of months:
Projects you’ll use: Cassandra with Astyanax, Archaius, Blitz4j, Eureka, Governator, Hystrix, Karyon, Ribbon, Servo. As for myself, I’ve already checked out the code.
Original title and link: RSS Reader With Cassandra and Netflix OSS Tools (©myNoSQL)
via: http://techblog.netflix.com/2013/03/introducing-first-netflixoss-recipe-rss.html
Tuesday, 12 March 2013
Cassandra at Adobe: The Profile Cache Servers
The team I know at Adobe has invested a lot in HBase and is offering its services globally. But according to this PDF, in true polyglot persistence fashion, other parts of the Adobe business seem to have opted for a different solution: Cassandra. The cluster mentioned in the whitepaper is pretty small, 16 nodes, but what is interesting is that these are beefy servers using solid state drives:
The PCS is comprised of large servers using solid state drives (SSDs) for storage […] The PCS is basically Cassandra with a set of custom APIs built on top of it.
Original title and link: Cassandra at Adobe: The Profile Cache Servers (©myNoSQL)
Friday, 8 March 2013
Adding Value Through Graph Analysis Using Titan and Faunus
Interesting slidedeck by Matthias Broecheler introducing 3 graph-related tools developed by Vadas Gintautas, Marko Rodriguez, Stephen Mallette and Daniel LaRocque:
- Titan: a massive scale property graph allowing real-time traversals and updates
- Faunus: for batch processing of large graphs using Hadoop
- Fulgora: for running global graph algorithms on large, compressed, in-memory graphs
The first couple of slides also show some possible use cases where these tools would prove useful.
Original title and link: Adding Value Through Graph Analysis Using Titan and Faunus (©myNoSQL)
Wednesday, 6 March 2013
Brief Intro to Cassandra in 27 Slides
If you’ve never looked into Apache Cassandra, Michaël Figuière’s slidedeck will give you a quick intro to Cassandra’s main concepts.
Apache Cassandra 1.2 introduces some new features such as a Binary Protocol and Collections datatype that together with the now finalized CQL3 query language provide a new interface to communicate with Cassandra that dramatically shrinks its learning curve and simplifies its daily use while still relying on its highly scalable architecture and storage engine. This presentation will iterate over all these new features including an overview of CQL3 query language, a look at the new client architecture, and an update on data modeling best practices. Then we’ll see how to implement an enterprise application using this new interface so that the audience can realize that a number of design principles are inspired by those commonly used with relational databases while others are entirely different, due to Cassandra’s partitioning approach.
Original title and link: Brief Intro to Cassandra in 27 Slides (©myNoSQL)
Tuesday, 5 March 2013
A Quick Tour of Internal Authentication and Authorization Security in DataStax Enterprise and Apache Cassandra
Robin Schumacher describes the new security features added to Apache Cassandra and DataStax Enterprise:
This article will concentrate on the new internal authentication and authorization (or permission management) features that are part of both open source Cassandra as well as DataStax Enterprise. Authentication deals with validating incoming user connections to a database cluster, whereas authorization concerns itself with what a logged in user can do inside a database.
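The distinction between the two steps can be shown with a toy model: authentication answers “is this connection really user X?”, authorization answers “may user X perform this action on this resource?”. This is a hypothetical Python sketch of the concepts only, not Cassandra’s or DataStax Enterprise’s implementation; the class, method names, and the permission and resource strings are illustrative assumptions.

```python
import hashlib
import hmac
import os


class ToyAuth:
    def __init__(self):
        self.users = {}        # username -> (salt, password hash)
        self.permissions = {}  # (username, resource) -> set of granted permissions

    @staticmethod
    def _hash(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def create_user(self, name, password):
        salt = os.urandom(16)
        self.users[name] = (salt, self._hash(password, salt))

    def authenticate(self, name, password):
        """Authentication: validate the credentials of an incoming connection."""
        if name not in self.users:
            return False
        salt, expected = self.users[name]
        return hmac.compare_digest(expected, self._hash(password, salt))

    def grant(self, name, permission, resource):
        self.permissions.setdefault((name, resource), set()).add(permission)

    def authorize(self, name, permission, resource):
        """Authorization: decide what an already logged-in user may do."""
        return permission in self.permissions.get((name, resource), set())


auth = ToyAuth()
auth.create_user("alice", "s3cret")
auth.grant("alice", "SELECT", "keyspace ks")
print(auth.authenticate("alice", "s3cret"))              # True
print(auth.authorize("alice", "MODIFY", "keyspace ks"))  # False
```

A user can pass authentication and still be refused by authorization, which is exactly the split the article describes.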
I’m happy to see NoSQL databases entering the security space, as this will ease their way into enterprises. But I somewhat fear the moment when the marketing message changes from “it’s too early to provide security features” to “the first enterprise-grade NoSQL database”.
Original title and link: A Quick Tour of Internal Authentication and Authorization Security in DataStax Enterprise and Apache Cassandra (©myNoSQL)
