A great article on how to figure out the right hardware for your Cassandra cluster:
As usual, spec’ing out hardware for a given application is a matter of balancing five variables:
* CPU capacity (taking into account the single/multi threaded aspects of the application)
* RAM capacity (how much working space does the application need and how much cache is optimal)
* Disk capacity (actual disk storage space)
* Disk i/o performance (the number of read and write requests per second that can be handled)
* Network capacity (how much bandwidth is needed)
If you run into a bottleneck on any of these five items, any additional capacity that is available within the other four categories is wasted.
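That bottleneck principle can be sketched as a small utilization check. This is a minimal illustration only; the function name and all capacity/demand numbers are assumptions, not figures from the article:

```python
# Hypothetical sketch: given a node's capacities and a workload's demands,
# the binding constraint is the resource with the highest utilization ratio.
# All names and numbers below are illustrative, not from the article.

def bottleneck(capacity: dict, demand: dict) -> tuple:
    """Return (resource, utilization) for the most heavily loaded resource."""
    ratios = {r: demand[r] / capacity[r] for r in capacity}
    worst = max(ratios, key=ratios.get)
    return worst, ratios[worst]

# Example: a node with ample CPU/RAM/disk space but limited random-read IOPS.
capacity = {"cpu_cores": 16, "ram_gb": 64, "disk_tb": 4, "iops": 500, "net_gbps": 10}
demand   = {"cpu_cores": 4,  "ram_gb": 32, "disk_tb": 1, "iops": 450, "net_gbps": 1}

resource, util = bottleneck(capacity, demand)
print(resource, round(util, 2))  # -> iops 0.9
```

In this made-up example, disk i/o is at 90% utilization while every other resource sits well below it, so the headroom in CPU, RAM, disk space, and network is effectively wasted capacity.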
Eric Rosenberry goes on to detail this iterative process for their particular case:
Before I dive into my findings, I should point out that this is not a one-size-fits-all solution, as it greatly depends on what your dataset looks like and what your read/write patterns are. Our dataset happens to be billions of exceedingly small records. This means we do an incredible amount of random read i/o. Your mileage may vary depending on what you do with it.
I must confess I’m really curious how many startups are actually able to run this sort of extensive experiment and come up with a “very educated” decision.
Original title and link: A Cassandra Hardware Stack (NoSQL databases © myNoSQL)