NoSQL Benchmarks NoSQL use cases NoSQL Videos NoSQL Hybrid Solutions NoSQL Presentations Big Data Hadoop MapReduce Pig Hive Flume Oozie Sqoop HDFS ZooKeeper Cascading Cascalog BigTable Cassandra HBase Hypertable Couchbase CouchDB MongoDB OrientDB RavenDB Jackrabbit Terrastore Amazon DynamoDB Redis Riak Project Voldemort Tokyo Cabinet Kyoto Cabinet memcached Amazon SimpleDB Datomic MemcacheDB M/DB GT.M Amazon Dynamo Dynomite Mnesia Yahoo! PNUTS/Sherpa Neo4j InfoGrid Sones GraphDB InfiniteGraph AllegroGraph MarkLogic Clustrix CouchDB Case Studies MongoDB Case Studies NoSQL at Adobe NoSQL at Facebook NoSQL at Twitter



Considering TokuDB as an engine for timeseries data... or Cassandra or OpenTSDB

Vadim Tkachenko:

  • Provide high insertion rate
  • Provide a good compression rate to store more data on expensive SSDs
  • Engine should be SSD friendly (less writes per timeperiod to help with SSD wear)
  • Provide a reasonable response time (within ~50 ms) on SELECT queries on hot recently inserted data

Looking on these requirements I actually think that TokuDB might be a good fit for this task.

There are solutions in the NoSQL space that are optimized for this scenario: Cassandra or OpenTSDB. Indeed using one of these will have an impact on the application side.

Most of the time when the requirements dictate looking into different solutions, the easiest to estimate is the initial costs: development (nb: this doesn’t include only pure development, but also learning costs, etc.) and hardware costs.

Unfortunately many times we ignore taking into consideration long term costs:

  • maintenance costs (hardware, operations, enhancements)
  • opportunity costs (features that the current architecture won’t be able to support as being either impossible or too expensive)
  • accounting for the risks of failed initial designs (the technical debt costs)

Way too many times we optimize for the initial costs (the general excuse is that familiarity delivers faster—with the more scientific forms: time to market is essential and premature optimization is the root of all evil), while ignoring almost completely the ongoing costs.

Original title and link: Considering TokuDB as an engine for timeseries data… or Cassandra or OpenTSDB (NoSQL database©myNoSQL)