OpenTSDB: All content tagged as OpenTSDB in NoSQL databases and polyglot persistence

Considering TokuDB as an engine for timeseries data... or Cassandra or OpenTSDB

Vadim Tkachenko:

  • Provide high insertion rate
  • Provide a good compression rate to store more data on expensive SSDs
  • Engine should be SSD friendly (fewer writes per time period to help with SSD wear)
  • Provide a reasonable response time (within ~50 ms) on SELECT queries on hot recently inserted data

Looking at these requirements, I actually think that TokuDB might be a good fit for this task.

There are solutions in the NoSQL space that are optimized for this scenario, such as Cassandra or OpenTSDB. Admittedly, using one of these will have an impact on the application side.

Most of the time, when the requirements dictate looking into different solutions, the easiest costs to estimate are the initial ones: development (nb: this includes not only pure development, but also learning costs, etc.) and hardware.

Unfortunately, we often fail to take long-term costs into consideration:

  • maintenance costs (hardware, operations, enhancements)
  • opportunity costs (features that the current architecture won’t be able to support as being either impossible or too expensive)
  • risk costs (failed initial designs, i.e. the technical debt)

Way too often we optimize for the initial costs while almost completely ignoring the ongoing ones. The usual excuse is that familiarity delivers faster, with the more scientific-sounding variants being "time to market is essential" and "premature optimization is the root of all evil".

Original title and link: Considering TokuDB as an engine for timeseries data… or Cassandra or OpenTSDB (NoSQL database©myNoSQL)

via: http://www.mysqlperformanceblog.com/2013/08/29/considering-tokudb-as-an-engine-for-timeseries-data/


Kairosdb - Fast Scalable Time Series Database

KairosDB is introduced as a rewrite of OpenTSDB written primarily for Cassandra (nb: OpenTSDB is based on HBase). As for what it brings that is new, the project page lists:

  • Uses Guice to load modules.
  • Incorporates Jetty for the REST API and for serving up the UI.
  • Pure Java build tool (Tablesaw)
  • UI uses Flot and is client side rendered.
  • Ability to customize UI.
  • Relative time now includes month and supports leap years.
  • Modular data store interface supports:
    • HBase
    • Cassandra
    • H2 (For development)
  • Milliseconds data support when using Cassandra.
  • REST API for querying and submitting data.
  • Build produces deployable tar, rpm and deb packages.
  • Linux start/stop service scripts.
  • Faster.
  • Made aggregations optional (easier to get raw data).
  • Added abilities to import and export data.
  • Aggregators can aggregate data for a specified period.
  • Aggregators can be stacked or “piped” together.
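
The last two items in the list can be illustrated with a small sketch. This is not KairosDB's API: the names `bucketed` and `pipe`, the `Point` type, and the sample data are all hypothetical and only show the general idea of period-based aggregators whose outputs feed one another.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical representation: (timestamp in milliseconds, value).
Point = tuple[int, float]
Aggregator = Callable[[list[Point]], list[Point]]


def bucketed(period_ms: int, reduce_fn: Callable[[list[float]], float]) -> Aggregator:
    """Build an aggregator that reduces all points falling in each period."""
    def aggregate(points: list[Point]) -> list[Point]:
        buckets: dict[int, list[float]] = defaultdict(list)
        for ts, value in points:
            # Align each timestamp to the start of its period.
            buckets[ts - ts % period_ms].append(value)
        return [(start, reduce_fn(vals)) for start, vals in sorted(buckets.items())]
    return aggregate


def pipe(*aggregators: Aggregator) -> Aggregator:
    """Chain aggregators so each one consumes the previous one's output."""
    def run(points: list[Point]) -> list[Point]:
        for aggregator in aggregators:
            points = aggregator(points)
        return points
    return run


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


# Made-up sample data: one point every 500 ms.
raw = [(0, 1.0), (500, 3.0), (1000, 5.0), (1500, 7.0), (2000, 9.0)]

# Average per 1-second period, then take the max of those averages per 2 seconds.
avg_then_max = pipe(bucketed(1000, mean), bucketed(2000, max))
result = avg_then_max(raw)
print(result)  # [(0, 6.0), (2000, 9.0)]
```

Because every aggregator takes and returns the same list-of-points shape, any number of them can be chained, which is presumably what "stacked or piped" refers to.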

Source code lives on GitHub. Let’s see where it goes.



Why We Chose HBase for AppFirst APM

Its performance had a significant impact on our decision making as well. It sustains an enormous number of writes and the read cycle times were much better than we had anticipated. Further, it gives us the option to interact with the Hadoop ecosystem, including the HDFS, MapReduce, and ZooKeeper frameworks. Our enthusiasm for HBase skyrocketed when we discovered how to create map-reduce apps to do a number of management tasks. While Cassandra also has these capabilities, its data model was fundamentally more complex.

What if the whole post had simply said: we chose HBase because of

  1. its seamless integration in the Hadoop ecosystem
  2. OpenTSDB, the scalable time series database built on top of HBase?


via: http://blog.appfirst.com/2011/12/22/why-we-chose-hbase/


OpenTSDB: A HBase Scalable Time Series Database

OpenTSDB: a distributed, scalable monitoring system on top of HBase:

Thanks to HBase’s scalability, OpenTSDB allows you to collect many thousands of metrics from thousands of hosts and applications, at a high rate (every few seconds). OpenTSDB will never delete or downsample data and can easily store billions of data points. As a matter of fact, StumbleUpon uses it to keep track of hundreds of thousands of time series and collects over 100 million data points per day in their main production cluster.
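
To put the quoted numbers in perspective, here is a quick back-of-the-envelope calculation. Only the 100 million points per day figure comes from the quote; the series count of 100,000 is an assumption, taken as a rough lower bound for "hundreds of thousands".

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

points_per_day = 100_000_000  # from the quote: "over 100 million data points per day"
series_count = 100_000        # ASSUMPTION: rough lower bound for "hundreds of thousands"

# Sustained write rate across the whole cluster.
points_per_second = points_per_day / SECONDS_PER_DAY

# Average density of a single time series under the assumption above.
points_per_series_per_day = points_per_day / series_count
seconds_between_points = SECONDS_PER_DAY / points_per_series_per_day

print(f"{points_per_second:.0f} points/s cluster-wide")        # 1157 points/s cluster-wide
print(f"{points_per_series_per_day:.0f} points/series/day")    # 1000 points/series/day
print(f"one point every {seconds_between_points:.1f} s")       # one point every 86.4 s
```

Under these assumptions the sustained rate is on the order of a thousand writes per second, and a larger series count would make each individual series sparser still.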

The source code is available on ☞ GitHub and you can find out more about the project (currently a short intro and a getting started section) ☞ here.

StumbleUpon has built and is using OpenTSDB for the following scenarios:

  • Get real-time state information about our infrastructure and services.
  • Understand outages or how complex systems interact together.
  • Measure SLAs (availability, latency, etc.)
  • Tune our applications and databases for maximum performance.
  • Do capacity planning.



OpenTSDB: A Distributed, Scalable Monitoring System on Top of HBase

Following this through tweets from Hadoop World: StumbleUpon plans to open source ☞ OpenTSDB, a scalable time series database built on top of HBase. The project page explains what OpenTSDB is:

OpenTSDB was originally written to address a common need: store and index metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable.

Most existing open-source monitoring systems are not scalable or flexible enough. With OpenTSDB, and thanks to HBase’s scalability, it’s possible to collect many thousands of metrics from thousands of hosts and applications, at a high rate (every few seconds). OpenTSDB will never delete or downsample data and can easily store billions of data points.

Imagine having the ability to quickly generate a graph of the average number of IOPS your databases do, per database schema, over a period of a week, and on the same graph, plot the number of queries per second your servers are handling to see how much of a correlation there is. OpenTSDB makes this type of operation trivial, while manipulating millions of data points for very fine-grained, real-time monitoring.
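
The kind of query described above boils down to filtering data points by metric and time range, grouping them by a tag, and averaging each group. The sketch below shows that idea in plain Python; it is not OpenTSDB's API or storage model, and `DataPoint`, `average_by_tag`, the metric names `db.iops` and `db.qps`, and the `schema` tag are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    metric: str
    timestamp: int  # seconds since the epoch
    value: float
    tags: dict[str, str] = field(default_factory=dict)


def average_by_tag(points, metric, tag, start, end):
    """Average `metric` over [start, end), grouped by the value of `tag`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for p in points:
        if p.metric != metric or not (start <= p.timestamp < end):
            continue
        group = p.tags.get(tag)
        if group is None:
            continue  # skip points that do not carry the grouping tag
        sums[group] += p.value
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Made-up sample data.
points = [
    DataPoint("db.iops", 10, 100.0, {"schema": "users"}),
    DataPoint("db.iops", 20, 300.0, {"schema": "users"}),
    DataPoint("db.iops", 15, 50.0, {"schema": "orders"}),
    DataPoint("db.qps", 12, 999.0, {"schema": "users"}),      # different metric
    DataPoint("db.iops", 1000, 9999.0, {"schema": "users"}),  # outside the range
]

averages = average_by_tag(points, "db.iops", "schema", start=0, end=100)
print(averages)  # {'users': 200.0, 'orders': 50.0}
```

Running the same function over a second metric such as the hypothetical `db.qps` would give the other series to plot on the same graph.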

Sounds good. The ☞ GitHub repo is already set up, but there is nothing in it yet.
