BigTable: All content tagged as BigTable in NoSQL databases and polyglot persistence
Jonathan Ellis introduces in two posts (here and here) a new feature in Cassandra 1.2: request tracing. In essence, this feature is an improvement over more generic APM tools like AppDynamics or NewRelic.
Be judicious with this: tracing a request will usually require at least 10 rows to be inserted, so it is far from free. Unless you are under very light load, tracing all requests (probability 1.0) will probably overwhelm your system. I recommend starting with a small fraction, e.g. 0.001, and increasing that only if necessary.
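To make the cost argument above concrete, here is a minimal Python sketch of probabilistic sampling. It is not Cassandra's implementation: the names `should_trace` and `estimated_trace_rows` are hypothetical, and the default of 10 rows per trace is just the lower bound mentioned in the quote.

```python
import random


def should_trace(probability: float, rng: random.Random) -> bool:
    """Return True for roughly `probability` of the calls (hypothetical helper)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0.0 and 1.0")
    # rng.random() is in [0.0, 1.0), so 0.0 never traces and 1.0 always traces.
    return rng.random() < probability


def estimated_trace_rows(requests: int, probability: float,
                         rows_per_trace: int = 10) -> float:
    """Rough number of extra rows written by tracing.

    rows_per_trace=10 is an assumption taken from the "at least 10 rows" remark.
    """
    return requests * probability * rows_per_trace
```

Under these assumptions, a million requests traced at 0.001 add on the order of 10,000 rows, while tracing all of them (1.0) adds about 10 million, which is why starting small seems like the safer default.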
Years ago I had to implement a tracing layer myself[1], after trying to get information from that system using some commercial tools (I’m sure these have gotten better since then, though). I had a few goals in mind, and I learned many things after deploying it live:
- granularity of the probes is critical to understanding how the system behaves. Use probes that are too coarse-grained and you’ll miss important details; use probes that are too fine-grained and you’ll be flooded with unusable data
- deciding whether traces are persistent or volatile, and what impact that has on system performance. Should you be able to retrieve older traces? If persistent, do they contain enough information to help explain a specific behavior? Can they be used to replay a scenario?
- deciding which requests should be traced and when. Tracing comes with a cost and you must try to minimize the impact it has on the system. The most important data is needed when the system misbehaves or is under high load, but that’s exactly when additional work could bring it down
- probabilistic vs pattern vs behavioral tracing. Generic solutions have no knowledge of the system, but a custom one can be built around that knowledge
- trace ordering. Can historical tracing information be ordered?
And there are probably many other things that I don’t remember right now.
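A few of these concerns (per-request-type sampling ratios, adjustable probe granularity, and trace ordering) can be illustrated with a small Python sketch. This is not the system described here; `Tracer`, `TraceEvent`, and the `COARSE`/`NORMAL`/`FINE` levels are hypothetical names, and traces are kept in memory only (the volatile option).

```python
import itertools
import random
import time
from dataclasses import dataclass, field

# Hypothetical granularity levels: a probe fires only if its level
# is less than or equal to the tracer's current granularity.
COARSE, NORMAL, FINE = 1, 2, 3


@dataclass
class TraceEvent:
    seq: int            # monotonically increasing, gives a total order
    timestamp: float    # wall-clock time, useful but not reliable for ordering
    request_type: str
    point: str          # name of the trace point that fired


@dataclass
class Tracer:
    ratios: dict                      # request type -> fraction of requests traced
    granularity: int = NORMAL
    rng: random.Random = field(default_factory=random.Random)
    events: list = field(default_factory=list)   # volatile, in-memory store
    _seq: itertools.count = field(default_factory=itertools.count)

    def start(self, request_type: str) -> bool:
        """Decide once per request whether it will be traced."""
        return self.rng.random() < self.ratios.get(request_type, 0.0)

    def probe(self, traced: bool, request_type: str, point: str,
              level: int = NORMAL) -> None:
        """Record a trace point if the request is traced and the level is enabled."""
        if traced and level <= self.granularity:
            self.events.append(
                TraceEvent(next(self._seq), time.time(), request_type, point))
```

A caller would run `traced = tracer.start("write")` at the start of a request and pass `traced` to each `probe` call. Both `ratios` and `granularity` are plain attributes, so they can be changed while the system is running, and the sequence number keeps events ordered even when timestamps collide.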
[1] My implementation was specific to the system (in the sense that it had different tracing capabilities based on request types), but it was generic enough to allow us to change the granularity of collected probes, introduce new trace points, and also change the ratio of the requests to be traced.
Original title and link: Cassandra Application Performance Management With Request Tracing ( ©myNoSQL)
A three-part article from Hortonworks showing how Pig can be used with MongoDB, HBase, and Cassandra:
Pig has emerged as the ‘duct tape’ of Big Data, enabling you to send data between distributed systems in a few lines of code. In this series, we’re going to show you how to use Hadoop and Pig to connect different distributed systems, to enable you to process data from wherever and to wherever you like.
- Part 1: Pig, MongoDB and Node.js
- Part 2: Pig, HBase, JRuby and Sinatra
- Part 3: TF-IDF Topics with Cassandra, Python Streaming and Flask
Original title and link: Pig the Big Data Duct Tape: Examples for MongoDB, HBase, and Cassandra ( ©myNoSQL)
I’d say that raising another $25 million from Meritech Capital Partners, with the participation of existing investors Lightspeed Venture Partners and Crosslink Capital, is a good enough reason for DataStax to party.
DataStax will use the funds to further enhance its Big Data platform and increase the value for current customers while driving global customer acquisition.
Congrats to DataStax and the Cassandra community!
Original title and link: $25 Million in C Round for DataStax ( ©myNoSQL)
It’s unfortunate that the post focuses mostly on the usage of Spring and RabbitMQ and the slide deck doesn’t dive deeper into the architecture, data flows, and data stores, but the diagrams below should give you an idea of this truly polyglot persistence architecture:
The slide deck, which presents architecture principles and numbers about the platform, follows after the break.