Jonathan Ellis introduces in two posts—here and here—a new feature in Cassandra 1.2: request tracing. Basically such a feature is an improved approach over more generic APM tools like AppDynamics or NewRelic.
Be judicious with this: tracing a request will usually requre at least 10 rows to be inserted, so it is far from free. Unless you are under very light load tracing all requests (probability 1.0) will probably overwhelm your system. I recommend starting with a small fraction, e.g. 0.001 and increasing that only if necessary.
Years ago I had to implement myself a tracing layer1, after trying to get information from that system using some commercial tools—I’m sure these got better since then though. There were a few goals I’ve planned for and there were many things I’ve learned after deploying it live:
- granularity of the probes is critical to understanding how the system behaves. Use too coarse grained probes and you’ll miss important details, use too fine grained probes and you’ll be flooded with unusable data
- deciding if traces are persistent or volatile and the impact on the system performance. Should you be able to retrieve older traces? If persistent, do they contain enough information to help explain a specific behavior? Can they be used to replay a scenario?
- deciding what requests should be traced and when? Tracing comes with a cost and you must try to minimize the impact it has on the system. The most important data is needed when the system misbehaves or is under high load, but that’s the same time additional work could bring it down
- probabilistic vs pattern vs behavioral tracing. Generic solutions have no knowledge of the system, but a custom one could be created
- trace ordering. Can historical tracing information be ordered?
And there are probably many other things that I don’t remember right anymore.
My implementation was specific to the system (in the sense that it had different tracing capabilities based on request types), but it was generic enough to allow us to change the granularity of collected probes, introduce new trace points, and also change the ratio of the requests to be traced. ↩
Original title and link: Cassandra Application Performance Management With Request Tracing ( ©myNoSQL)