Whenever a request reaches Twitter, we decide whether the request should be sampled. We attach a few lightweight trace identifiers and pass them along to all the services used in that request. By sampling only a portion of all requests, we reduce the overhead of tracing, allowing us to keep it enabled in production at all times.
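The edge-sampling idea can be sketched as follows. This is a minimal illustration, not Twitter's implementation: the header names mirror Zipkin's B3-style propagation, and the sample rate is an arbitrary assumption. The key property is that the sampling decision is made once, at the edge, and downstream services simply honor whatever they receive.

```python
import random

SAMPLE_RATE = 0.001  # assumed rate: trace roughly 1 in 1000 requests


def make_trace_headers(incoming=None):
    """Build the lightweight trace identifiers passed to downstream services.

    If headers already arrived from upstream, reuse them unchanged so the
    whole request shares one trace ID and one sampling decision.
    """
    if incoming:
        return dict(incoming)
    sampled = random.random() < SAMPLE_RATE
    return {
        "X-B3-TraceId": f"{random.getrandbits(64):016x}",
        "X-B3-SpanId": f"{random.getrandbits(64):016x}",
        "X-B3-Sampled": "1" if sampled else "0",
    }
```

Because only the edge rolls the dice, a trace is all-or-nothing: either every service in the request records spans, or none does.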
The Zipkin collector receives the data via Scribe and stores it in Cassandra along with a few indexes. The indexes are used by the Zipkin query daemon to find interesting traces to display in the web UI.
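The collector-plus-index pattern can be sketched with in-memory dictionaries standing in for the Cassandra column families; the table layout and function names here are illustrative assumptions, not Zipkin's actual schema. Spans are keyed by trace ID, and a secondary index maps a service name to recent trace IDs so the query daemon never has to scan all traces.

```python
from collections import defaultdict

# In-memory stand-ins for the span store and its query index
# (Zipkin keeps these in Cassandra; this layout is a simplification).
traces = defaultdict(list)          # trace_id -> list of spans
service_index = defaultdict(list)   # service  -> [(timestamp, trace_id), ...]


def store_span(trace_id, span):
    """Collector side: persist the span and update the query index."""
    traces[trace_id].append(span)
    service_index[span["service"]].append((span["timestamp"], trace_id))


def traces_for_service(service, limit=10):
    """Query-daemon side: use the index to find recent traces to display."""
    recent = sorted(service_index[service], reverse=True)[:limit]
    return [trace_id for _, trace_id in recent]
```

The design choice worth noting is that indexes are written at ingest time, trading a little extra write work for cheap lookups when someone browses the web UI.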
There are many APM solutions out there, but sometimes the overhead, or the lack of support for specific components, creates the need for custom solutions like this one. While in a different field, this is just another example of why products and services should be designed with openness and integration in mind.
Original title and link: Tracing Distributed Systems With Twitter Zipkin ( ©myNoSQL)