A lot of apps need to ship logs, and while there are numerous tools to help with this, Apache Flume is the one I’d look at first (even if only for taking inspiration on how to do things):
An important decision to make when designing your Flume flow is what
type of channel you want to use. At the time of this writing, the
two recommended channels are the file channel and the memory
channel. The file channel is a durable channel, as it persists all
events that are stored in it to disk. So, even if the Java virtual
machine is killed, or the operating system crashes or reboots,
events that were not successfully transferred to the next agent in
the pipeline will still be there when the Flume agent is restarted.
The memory channel is a volatile channel, as it buffers events in
memory only: if the Java process dies, any events stored in the
memory channel are lost. Naturally, the memory channel also exhibits
very low put/take latencies compared to the file channel, even for a
batch size of 1. Since the number of events that can be stored is
limited by available RAM, its ability to buffer events in the case
of temporary downstream failure is quite limited. The file channel,
on the other hand, has far superior buffering capability due to
utilizing cheap, abundant hard disk space.
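The channel choice described above comes down to a single `type` property in the agent configuration. A minimal sketch of what that looks like in a Flume NG properties file (the agent and component names here are illustrative, not from the original article):

```properties
# Illustrative Flume NG agent definition; "agent1", "src1", "ch1",
# "sink1" are made-up names. The channel type is the key decision:
# "memory" trades durability for low latency, "file" the reverse.
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

# Memory channel: fast but volatile; capacity bounds how many
# events can be buffered in RAM
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# File channel alternative (durable, buffers events on disk):
# agent1.channels.ch1.type = file
# agent1.channels.ch1.checkpointDir = /var/flume/checkpoint
# agent1.channels.ch1.dataDirs = /var/flume/data

agent1.sources.src1.channels = ch1
agent1.sinks.sink1.channel = ch1
```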
Just a couple of extra thoughts:
- Flume NG seems to offer 3 types of channels: file, jdbc, memory.
- For the memory channel, I’d add an option to start dropping events if memory consumption goes above a configurable threshold (this might already be implemented, but I couldn’t find it)
- Would it be worth investigating a channel based on Databus, LinkedIn’s low-latency data transfer tool?
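The drop-on-threshold idea from the second bullet can be sketched in a few lines of Java. This is a hypothetical illustration of the behavior I mean, not Flume’s actual channel API: a bounded in-memory buffer that drops new events once a configurable capacity is reached, instead of blocking the producer or exhausting the heap.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch (not Flume's real API): a bounded in-memory
// event buffer that drops incoming events once capacity is reached.
public class DroppingMemoryBuffer<E> {
    private final BlockingQueue<E> queue;
    private long dropped = 0;

    public DroppingMemoryBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns true if the event was stored, false if it was dropped. */
    public synchronized boolean put(E event) {
        boolean stored = queue.offer(event); // non-blocking; fails when full
        if (!stored) {
            dropped++; // count drops so they can be exposed as a metric
        }
        return stored;
    }

    public E take() throws InterruptedException {
        return queue.take();
    }

    public synchronized long droppedCount() {
        return dropped;
    }
}
```

Counting the dropped events matters as much as dropping them: silently discarding data without a metric to alert on would defeat the point of a durable pipeline.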
Original title and link: Apache Flume Performance Tuning ( ©myNoSQL)