You should use Zstandard for log file compression
(Never heard of Zstandard? Read this after looking at my fancy charts.)
We ship our logs to a popular log analysis tool that provides a web interface and cool data-mangling tools, but we also store plaintext copies that we occasionally need to refer to. Logs used to be archived on individual servers, but during a recent architecture shift we set up a centralised aggregation service to preserve logs from ephemeral instances. Centralising log storage also centralises log processing, so the choice of compression tool matters far more than it did when compression and decompression were spread in parallel across a whole fleet of machines.
I noticed that these compression and decompression jobs were taking a lot longer than we’d like, so I took a random log file from one machine (slightly under 8 GiB uncompressed) and ran some tests to see whether we could find a better tool.
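If you want to try something similar yourself, the shape of the test is roughly the following; app.log is a stand-in for the real file, and -k keeps the input around so every tool reads the same data:

    # app.log stands in for the real ~8 GiB log file
    time zstd -3 -k app.log     # produces app.log.zst
    time gzip -k app.log        # produces app.log.gz
    time xz -6 -k app.log       # produces app.log.xz
    time bzip2 -k app.log       # produces app.log.bz2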
If you’re looking through large log files with any regularity, you can stop reading here. zstd is the clear winner, requiring a mere 37% of the time that its closest competitor did. (There’s also a clear loser: I left bzip2 off of these charts because I didn’t like how small it made the other bars.)
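The decompression runs are the same idea as the compression test in reverse (file names again illustrative):

    time zstd -dk app.log.zst     # -d decompresses, -k keeps the .zst around
    time gunzip -k app.log.gz
    time xz -dk app.log.xz
    time bunzip2 -k app.log.bz2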
Decompression speed is only one third of the puzzle, you say? Well, you’re right: compression speed and compressed size matter too.
The venerable gzip has had a good run, but zstd is both faster and produces smaller files. zstd at its default level (zstd -3) takes 1.5 minutes to compress this file to 688 MiB. xz can put out a smaller file (xz -6, left off of this chart, will get it down to 497 MiB in a mere 36 minutes), but bumping up the zstd level gets within shouting distance of reasonable xz levels in considerably less time, while retaining the advantage of fast decompression.
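Turning the level up is a one-flag change; as a sketch, -19 is the top of zstd’s normal range and -T0 spreads the work across every core (both values here are examples, not the exact settings from my tests):

    # higher level: much slower to compress, still fast to decompress
    zstd -19 -T0 -k app.log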
I’m going to move to Zstandard for log compression, and you should too.