You should use Zstandard for log file compression
(Never heard of Zstandard? Read this after looking at my fancy charts.)
We ship our logs to a popular log analysis tool that provides a web interface and cool data-mangling tools, but we also store plaintext copies that we occasionally need to refer to. Logs used to be archived on individual servers, but during a recent architecture shift we set up a centralised aggregation service to preserve logs from ephemeral instances. Centralising log storage also centralises log processing, so the choice of compression tool matters far more than it did when compression and decompression were spread in parallel across a whole fleet of machines.
I noticed that these compression and decompression jobs were taking a lot longer than we’d like, so I took a random log file from one machine (slightly under 8 GiB uncompressed) and ran some tests to see whether we could find a better tool.
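If you want to try something similar yourself, the shape of the test is roughly the following; app.log is a stand-in for the real file, and -k keeps the input around so every tool reads the same data:

    # app.log stands in for the real ~8 GiB log file
    time zstd -3 -k app.log     # produces app.log.zst
    time gzip -k app.log        # produces app.log.gz
    time xz -6 -k app.log       # produces app.log.xz
    time bzip2 -k app.log       # produces app.log.bz2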
If you’re looking through large log files with any regularity, you can stop reading here. zstd is the clear winner, requiring a mere 37% of the time that its closest competitor did. (There’s also a clear loser: I left bzip2 off of these charts because I didn’t like how small it made the other bars.)
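The decompression runs are the same idea as the compression test in reverse (file names again illustrative):

    time zstd -dk app.log.zst     # -d decompresses, -k keeps the .zst around
    time gunzip -k app.log.gz
    time xz -dk app.log.xz
    time bunzip2 -k app.log.bz2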
Decompression speed is only one third of the puzzle, you say? Well, you’re right: compression speed and compressed size matter too.
The venerable gzip has had a good run, but zstd is both faster and produces smaller files. zstd at its default level (zstd -3) takes 1.5 minutes to compress this file to 688 MiB. xz can put out a smaller file (xz -6, left off of this chart, will get it down to 497 MiB in a mere 36 minutes), but bumping up the zstd level gets within shouting distance of reasonable xz levels in considerably less time, while retaining the advantage of fast decompression.
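Turning the level up is a one-flag change; as a sketch, -19 is the top of zstd’s normal range and -T0 spreads the work across every core (both values here are examples, not the exact settings from my tests):

    # higher level: much slower to compress, still fast to decompress
    zstd -19 -T0 -k app.log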
I’m going to move to Zstandard for log compression, and you should too.