Recently, GoGrid was examining performance enhancements for several internal processes; among them was switching from standard gzip to “pigz”. I had never heard of pigz, so I was intrigued by this “parallel” implementation of gzip, which, unlike gzip, uses all available CPUs/cores. That prompted me to ask, “I wonder if there is a parallel implementation of bzip2 as well?”, and so began my endeavor.
pigz and pbzip2 are multi-threaded (SMP) implementations of gzip and bzip2, respectively. Both are actively maintained and are fully compatible with all current gzip and bzip2 archives.
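To make the idea of “parallel but still compatible” concrete, here is a minimal Python sketch. It is not how pigz is actually implemented; it only illustrates the general approach of splitting the input into chunks, compressing the chunks on several threads, and joining the results. It relies on the fact that the gzip format allows several compressed members to be concatenated into one valid stream. The function name `parallel_gzip` and the chunk size are my own choices for illustration, and I am assuming CPython, where zlib releases the interpreter lock while compressing so the threads can run on separate cores.

```python
import gzip
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size, picked only for this illustration.
CHUNK_SIZE = 128 * 1024


def parallel_gzip(data, workers=None):
    """Compress fixed-size chunks on several threads and join the results.

    Each chunk becomes its own gzip member; concatenated members still
    form a valid gzip stream that an ordinary gzip reader can decompress.
    """
    workers = workers or os.cpu_count() or 1
    # Fall back to a single empty chunk so empty input still yields valid output.
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the members line up correctly.
        members = list(pool.map(gzip.compress, chunks))
    return b"".join(members)


if __name__ == "__main__":
    sample = b"some repetitive database dump line\n" * 100000
    packed = parallel_gzip(sample)
    # A standard, single-threaded gzip reader restores the original bytes.
    assert gzip.decompress(packed) == sample
    print(len(sample), "bytes in,", len(packed), "bytes out")
```

Compressing chunks independently usually costs a little compression ratio, since each chunk starts without the history of the previous one; that is the trade-off for being able to spread the work across cores.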
If you’re like me, you might’ve stayed away from gzip or bzip2 because they are single-threaded. If I try to compress, let’s say, a 2GB file, the system becomes rather sluggish. The reason is that the “compression tool of choice” uses almost all of one core on today’s multi-core, multi-CPU systems, which leaves the load unevenly spread across the cores and uses the machine’s CPU capacity very inefficiently.
In this example I have a .tar file containing several databases, 1.3GB in total. The system in question is a GoGrid dedicated server with 8 cores; it is a production database server with a load of around 1.
Using bzip2, the file took approximately 6 minutes and 30 seconds to compress. Yikes!
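If you want to get a feel for this kind of comparison without touching a production box, a rough sketch along the following lines can time single-threaded bzip2 compression against a chunked, multi-threaded variant. This is not pbzip2 itself, and the numbers it prints will not match the figures in this post; the helper names (`bzip2_serial`, `bzip2_chunked`, `timed`) and the chunk size are hypothetical and exist only for this illustration. It assumes CPython, where the bz2 module releases the interpreter lock during compression, and it relies on Python's `bz2.decompress` accepting several concatenated bzip2 streams.

```python
import bz2
import os
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size for the illustration, not a tuned value.
CHUNK_SIZE = 900 * 1024


def bzip2_serial(data):
    """Compress everything in one call, on one thread."""
    return bz2.compress(data)


def bzip2_chunked(data, workers=None):
    """Compress chunks on several threads; each chunk is its own bzip2 stream."""
    workers = workers or os.cpu_count() or 1
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the chunks in their original order.
        return b"".join(pool.map(bz2.compress, chunks))


def timed(fn, data):
    """Return (result, elapsed wall-clock seconds) for fn(data)."""
    start = time.perf_counter()
    result = fn(data)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Synthetic, compressible test data (about 8 MB); swap in a real file to taste.
    sample = os.urandom(2048) * 4096
    for name, fn in (("serial", bzip2_serial), ("chunked", bzip2_chunked)):
        packed, seconds = timed(fn, sample)
        assert bz2.decompress(packed) == sample
        print(f"{name}: {seconds:.2f}s, {len(packed)} bytes")
```

How much faster the chunked version runs depends on the number of cores and on what else the machine is doing, so treat any result as a ballpark figure only.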