Recently, GoGrid was examining performance enhancements for several internal processes; one of them was switching from standard gzip to “pigz”. I had never heard of pigz, and I was intrigued by this “parallel” implementation of gzip, which, unlike gzip, uses all available CPUs/cores. This prompted me to ask, “I wonder if there is a parallel implementation of bzip2 as well?”, and so began my endeavor.
pigz and pbzip2 are multi-threaded (SMP) implementations of gzip and bzip2, respectively. Both are actively maintained and fully compatible with existing gzip and bzip2 archives.
If you’re like me, you might have stayed away from gzip or bzip2 because they are single-threaded. If I try to compress, say, a 2GB file, the system becomes rather sluggish. The reason is that the compression tool of choice uses almost all of one core on today’s multi-core, multi-CPU systems, creating an uneven load between the cores and making poor use of the CPU.
In this example I have a .tar file containing several databases, totaling 1.3GB. The system in question is a GoGrid dedicated server with 8 cores; it is a production database server with a load of around 1.
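The compatibility claim is easy to check with a quick round trip. The following is a minimal sketch, not a definitive test: it compresses with pigz, decompresses with plain gzip, and compares the result with the original. The helper name roundtrip_gz is hypothetical, and the sketch assumes it should fall back to gzip when pigz is not installed.

```shell
#!/bin/sh
# Round-trip check: compress with pigz (falling back to gzip when pigz is
# not installed), decompress with plain gzip, and compare with the original.
# roundtrip_gz is a hypothetical helper name.
roundtrip_gz() {
    src=$1
    tool=gzip
    if command -v pigz >/dev/null 2>&1; then
        tool=pigz
    fi
    # -c writes to stdout and leaves the input file untouched.
    "$tool" -c "$src" | gzip -dc | cmp -s - "$src"
}
```

The function returns success when the decompressed stream matches the input byte for byte, so it can be used as: roundtrip_gz dbbackup-12_10_2011.tar && echo "compatible".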
Using bzip2, the file took approximately 6 minutes and 30 seconds to compress. Yikes!
Now we’ll try this again with pbzip2, the parallel implementation of bzip2.
Not only did pbzip2 take roughly 1/7th of the time of regular bzip2, it also has a verbose option that provides some nice output and a progress bar (not visible here) while compressing.
The file compressed to an impressive 127M, down from 1.3GB, using either bzip2 or pbzip2.
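A run like this could be scripted roughly as follows. This is a sketch under stated assumptions: the helper names bz2_tool and compress_bz2 are hypothetical, and the fallback to bzip2 is my own addition for machines where pbzip2 is not installed. The -k (keep the input file) and -v (verbose) options are accepted by both tools.

```shell
#!/bin/sh
# Pick pbzip2 when it is installed, otherwise plain bzip2.
# (bz2_tool and compress_bz2 are hypothetical helper names.)
bz2_tool() {
    if command -v pbzip2 >/dev/null 2>&1; then
        echo pbzip2
    else
        echo bzip2
    fi
}

# Compress one file, keeping the original (-k) and printing details (-v).
compress_bz2() {
    "$(bz2_tool)" -k -v "$1"
}
```

For example, compress_bz2 dbbackup-12_10_2011.tar would leave the .tar in place and write dbbackup-12_10_2011.tar.bz2 next to it.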
Now let’s try the same test with gzip and pigz.
# time gzip dbbackup-12_10_2011.tar
gzip took considerably less time than bzip2 to compress the same archive: roughly 40 seconds instead of 6 1/2 minutes. However, the resulting file is a bit bigger at 177M.
Now with pigz.
pigz took about 1/7th of the time gzip did, clocking in at around 6 seconds; an impressive speedup again.
Both parallel implementations were much faster than the standard, single-threaded implementations of these compression tools.
pbzip2 and pigz cut compression time by nearly 7/8ths when using all 8 CPU cores instead of 1, an impressive performance gain. pbzip2 and bzip2 also compress the files a bit better than gzip, with a 93% compression ratio.
In particular, this could save you a lot of money when using GoGrid Cloud Storage, and it keeps your Cloud or Dedicated server resources and stability available for the purposes they were intended for in the first place.
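Because this is a production database server, you may not want the compressor to take every core. pigz accepts -p to cap its thread count (pbzip2 has a similar option, written without a space, e.g. -p4). The sketch below leaves one core free; the helper names spare_core_threads and compress_gz_capped are hypothetical, and "all cores minus one" is only an assumed policy, not a recommendation from the measurements above.

```shell
#!/bin/sh
# Print the number of online cores minus one, but never less than 1.
# (spare_core_threads is a hypothetical helper name.)
spare_core_threads() {
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
    # Fall back to 1 if getconf is missing or returns something non-numeric.
    case $cores in
        ''|*[!0-9]*) cores=1 ;;
    esac
    if [ "$cores" -gt 1 ]; then
        echo $((cores - 1))
    else
        echo 1
    fi
}

# Compress with pigz on a capped number of threads; use gzip if pigz is absent.
compress_gz_capped() {
    if command -v pigz >/dev/null 2>&1; then
        pigz -p "$(spare_core_threads)" "$1"
    else
        gzip "$1"
    fi
}
```

On the 8-core server described here, this would run pigz with 7 threads.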
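To see how these fractions relate, here is a small sketch that turns a "before" and "after" figure into a percentage saved. The function name percent_saved is hypothetical. It uses the rounded gzip and pigz timings quoted above (40 and 6 seconds), so the result is approximate.

```shell
#!/bin/sh
# Percentage saved when going from "before" to "after" (times or sizes).
# percent_saved is a hypothetical helper name.
percent_saved() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

percent_saved 40 6   # gzip vs pigz, seconds: prints 85.0
percent_saved 7 1    # "1/7th of the time":   prints 85.7
percent_saved 8 1    # ideal 8-core speedup:  prints 87.5
```

Taking 1/7th of the time is about 86% less time, just short of the 87.5% (7/8ths) that a perfect split across 8 cores would give. The same function works for file sizes if you feed it exact byte counts, for example from ls -l.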
Depending on how the archives will be used, either implementation may suit your needs: pbzip2 has a higher compression percentage, while pigz is faster.
Whatever you choose, hopefully this will increase the usability of compression tools and help provide a more stable and optimized GoGrid environment.