<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: How To Optimize Your Database Backups and Text File Compression with pbzip2 and pigz</title>
	<atom:link href="http://blog.gogrid.com/2012/02/09/how-to-optimize-your-database-backups-and-text-file-compression-with-pbzip2-and-pigz/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.gogrid.com/2012/02/09/how-to-optimize-your-database-backups-and-text-file-compression-with-pbzip2-and-pigz/</link>
	<description>&#34;Complex Infrastructure Made Easy™&#34;</description>
	<lastBuildDate>Wed, 08 May 2013 09:46:28 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: gogridzack</title>
		<link>http://blog.gogrid.com/2012/02/09/how-to-optimize-your-database-backups-and-text-file-compression-with-pbzip2-and-pigz/#comment-44111</link>
		<dc:creator>gogridzack</dc:creator>
		<pubDate>Fri, 09 Mar 2012 20:47:46 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gogrid.com/?p=3867#comment-44111</guid>
		<description><![CDATA[Hi jbohm, 
 
Thanks for your reply. 
 
---8&lt;--- 
I think there must be some typo near &quot;127M down from 1.3GB using bzip2 or pbzip2&quot;. If pbzip2 does what it says, it should produce exactly the same output as bzip2, and certainly the same output as itself.  
---&gt;8--- 
 
There is no typo in this statement; &quot;using bzip2 or pbzip2&quot; just means that either program produced a file of the same size. 1.3GB is the original size of the file. 
 
---8&lt;--- 
Besides that, there is a key downside to this technique: While the wall time to do the compression gets smaller, the risk of taking away CPU from production processes (such as the database engine on this server) increases. Depending on server specifics, this may be mitigated by careful use of the &quot;nice&quot; command to run the parallel compression job at a lower priority.  
---&gt;8--- 
 
Yes, one can use the nice command, as with any CPU-intensive process. Please note that this is just an example; as I&#039;m sure you would agree, that caveat is very general and applies to anything CPU-intensive, compression included. I tested this on a production database server that handles about 2K queries/sec, and I saw almost no performance decrease when compressing those files with the parallel implementations. Your results may vary, and you should always exercise caution in production environments. 
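
For anyone who wants to try this, here is a minimal Python sketch of launching the parallel compressor at the lowest CPU priority. It assumes a Linux host with nice and pbzip2 on the PATH; the dump path and the helper names are hypothetical, and the -p# option (pbzip2's processor-count setting) is only added if you also want to cap how many cores the job may use.

```python
import shutil
import subprocess


def build_backup_cmd(path, niceness=19, processors=None, tool="pbzip2"):
    """Build argv to compress `path` with a parallel bzip2 at low CPU priority.

    niceness=19 is the lowest scheduling priority nice accepts on Linux.
    processors, if given, caps pbzip2's worker count via its -p# option.
    (Hypothetical helper, not part of any tool.)
    """
    cmd = ["nice", "-n", str(niceness), tool]
    if processors is not None:
        cmd.append(f"-p{processors}")  # e.g. -p2 leaves other cores for the DB
    cmd.append(path)
    return cmd


def run_backup(path, **kwargs):
    """Run the compression job; raises CalledProcessError if it fails."""
    cmd = build_backup_cmd(path, **kwargs)
    if shutil.which(cmd[3]) is None:  # cmd[3] is the compressor name
        raise FileNotFoundError(f"{cmd[3]} is not installed")
    # Like bzip2, pbzip2 replaces the input file with a .bz2 file by default.
    subprocess.run(cmd, check=True)
```

For example, build_backup_cmd("/var/backups/db_dump.sql", processors=2) returns ['nice', '-n', '19', 'pbzip2', '-p2', '/var/backups/db_dump.sql']. Keeping the command builder separate from the runner makes it easy to check the exact command before pointing it at a live server.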
 
---8&lt;--- 
RAM consumption may or may not be an issue, as a parallel bzip2 will probably increase its memory consumption by about 9.4MB per CPU used (400K + 8 x 900K + 2 x 900K buffers to hold input and output, as external input and output happens serially).  
---&gt;8--- 
 
Interesting information, and duly noted. Unfortunately, I can&#039;t cover every difference a system will see from using the parallel versions of these compression tools.  
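
For reference, the per-CPU estimate quoted above is easy to reproduce. The Python sketch below takes the breakdown as given (bzip2's documented 400K + 8 x block size compression workspace, plus two block-sized buffers per worker for input and output) and assumes the default 900K block that -9 selects; the function name is purely illustrative.

```python
def pbzip2_extra_ram_kb(cpus, block_kb=900):
    """Estimate the extra RAM (in K) a parallel bzip2 needs across `cpus` workers.

    Per worker: 400K fixed + 8 x block for the compressor's workspace,
    plus 2 x block for the input and output buffers.
    block_kb ranges from 100 (-1) to 900 (-9).
    """
    if not 100 <= block_kb <= 900:
        raise ValueError("bzip2 block size must be between 100K and 900K")
    per_cpu = 400 + 8 * block_kb + 2 * block_kb
    return cpus * per_cpu


for cpus in (1, 4, 8):
    kb = pbzip2_extra_ram_kb(cpus)
    print(f"{cpus} CPU(s): {kb}K (~{kb / 1000:.1f}MB)")
```

With one CPU this gives 9400K, the ~9.4MB mentioned above; eight CPUs come to roughly 75MB, which is modest on most database servers but worth knowing on small virtual machines.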
 
---8&lt;--- 
A related issue for virtual servers is whether the surrounding framework (such as the GoGrid API) allows the CPU and RAM allocation to a server to be temporarily boosted for the few minutes that particular server is running a scheduled backup job. So far in the industry, I have only seen APIs that allow such adjustments while the virtual server is shut down, which typically isn&#039;t a good thing to do in a nightly backup job. And yes, I have seen this limitation in the underlying virtualization engines too, so it is hardly GoGrid&#039;s fault. 
---&gt;8--- 
 
This seems more like a general cloud computing concern, and not really related to the topic at hand. 
 
Thanks again for your response, and hopefully I cleared up some of your concerns. ]]></description>
		<content:encoded><![CDATA[<p>Hi jbohm, </p>
<p>Thanks for your reply. </p>
<p>&#8212;8&lt;&#8212;<br />
I think there must be some typo near &quot;127M down from 1.3GB using bzip2 or pbzip2&quot;. If pbzip2 does what it says, it should produce exactly the same output as bzip2, and certainly the same output as itself.<br />
&#8212;&gt;8&#8212; </p>
<p>There is no typo in this statement; &quot;using bzip2 or pbzip2&quot; just means that either program produced a file of the same size. 1.3GB is the original size of the file. </p>
<p>&#8212;8&lt;&#8212;<br />
Besides that, there is a key downside to this technique: While the wall time to do the compression gets smaller, the risk of taking away CPU from production processes (such as the database engine on this server) increases. Depending on server specifics, this may be mitigated by careful use of the &quot;nice&quot; command to run the parallel compression job at a lower priority.<br />
&#8212;&gt;8&#8212; </p>
<p>Yes, one can use the nice command, as with any CPU-intensive process. Please note that this is just an example; as I&#039;m sure you would agree, that caveat is very general and applies to anything CPU-intensive, compression included. I tested this on a production database server that handles about 2K queries/sec, and I saw almost no performance decrease when compressing those files with the parallel implementations. Your results may vary, and you should always exercise caution in production environments. </p>
<p>&#8212;8&lt;&#8212;<br />
RAM consumption may or may not be an issue, as a parallel bzip2 will probably increase its memory consumption by about 9.4MB per CPU used (400K + 8 x 900K + 2 x 900K buffers to hold input and output, as external input and output happens serially).<br />
&#8212;&gt;8&#8212; </p>
<p>Interesting information, and duly noted. Unfortunately, I can&#039;t cover every difference a system will see from using the parallel versions of these compression tools.  </p>
<p>&#8212;8&lt;&#8212;<br />
A related issue for virtual servers is whether the surrounding framework (such as the GoGrid API) allows the CPU and RAM allocation to a server to be temporarily boosted for the few minutes that particular server is running a scheduled backup job. So far in the industry, I have only seen APIs that allow such adjustments while the virtual server is shut down, which typically isn&#039;t a good thing to do in a nightly backup job. And yes, I have seen this limitation in the underlying virtualization engines too, so it is hardly GoGrid&#039;s fault.<br />
&#8212;&gt;8&#8212; </p>
<p>This seems more like a general cloud computing concern, and not really related to the topic at hand. </p>
<p>Thanks again for your response, and hopefully I cleared up some of your concerns. </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jbohm</title>
		<link>http://blog.gogrid.com/2012/02/09/how-to-optimize-your-database-backups-and-text-file-compression-with-pbzip2-and-pigz/#comment-44099</link>
		<dc:creator>jbohm</dc:creator>
		<pubDate>Wed, 07 Mar 2012 17:25:29 +0000</pubDate>
		<guid isPermaLink="false">http://blog.gogrid.com/?p=3867#comment-44099</guid>
		<description><![CDATA[I think there must be some typo near &quot;127M down from 1.3GB using bzip2 or pbzip2&quot;. If pbzip2 does what it says, it should produce exactly the same output as bzip2, and certainly the same output as itself. 
 
Besides that, there is a key downside to this technique: While the wall time to do the compression gets smaller, the risk of taking away CPU from production processes (such as the database engine on this server) increases.  Depending on server specifics, this may be mitigated by careful use of the &quot;nice&quot; command to run the parallel compression job at a lower priority. 
 
RAM consumption may or may not be an issue, as a parallel bzip2 will probably increase its memory consumption by about 9.4MB per CPU used (400K + 8 x 900K + 2 x 900K buffers to hold input and output, as external input and output happens serially). 
 
A related issue for virtual servers is whether the surrounding framework (such as the GoGrid API) allows the CPU and RAM allocation to a server to be temporarily boosted for the few minutes that particular server is running a scheduled backup job.  So far in the industry, I have only seen APIs that allow such adjustments while the virtual server is shut down, which typically isn&#039;t a good thing to do in a nightly backup job.  And yes, I have seen this limitation in the underlying virtualization engines too, so it is hardly GoGrid&#039;s fault. ]]></description>
		<content:encoded><![CDATA[<p>I think there must be some typo near &quot;127M down from 1.3GB using bzip2 or pbzip2&quot;. If pbzip2 does what it says, it should produce exactly the same output as bzip2, and certainly the same output as itself. </p>
<p>Besides that, there is a key downside to this technique: While the wall time to do the compression gets smaller, the risk of taking away CPU from production processes (such as the database engine on this server) increases.  Depending on server specifics, this may be mitigated by careful use of the &quot;nice&quot; command to run the parallel compression job at a lower priority. </p>
<p>RAM consumption may or may not be an issue, as a parallel bzip2 will probably increase its memory consumption by about 9.4MB per CPU used (400K + 8 x 900K + 2 x 900K buffers to hold input and output, as external input and output happens serially). </p>
<p>A related issue for virtual servers is whether the surrounding framework (such as the GoGrid API) allows the CPU and RAM allocation to a server to be temporarily boosted for the few minutes that particular server is running a scheduled backup job.  So far in the industry, I have only seen APIs that allow such adjustments while the virtual server is shut down, which typically isn&#039;t a good thing to do in a nightly backup job.  And yes, I have seen this limitation in the underlying virtualization engines too, so it is hardly GoGrid&#039;s fault. </p>
]]></content:encoded>
	</item>
</channel>
</rss>
