PHP HTTP Clients and on the fly gzip compression.

2009 July 5 at 06:00 » Tagged as: deflate, gzip

You can speed up your website by making your web server perform on the fly gzip compression. Any old web server, including the lame IIS, can do that. Similarly, any old web browser, including the badly designed, badly written Internet Explorer, can transparently decompress this gzipped content. Unfortunately the same cannot be said about some HTTP client libraries. But it isn't as bad as it sounds. These client libraries simply avoid sending the 'Accept-Encoding: gzip,deflate' header, and the server responds by not compressing the content. So the only thing you lose is a bit of speed. Well, you might lose a bit of money also if you pay by the gigabyte for your bandwidth.

If you are building a web services client with PHP, you can try to combine the PHP HTTP wrapper with the zlib extension to read and inflate gzip compressed data. One user contributed note on php.net suggests that you 'daisy chain' the compress.zlib:// and http:// wrappers:

<?php echo file_get_contents("compress.zlib://http://raditha.com/"); ?>

Another approach is to rely on gzopen.

<?php
$zp = gzopen("http://raditha.com/", "r");
gzpassthru($zp);
?>

They will both echo out the HTML of the web page at http://raditha.com/, but appearances can be deceptive. If you analyze the traffic with Wireshark (or any old packet sniffer), you will find that the 'Accept-Encoding: gzip,deflate' header is not being sent and its counterpart 'Content-Encoding: gzip' is not included in the response. The above code appears to work because the PHP zlib extension will open uncompressed files without complaining. In other words, when it encounters uncompressed data, gzopen behaves exactly like fopen and gzread behaves like fread.

So if you want to benefit from the deflate module on your web server, you will need to use your own client library. How to create a client library is beyond the scope of this post, but I will explain how you can set and get the headers and decode the data. To make sure that the server compresses the content, you need to send the following header with your request:

fwrite($this->socket, "Accept-Encoding: gzip,deflate\r\n");
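
For context, that header line is only one part of a complete request. Here is a minimal sketch of the request text a client might write to the socket. The helper name buildRequest(), the use of HTTP/1.1 and the 'Connection: close' header are assumptions made for illustration; they are not taken from the client discussed in this post.

```php
<?php
// Hypothetical helper, not part of the original client:
// builds the text of a GET request that asks for compressed content.
function buildRequest(string $host, string $path = "/"): string
{
    $lines = array(
        "GET " . $path . " HTTP/1.1",
        "Host: " . $host,
        "Accept-Encoding: gzip,deflate", // ask the server to compress the body
        "Connection: close",             // assumption: one request per connection
    );
    // Every line ends with CRLF; an empty line terminates the header block.
    return implode("\r\n", $lines) . "\r\n\r\n";
}

echo buildRequest("raditha.com");
```

The returned string could be passed to fwrite() in one go, instead of writing each header line separately.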

Usually images and other binary files will not be gzipped, so you need to make sure that the server sends back the 'Content-Encoding: gzip' header before you actually try to decompress the message body.

$useGzip = false;
$this->headerArray = explode("\r\n", $this->headers);
foreach($this->headerArray as $head)
{
    // split each line into the header name and its value
    $parts = explode(":", $head, 2);
    if(count($parts) == 2
        && strtolower(trim($parts[0])) == 'content-encoding'
        && strtolower(trim($parts[1])) == 'gzip')
    {
        $useGzip = true;
    }
}

Converting the header to lower case before the comparison is the safe choice: HTTP header names are case insensitive, and sometimes you find weird servers not capitalizing the second word in the header. Once you have identified that the content is gzipped, you might be tempted to just use the gzinflate() function to inflate what you read in from the socket:

gzinflate(stream_get_contents($this->socket, $length));

You will only run into an error:

PHP Warning:  gzinflate(): data error in /var/www/clients/twitter/RadHTTPClient.php on line 261
PHP Stack trace:
PHP   1. {main}() /var/www/raditha/http-deflate.php:0
PHP   2. RadHTTPClient->getMessageBody() /var/www/raditha/http-deflate.php::20
PHP   3. gzinflate() /var/www/raditha/RadHTTPClient.php:261

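It happens because the gzinflate() function expects raw deflate data and doesn't expect to deal with the gzip header. The header contains the gzip magic number as well as other fields such as flags and the file modification time. When no optional fields are present, it is 10 bytes long: 2 bytes for the magic number and 8 bytes for the rest. This header has to be stripped out before you pass the data into the gzinflate function.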
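
To see the effect in isolation, here is a small sketch that needs no socket at all. It builds a gzip stream locally with gzencode() and assumes the plain 10 byte header, which is what gzencode() produces. It also drops the 8 byte trailer (checksum and size) that ends a gzip stream. The variable names are made up for this example.

```php
<?php
// Build a gzip stream locally so the example needs no network access.
$original = "Hello, gzip!";
$gzipped  = gzencode($original);

// The stream starts with the magic number 0x1f 0x8b.
$hasMagic = (ord($gzipped[0]) == 0x1f && ord($gzipped[1]) == 0x8b);

// Passing the whole stream to gzinflate() fails with a data error.
// The @ operator only silences the warning for this demonstration.
$naive = @gzinflate($gzipped);

// Skip the 10 byte header and drop the 8 byte trailer (CRC32 and size).
// Assumption: no optional header fields, which holds for gzencode() output.
$inflated = gzinflate(substr($gzipped, 10, -8));

var_dump($hasMagic, $naive === false, $inflated === $original);
```

Inside the client, the same idea is applied while reading from the socket: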
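
if($useGzip)
{
    // a gzip stream starts with the magic number 0x1f 0x8b
    $b = fread($this->socket,2);
    if(ord($b[0]) == 0x1f && ord($b[1]) == 0x8b)
    {
        // skip the remaining 8 bytes of the 10 byte gzip header
        $gzHead = fread($this->socket,8);
        // what is left of the message body is the deflate data (plus the gzip trailer)
        $s = gzinflate(stream_get_contents($this->socket, $length-10));
    }
}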

If the magic number 0x1f8b is not found, you can treat the data as already uncompressed (of course you will need to add the 'else' to go with it).

One last thing: if you want to enable on the fly gzip compression with Apache, all you need to do is add the following directives to the httpd.conf file:

LoadModule deflate_module modules/mod_deflate.so

AddOutputFilterByType DEFLATE text/html text/plain text/xml
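
To confirm that the directives took effect, look for 'Content-Encoding: gzip' in the response to a request that carried the Accept-Encoding header. The sketch below applies the header check from earlier in this post to two sample header blocks. The helper name responseIsGzipped() and both header blocks are made up for illustration.

```php
<?php
// Hypothetical helper: returns true when a raw header block announces gzip content.
function responseIsGzipped(string $rawHeaders): bool
{
    foreach (explode("\r\n", $rawHeaders) as $line) {
        // Split each line into name and value; the status line has no colon.
        $parts = explode(":", $line, 2);
        if (count($parts) == 2
            && strtolower(trim($parts[0])) == "content-encoding"
            && strtolower(trim($parts[1])) == "gzip") {
            return true;
        }
    }
    return false;
}

// Made-up sample header blocks, for illustration only.
$compressed = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-encoding: gzip\r\nContent-Length: 1234";
$plain      = "HTTP/1.1 200 OK\r\nContent-Type: image/png\r\nContent-Length: 5678";

var_dump(responseIsGzipped($compressed));
var_dump(responseIsGzipped($plain));
```

With the directives above, a text/html page should pass this check, while a type that is not listed, such as an image, should not.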