Conteg v0.13 - Content Negotiation + Cache-Control for PHP-originated Web Output.
Introduction
------------
Conteg attempts to do for dynamic content what most web-servers provide as standard for static content,
but *never* provide for dynamic (such as PHP originated) files and pages.
The principal value provided is to:
* Reduce bandwidth
* Increase perceived server responsiveness
It achieves each of the above by utilising the content-negotiation features of the HTTP protocol
- provided as standard by HTTP/1.0, and considerably extended within HTTP/1.1. Thus, each of these
twin goals can be reached with the oldest of browsers likely to be found accessing a website today.
The requirements are:
* PHP 4.1.0+
* zlib (for compression - odds are you have it)
* `ExtendedStatus On' in httpd.conf (Apache) (re: $_SERVER-type variables)
The simplest possible usage requires 3 lines of code (one of which is inclusion of the Class) and,
by default, will:
* Auto-negotiate load-balanced compression
- Accept-Encoding negotiation accommodates known browser foibles
- load-balancing currently works on Linux, FreeBSD only
* Auto-switch according to HTTP protocol version
* Send Content-Type, Charset, Content-Language + Expiry Response headers
- by default, 'text/html', 'ISO-8859-1', 'en' + 3600
* Send Cache-Control headers to ensure full caching of content
* Report Referer, Browser + OS-platform
* Report a full range of Request headers
Load-balanced compression: high server load means lower compression.
Stats: < 0.002 secs on a twin Xeon 2.4 GHz, Linux 2.6
compression: typical savings of 70%+ at level 8/9
(reduction to roughly one third of original size; real-world examples)
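To make "high server load means lower compression" concrete, here is a minimal sketch. It is not Conteg's own code: the function name `compressionLevelForLoad()` and the load thresholds are assumptions chosen for illustration, and it is written for a current PHP rather than PHP 4.

```php
<?php
// Hypothetical helper, NOT taken from Conteg; the thresholds are illustrative only.
function compressionLevelForLoad(float $load): int
{
    if ($load < 1.0) {
        return 9;   // idle server: spend CPU on maximum compression
    }
    if ($load < 2.5) {
        return 6;
    }
    if ($load < 5.0) {
        return 3;
    }
    return 1;       // heavily loaded: compress as cheaply as possible
}

// sys_getloadavg() is not available on every platform, hence the guard.
$loads = function_exists('sys_getloadavg') ? sys_getloadavg() : false;
$load  = is_array($loads) ? (float) $loads[0] : 0.0;

$page       = str_repeat('<p>Hello, world</p>', 200);
$compressed = gzencode($page, compressionLevelForLoad($load));
printf("%d -> %d bytes\n", strlen($page), strlen($compressed));
```

The 1-minute load average picks the zlib level, so a busy server still compresses, just more cheaply.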
There is then a further series of more complex actions possible, all achieved via the (single)
Constructor parameter. This parameter is an array of key-value pairs (currently, 31 are available,
see bottom). The array keys switch various aspects of HTTP Content Negotiation on/off, and may be
used with the same Class instance more than once. Here is an example of simple usage beyond the
defaults:
   $param = array(
      'use_etag' => TRUE,    // default is Weak ETags (HTTP/1.1 only)
      'modified' => $mdate,  // Unix timestamp
      'expiry'   => 3600     // set expiry date 1 hour from time()
   );
   $Instance = new Conteg( $param );
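For readers who have not met weak ETags, the sketch below shows what such a validator looks like on the wire. How Conteg actually derives its ETag is not described in this document, so the function `weakEtag()` and its md5 recipe are purely hypothetical; only the `W/"..."` format comes from the HTTP/1.1 specification.

```php
<?php
// Hypothetical sketch: Conteg's real ETag recipe is not documented here.
function weakEtag(int $modified, string $otherVar = ''): string
{
    // The W/ prefix marks the validator as weak: the content is
    // "semantically equivalent", not necessarily byte-for-byte identical.
    return 'W/"' . md5($modified . '|' . $otherVar) . '"';
}

$mdate = 1172188800; // example Unix timestamp (23 February 2007, 00:00 UTC)
echo weakEtag($mdate), "\n";
echo weakEtag($mdate, 'lang=en'), "\n"; // an extra string yields a different tag
```

The second call shows why an extra string is useful: two variants of a page with the same modification time still get distinct tags.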
With the appropriate switches, Conteg will auto-negotiate:
* If-Modified-Since
* If-Unmodified-Since
* If-None-Match
* If-Match
* Range
* If-Range
* Accept-Charset (external negotiation)
* Accept-Language (external negotiation)
* Accept (external negotiation)
...auto-sending the correct:
* 304 Not Modified
* 406 Not Acceptable
* 412 Precondition Failed
* 416 Requested Range Not Satisfiable
* 206 Partial Content, or
* 200 OK page
+ full headers
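As one example of how a Request header maps onto these status codes, here is a rough sketch of Range handling for a single byte range. It is an assumption-laden illustration, not Conteg's implementation: `resolveRange()` is a made-up name, multi-range requests and If-Range are ignored, and it needs a current PHP (nullable type hints).

```php
<?php
// Hypothetical helper, not Conteg's code. Handles a single byte range only.
// Returns array(status, firstByte, lastByte).
function resolveRange(?string $rangeHeader, int $size): array
{
    if ($rangeHeader === null
        || !preg_match('/^bytes=(\d*)-(\d*)$/', trim($rangeHeader), $m)
        || ($m[1] === '' && $m[2] === '')) {
        return array(200, 0, $size - 1);   // no usable Range: send everything
    }
    if ($m[1] === '') {                    // suffix form "-N": the last N bytes
        $length = (int) $m[2];
        if ($length === 0 || $size === 0) {
            return array(416, 0, 0);
        }
        return array(206, max(0, $size - $length), $size - 1);
    }
    $first = (int) $m[1];
    if ($m[2] !== '' && (int) $m[2] < $first) {
        return array(200, 0, $size - 1);   // syntactically invalid: ignore the header
    }
    if ($first >= $size) {
        return array(416, 0, 0);           // range starts beyond the end of the content
    }
    $last = ($m[2] === '') ? $size - 1 : min((int) $m[2], $size - 1);
    return array(206, $first, $last);
}

list($status, $first, $last) = resolveRange('bytes=0-99', 1000);
echo "$status: bytes $first-$last\n";
```

A caller would send `Content-Range: bytes $first-$last/$size` with a 206, and `Content-Range: bytes */$size` with a 416.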
In addition to the above, the Class offers:
* Cache-Control
- Request headers (external negotiation)
- Response headers individual control
- ('cache-all'/'cache-none' macros also provided)
* 404/410 header
- for custom error pages; by default, auto-fixed for MSIE browsers
- By default, under HTTP/1.1 switches 404 to 410
* User-Decided Status headers
- By default, auto-fixed for MSIE browsers
* Document-wide Search + Replace
- By default, this makes reporting of compression stats
  possible _on_the_compressed_page_
* Apache Notes logging
- By default, for logfile compression-stats reporting
* Response Headers attempt to be RFC-compliant
* released under the GNU LGPL
* All switches have defaults
* Every default may be overridden
Background Info
---------------
Entire books have been written on the HTTP protocol, Content Negotiation and Cache-Control.
This little section will not even attempt to be comprehensive. It will, instead, give a
kiddie-type simple view on some of the background issues involved.
When a browser makes a request to a web-server, and the server responds, there are two sets of
packaged-information transferred in different directions:
* 1 Browser -> Server (Request)
* 2 Browser <- Server (Response)
The first important thing to understand is that these packages (packets) of info are organised
in different ways:
* 1 Request: 1 set of info (headers only)
* 2 Response: 2 sets of info (headers + content)
The second point is to understand that these 2 items (headers + content) are treated in different
ways (as the most obvious example, one you end up seeing--content, and the other usually goes on
'under the hood'--headers). There are more differences than that:
* Headers:
- plain text
- fixed format
- always has at least one item
- usually hidden
* Content:
- encoded (which may be as plain text!)
- free format (but is delineated within the headers)
- may be empty
- usually displayed
...and so on.
The headers are of primary importance, and are the sole item used within Content Negotiation. Perhaps
'Content Negotiation' should more accurately be described as 'Header Negotiation'. Response headers
describe the content that follows them within the packets but, in addition, Content Negotiation can
radically affect the Content that is delivered, even to the point that it is entirely absent.
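The headers/content split is easy to see by taking a Response apart by hand. The Response below is hand-written for illustration (it is not captured from a real server), and the variable names are arbitrary.

```php
<?php
// Illustrative only: a hand-written HTTP/1.1 Response as it travels on the wire.
$response = "HTTP/1.1 200 OK\r\n"
          . "Content-Type: text/html; charset=ISO-8859-1\r\n"
          . "Content-Length: 20\r\n"
          . "\r\n"
          . "<p>Hello, world!</p>";

// The first blank line (CRLF CRLF) separates the headers from the content.
list($head, $content) = explode("\r\n\r\n", $response, 2);
$lines      = explode("\r\n", $head);
$statusLine = array_shift($lines);   // always present, even when the content is empty

$headers = array();
foreach ($lines as $line) {
    list($name, $value) = explode(':', $line, 2);
    $headers[strtolower(trim($name))] = trim($value);
}

echo $statusLine, "\n";
echo $headers['content-length'], ' bytes of content: ', $content, "\n";
```

Note that the content's type and length are stated in the headers, which is what "delineated within the headers" means above.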
The third and final, vital, point for PHP users to get hold of is:
* PHP will simply, and easily, produce web content, but
* it performs no Content Negotiation of any kind
(unless instructed to do so)
This single fact is responsible for the endless cries of woe in Webmaster forums, saying: "Google is
hammering my site!", or "My website has used up its monthly bandwidth in just 3 days".
If you do not implement Content Negotiation, your website will suffer the consequences.
PS
'Cache-Control' is itself a specific Request and Response header, designed to affect
the behaviour of public and/or private caches (proxy and browser).
Final Thoughts
--------------
You may well come to believe--as I do--that the facility provided by Conteg is vital for all
webmasters who use PHP. It will not be surprising, however, if that is not yet *your* view, since
it is very difficult to see what is not there. To highlight this, let's look at a very typical
scenario:
The simplest possible example of Content-Negotiation
----------------------------------------------------
A person viewing a web-page presses the 'Back' button. This is what happens "under the hood" in
two situations; default installations in both cases:
Example 1: A website consisting of static files:
1 The browser either:
i) pulls the file from the hdd cache, or
ii) sends a request to the website for the previous page
If (ii), the website responds with a "304 Not Modified" status response (just that: < 30 bytes), so:
2 The browser pulls the file from the hdd cache.
What the user sees: a responsive website
What the hostmaster sees: optimised bandwidth and server load
Example 2: A website consisting of PHP-originated content:
1 The browser sends a request to the website for the previous page
2 The website re-sends the requested page
(all of it, even if it is identical to the proxy/browser cache)
What the user sees: delays
What the hostmaster sees: high bandwidth and server load
(particularly if Google starts to hammer your site)
...unless Conteg is implemented, of course!
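The decision that a static-file server makes in Example 1, and that plain PHP skips in Example 2, can be sketched in a few lines. This is an illustration of the protocol step only, not Conteg's code; `statusForRequest()` is a hypothetical name and the timestamp is an arbitrary example.

```php
<?php
// Hypothetical helper, not Conteg's code: choose between 200 and 304
// from the If-Modified-Since Request header.
function statusForRequest(?string $ifModifiedSince, int $lastModified): int
{
    if ($ifModifiedSince === null) {
        return 200;                 // browser has no cached copy: send the full page
    }
    $since = strtotime($ifModifiedSince);
    if ($since === false) {
        return 200;                 // unparseable date: ignore the header
    }
    // Unchanged since the browser's copy was fetched: headers only, no content.
    return ($lastModified <= $since) ? 304 : 200;
}

$lastModified = 1172188800;         // example: 23 February 2007, 00:00 UTC
echo statusForRequest(null, $lastModified), "\n";
echo statusForRequest('Fri, 23 Feb 2007 00:00:00 GMT', $lastModified), "\n";
```

In a live script the header value would come from `$_SERVER['HTTP_IF_MODIFIED_SINCE']`, and the page must also send a `Last-Modified` header so that the browser has a date to quote back.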
Some references
---------------
* compression: rfc2616 (Sections: 3.5, 14.3, 14.11)
* HTTP/1.1: http://www.w3.org/Protocols/rfc2616/rfc2616.html
* HTTP/1.0: http://www.w3.org/Protocols/rfc1945/rfc1945.txt
* v0.13 release announcement : http://forums.modem-help.com/viewtopic.php?t=670
* v0.12.3 : http://forums.modem-help.com/viewtopic.php?t=603
* v0.12.2 : http://forums.modem-help.com/viewtopic.php?t=581
* v0.12.1 : http://forums.modem-help.com/viewtopic.php?t=568
* v0.10 : http://forums.modem-help.com/viewtopic.php?t=128
Final Items
-----------
The following is copied from Comments included with the Class.
Cache-Control array:
Cache-Control is switched within the Constructor parameter array (see below) by the
'cache_control' key-value pair. It is by far the most complex of all the possible control-keys
for the Class.
* Any of the individual parts of the Cache-Control header may be set; see the comments
  to setup() for details. For your convenience, here are all of the sub-parts:
   $Instance = new Conteg(
      array(
         'cache_control' => array(
            'max-age'    => (int),       // secs; overrides the Expires header
            'must-revalidate',           // caches must revalidate with the server once the entry is stale
            'no-cache',
            'no-store',
            'no-transform',
            'post-check' => (int),
            'pre-check'  => (int),
            'private',
            'proxy-revalidate',
            'public',
            's-maxage'   => (int),       // secs; for shared (not private) caches
            'pragma'     => (string),    // strictly, not Cache-Control
            'macro'      => 'cache-all', // cache under all circumstances (default)
            'macro'      => 'cache-none' // never cache (use only one 'macro'; a repeated key overwrites the first)
         )
      )
   );
* Any `no-cache' value will cause the `Expires' value to be reset to a date in the past.
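The listing above mixes bare flags ('no-store') with name => value pairs ('max-age' => (int)). The sketch below shows one way such a mixed array can be flattened into a single header value. It is not the Class's setup() code: `buildCacheControl()` is a hypothetical name, and the 'pragma' and 'macro' keys are deliberately left out.

```php
<?php
// Hypothetical sketch, not Conteg's setup(): flatten a mixed array of flags
// and name => value pairs into one Cache-Control header value.
function buildCacheControl(array $parts): string
{
    $out = array();
    foreach ($parts as $key => $value) {
        if (is_int($key)) {
            $out[] = $value;                     // bare flag, e.g. 'no-store'
        } else {
            $out[] = $key . '=' . (int) $value;  // valued directive, e.g. 'max-age' => 3600
        }
    }
    return implode(', ', $out);
}

echo 'Cache-Control: ',
     buildCacheControl(array('public', 'max-age' => 3600, 'must-revalidate')),
     "\n";
```

PHP gives the bare flags integer keys automatically, which is what lets the loop tell the two kinds of entry apart.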
Constructor parameter array:
* These are all the possible array values within the single parameter supplied to
  the constructor, and acted upon within setup(), with defaults.
* Note: none of the following is required - these are the program defaults:
   array(
      '404'                => FALSE,        // higher precedence than 'http_status'
      '404_to_410'         => TRUE,         // see sendStatusHeader()
      'cache_control'      => array( 'macro' => 'cache-all' ), // see setup() and above
      'charset'            => 'ISO-8859-1',
      'dupe_status_header' => TRUE,         // see sendStatusHeader()
      'encodings'          => array( 'gzip','deflate','compress' ),
      'etag'               => '',
      'expiry'             => 3600,         // secs after time()
      'http_status'        => NULL,         // preferred to program-decided status
      'input'              => 'instream',   // Apache-Notes
      'lang'               => 'en',
      'modified'           => NULL,         // sets $last_modified to time()
      'msie_error_fix'     => TRUE,         // avoid MSIE `friendly' error pages
      'noprint'            => FALSE,        // print on instantiation
      'other_var'          => '',           // extra string to affect (weak) ETag
      'output'             => 'outstream',  // Apache-Notes
      'prefer'             => array(),
      'ratio'              => 'ratio',      // Apache-Notes
      'referer_lower_case' => TRUE,
      'search'             => array(),
      'type'               => 'text/html',
      'use_accept'         => FALSE,
      'use_accept_charset' => FALSE,
      'use_accept_encode'  => TRUE,
      'use_accept_lang'    => FALSE,
      'use_accept_ranges'  => FALSE,
      'use_apache_notes'   => FALSE,
      'use_content_lang'   => TRUE,
      'use_content_type'   => TRUE,
      'use_etag'           => FALSE,
      'weak_etag'          => TRUE
   )
Note: the default is to print (send) the content immediately upon instantiation of the Class.
Use `'noprint' => TRUE' for any external negotiation, then `$instance->show()' when ready to send.
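To illustrate what the 'use_accept_encode' and 'encodings' defaults above imply, here is a simplified sketch of choosing an encoding from the Accept-Encoding Request header. It is not the Class's code: `chooseEncoding()` is a hypothetical name, and the sketch ignores the "*" wildcard and the browser-specific workarounds that the Class applies.

```php
<?php
// Hypothetical sketch, not Conteg's code. Ignores the "*" wildcard and
// any browser-specific workarounds.
function chooseEncoding(string $acceptEncoding, array $supported): ?string
{
    $quality = array();
    foreach (explode(',', $acceptEncoding) as $item) {
        $fields = explode(';', $item);
        $name   = strtolower(trim($fields[0]));
        if ($name === '') {
            continue;
        }
        $q = 1.0;                               // q defaults to 1 when omitted
        foreach (array_slice($fields, 1) as $param) {
            if (preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        $quality[$name] = $q;
    }

    $best  = null;
    $bestQ = 0.0;                               // q=0 means "not acceptable"
    foreach ($supported as $encoding) {         // server order breaks ties
        if (isset($quality[$encoding]) && $quality[$encoding] > $bestQ) {
            $best  = $encoding;
            $bestQ = $quality[$encoding];
        }
    }
    return $best;                               // null: send uncompressed
}

$supported = array('gzip', 'deflate', 'compress');
var_dump(chooseEncoding('gzip;q=0, deflate;q=0.5', $supported));
```

Because the comparison is strictly greater-than, the order of the supported list decides which encoding wins when the browser rates several equally.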
(c) Alex Kemp 23 February, 2007
website: http://www.modem-help.com/
(email address withheld due to historical/hysterical problems)
(contact instead via PM at http://forums.modem-help.com/)