Auto reload and poor man’s HTTP caching in PHP

I recently had a problem with my website zimmer69.de. A not-so-nice fellow told his Opera browser to reload a 300 KB page every 5 seconds, moved it to some tab and forgot about it for weeks. He caused 2 GB of traffic per day. Time to implement some cache control.

My first measure, to gain some time, was to delay the delivery of every page to Opera browsers by 30 seconds. After finding out which feature caused the trouble, I was close to simply excluding Opera from the site. But Firefox offers the same functionality through an add-on, so I needed a different solution.

Part of the fault was on my side, since I did not have any cache control implemented. Fortunately, the “reload every 5 seconds” feature respects caching. I have since implemented proper caching, but I want to show a quick and dirty solution that would have reduced the load considerably even without it.

Caching

Caching happens at many levels: the browser cache, web caches, reverse proxies, page caches and code caches, then the query cache of your database and the caches of your operating system, all the way down to the caches in your CPU. I’m only interested in the browser cache and web caches, since I want to reduce my bandwidth usage.

You may read a good description of web caches but you can also just read on.
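As a rough illustration of what browser and web caches act on, the following sketch builds the response headers that tell them how long a copy may be reused. The function name cache_headers is hypothetical, and the Cache-Control line is the HTTP/1.1 counterpart of Expires that I am assuming here, not something taken from the site's actual code.

```php
<?php
// Hypothetical helper: returns the header lines that let browser caches and
// shared web caches reuse a response for $seconds seconds.
function cache_headers(int $seconds, int $now): array {
    $fmt = "D, d M Y H:i:s";
    return [
        // HTTP/1.0 style: absolute point in time at which the copy gets stale
        "Expires: " . gmdate($fmt, $now + $seconds) . " GMT",
        // HTTP/1.1 style: relative lifetime; "public" permits shared caches to store it
        "Cache-Control: public, max-age=" . $seconds,
        // Validator that the client sends back later as If-Modified-Since
        "Last-Modified: " . gmdate($fmt, $now) . " GMT",
    ];
}

// Usage: send each line before any page output.
if (!headers_sent()) {
    foreach (cache_headers(60, time()) as $line) {
        header($line);
    }
}
```

The helper only returns strings, so it can be checked without a web server; sending them is left to the caller.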

Poor man’s cache

A simple solution to the Opera problem above is to just pretend that the page does not change for a certain amount of time. That’s what the following function does:

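# Implements a fake cache control. Reads/sends headers so that the page only becomes stale
# if it was delivered more than $seconds seconds ago.
# If $ua_regexp is given and the user agent does not match it, no fake caching takes place.
function fake_cache($seconds = 60, $ua_regexp = "") {
  $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : "";
  # eregi() was removed in PHP 7; preg_match() with the i flag matches case-insensitively.
  # Assumes that $ua_regexp does not contain the delimiter "~".
  if ($ua_regexp && !preg_match("~" . $ua_regexp . "~i", $ua)) {
    return;
  }
  # The header is missing on a first visit; strtotime() returns false for unparsable dates.
  $ims_epoch = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    ? strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    : false;
  $now_epoch = time();
  if ($ims_epoch !== false && $now_epoch < $ims_epoch + $seconds) {
    # old version is still valid
    header("HTTP/1.0 304 Not Modified");
    exit;
  }
  $mod_gmt = gmdate("D, d M Y H:i:s", $now_epoch) ." GMT";
  $exp_gmt = gmdate("D, d M Y H:i:s", $now_epoch + $seconds) ." GMT";
  header("Expires: $exp_gmt");
  header("Last-Modified: $mod_gmt");
}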

Now you can put

fake_cache();

in front of your big pages, so that they will be cached for 60 seconds or

fake_cache(600,"opera");

if you want to deliver your pages only once every 10 minutes to unfriendly Opera users.

Note that every first-time visitor still gets a current version of the page, because a browser without a cached copy does not send an If-Modified-Since header.
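To make the decision easier to follow, here is a minimal sketch of the same check as a side-effect-free function. The name would_send_304 and the sample date are made up for illustration; the function only mirrors the comparison done on the If-Modified-Since value and sends no headers itself.

```php
<?php
// Hypothetical, side-effect-free version of the freshness check:
// returns true when a 304 would be sent, false when the full page is delivered.
function would_send_304(?string $if_modified_since, int $now, int $seconds = 60): bool {
    if ($if_modified_since === null) {
        return false; // first visit: the browser has no copy and sends no header
    }
    $ims_epoch = strtotime($if_modified_since);
    if ($ims_epoch === false) {
        return false; // unparsable date: deliver the page
    }
    // The copy counts as fresh until $seconds have passed since it was delivered.
    return $now < $ims_epoch + $seconds;
}

$delivered = "Thu, 01 Jan 2015 12:00:00 GMT"; // sample Last-Modified value
$t0 = strtotime($delivered);

var_dump(would_send_304(null, $t0));            // bool(false): first-time visitor
var_dump(would_send_304($delivered, $t0 + 5));  // bool(true): reload after 5 seconds
var_dump(would_send_304($delivered, $t0 + 61)); // bool(false): the 60 seconds have passed
```

So a browser that reloads every 5 seconds receives the full page roughly once per minute and a tiny 304 response for every other request.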

Opera is hammering my server, who is to blame?

I think it is bad practice for Opera to offer such a "reload every 5 seconds" feature to users who are not aware of what it can mean. This feature makes Opera a bot with a nice GUI. But it does not check robots.txt, so a webmaster has no way to limit this stupidity through it.

Of course Opera only does what the user tells it to do. But I really think that Opera should warn the user if he chooses a short reload interval and there is no caching going on. At the very least, it could stop reloading the page after a restart, or nag the user about it. The command line utility ping made a better decision in this respect:

me@mybox$ ping -f www.opera.com
PING front.opera.com (195.189.143.147) 56(84) bytes of data.
ping: cannot flood; minimal interval, allowed for user, is 200ms

Note

Update 2014-06-01: If you want to actually cache the content server-side to save the database some queries, have a look at phpfastcache.com.

References