Jul 09
Varnish lessons learned
Varnish is a high-speed caching daemon, also known as an HTTP accelerator. I recently learned a few lessons while setting it up on a host, and I'd like to share some bits of information that might be useful to others struggling with their install.
Config file locations
If you've installed varnish via apt-get install varnish on Debian 5.0, the config file that holds the daemon's startup options resides at: /etc/default/varnish
This conf file also tells varnish whether to load a .vcl file with additional rules, and where to find it. The common location for such a file is: /etc/varnish/default.vcl
Varnish's rapid version and feature changes
Varnish is an active project, and when you look for example configurations around the web, you will likely run into .vcl files written for a different version than the one you've got on your server.
Missing variables on 1.x branch
One example I ran into: the varnish-1.1.2 that ships with standard Debian 5.0 does not provide the obj.ttl and obj.hits variables. So a nice debugging header I wanted to add had to be cut down in functionality like this:
sub vcl_hit {
    # obj.hits and obj.ttl are not available on 1.1.2, so the richer
    # version of the header stays commented out:
    # if(obj.hits > 0){
    #     set obj.http.X-Cache = "Hit" obj.hits "" obj.ttl;
    set obj.http.X-Cache = "Hit";
    # }else{
    #     set obj.http.X-Cache = "Hit: first";
    # }
}
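For comparison, on a later release where obj.hits is available (an assumption; it is not on the 1.1.2 described here), the debugging header could be set in vcl_deliver instead. A minimal sketch, not tested on this install:

```vcl
sub vcl_deliver {
    # Assumption: this varnish version exposes obj.hits in vcl_deliver.
    if (obj.hits > 0) {
        # The object was already in the cache when the request arrived.
        set resp.http.X-Cache = "HIT";
    } else {
        # The object had to be fetched from the backend first.
        set resp.http.X-Cache = "MISS";
    }
}
```

The header is set on resp rather than obj here, so it only decorates the response going out to the client and leaves the cached object untouched.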
Named Virtual Host trouble
The machine varnish was intended to run on hosts several sites via name-based virtual hosts in apache2. When I ran a test setup with apache2 on port 80 and varnish on port 8080, accessing example.com:8080 served the right site for each domain I substituted for "example". But when it came to setting varnish live, swapping the ports (varnish on 80, apache2 on 8080) resulted in Error 400: bad request. A look at varnishlog suggested that the Host: header was missing from varnish's processing, or at least that the port suffix in Host: example.com:8080 (now the backend) was handled wrongly.
Whatever was wrong with my installation/setup or varnish, I could fix it with:
sub vcl_recv {
    set req.backend = default;
    # re-assign the Host header to itself so it is passed on to the backend
    set req.http.host = req.http.host;
}
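If the port suffix in the Host: header really is what trips up the backend, another option might be to strip it before the request is passed on. This is only a sketch and not the fix used above; it assumes a varnish release whose VCL provides regsub() (2.0 and later do), and that the backend's virtual hosts match on the bare hostname:

```vcl
sub vcl_recv {
    # Assumption: regsub() is available in this varnish version.
    # Drop a trailing ":port" so "example.com:8080" becomes "example.com".
    set req.http.host = regsub(req.http.host, ":[0-9]+$", "");
}
```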
malloc size declaration
Earlier varnish versions do let you set a size for the -s file parameter, like so: -s file,/tmp/varnish/$INSTANCE,5G
but they do not provide a size limit option for the malloc type!
Dealing with compressed content
If you serve your content with gzip or deflate HTTP compression - a good move that even Google seems to reward these days - varnish needs a rule to properly handle the Vary: header and the different versions it creates in the cache. A slightly modified version of the common ruleset, including a hack for the always broken IE, is:
if(req.http.Accept-Encoding){
    if(req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$"){
        # already compressed formats, no point in compressing again
        remove req.http.Accept-Encoding;
    }elsif(req.http.Accept-Encoding ~ "gzip"){
        set req.http.Accept-Encoding = "gzip";
    }elsif(req.http.Accept-Encoding ~ "deflate"){
        set req.http.Accept-Encoding = "deflate";
    }else{
        # unknown compression
        remove req.http.Accept-Encoding;
    }
    # IE 6 hack: serve css and js uncompressed
    if(req.url ~ "\.(css|js)$" && req.http.User-Agent ~ "MSIE 6"){
        remove req.http.Accept-Encoding;
    }
}
Compare: varnish official on compression.
Check if you really filter cookies correctly
If your site features a login, double-check that your varnish setup handles cookies correctly!
With advertising on your site, you will probably have cookies set within your domain. These bloat the number of versions varnish caches or, with the default settings, prevent varnish from caching pages at all, since the default policy is to not serve cached objects to requests with cookies.
sub vcl_recv {
    if(req.http.Cookie){
        # care about our cookies and pass these requests through uncached
        if(req.http.Cookie ~ "SID=" ) {
            pass;
        }
        # remove all other cookies so varnish serves these requests from cache as well
        remove req.http.Cookie;
    }
}
The above code tells varnish to simply pass through requests which carry a session cookie, here "SID". Be aware that
req.http.Cookie == "SID"
will not work; at least on my early varnish version it didn't. The if clause does not automagically check for the existence of a key called "SID"; it only compares the Cookie string as a whole. That's why I use a regex looking for "SID=" here.
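One caveat with a plain "SID=" regex is that it also matches any cookie whose name merely ends in SID. A slightly stricter variant could anchor the name to the start of the header or to the preceding "; " separator. This is a sketch only; the cookie name "OLDSID" in the comment is a made-up example, and the pass; syntax follows the early varnish version used above:

```vcl
sub vcl_recv {
    if (req.http.Cookie) {
        # Match "SID=" only at the very start of the Cookie string or
        # right after a ";" separator, so a hypothetical "OLDSID=..."
        # cookie does not trigger the pass.
        if (req.http.Cookie ~ "(^|; *)SID=") {
            pass;
        }
        # Everything else is treated as cacheable: drop the cookies.
        remove req.http.Cookie;
    }
}
```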