Mar 19

Generating XML sitemaps with XML::Twig::Elt

Category: Linux,Perl   — Published by tengo on March 19, 2012 at 2:06 pm

The sitemaps protocol (help at Google, official homepage), as proposed by Google, is gaining wider popularity. When you run a site and want good visibility in search engines, you surely want it crawled regularly and completely. One step in that direction is to serve your content in the form of an XML sitemap, so the crawlers know the layout of your domain and get a list of URLs to fetch.

Well, there are many packages out there that help you generate XML sitemaps: an official Python script from Google, a WordPress plug-in, online generators, and more.

When I looked around for a good tool to do the job, the only option from a Perl perspective was WWW::Google::SiteMap. As it soon turned out, this helper couldn't do all the tricks I needed. For example, it was very inflexible with additional attributes or when building a video sitemap. So, bit by bit, I started writing my own sitemap generator script.

A good choice for constructing larger XML files is XML::Twig. Importantly, it gives you more control over the order of elements in the resulting XML file than, say, vanilla XML::Simple does. The catch is that there is a bit of a learning curve. XML::Twig does not work by being handed a hash or similar structure; instead, you build nodes ("twigs") and add them to the document. You start with the innermost element of a nested tag structure and work your way up the hierarchy, finishing by pasting the constructed element into the document.
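
To make the bottom-up approach concrete, here is a minimal sketch of how a small sitemap could be assembled with XML::Twig::Elt. It is not the full generator script. The page list, URLs and dates are made-up placeholders, and the choice of child elements (loc, lastmod, changefreq, priority) is just one plausible layout taken from the sitemaps protocol.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input data: the URLs and dates are placeholders.
my @pages = (
    { loc => 'http://www.example.com/',      lastmod => '2012-03-19',
      changefreq => 'daily',   priority => '1.0' },
    { loc => 'http://www.example.com/about', lastmod => '2012-03-01',
      changefreq => 'monthly', priority => '0.5' },
);

# The outermost element, carrying the sitemap namespace as an attribute.
my $urlset = XML::Twig::Elt->new(
    urlset => { xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9' }
);

for my $page (@pages) {
    # Innermost elements first: one text-only element per field,
    # created in exactly the order they should appear in the output.
    my @children = map { XML::Twig::Elt->new( $_ => $page->{$_} ) }
        qw(loc lastmod changefreq priority);

    # One level up: wrap the children in a <url> element ...
    my $url = XML::Twig::Elt->new( url => @children );

    # ... and paste it as the last child of <urlset>.
    $url->paste( last_child => $urlset );
}

# Finally, hang the finished tree into a document and serialize it.
my $twig = XML::Twig->new( pretty_print => 'indented' );
$twig->set_xml_version('1.0');
$twig->set_encoding('UTF-8');
$twig->set_root($urlset);

my $xml = $twig->sprint;
print $xml;
```

Because every element is created and pasted explicitly, the children of each url entry come out in the order given in the qw() list, which is the kind of control a plain hash cannot guarantee.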