Create static web apps with Wget
Last week we covered Wallflower, an awesome utility for generating static websites from Perl web applications. This week we’re covering an alternative method that uses Wget. One benefit of this method is that it works with any dynamic web application, not just Perl ones.
You’ll need Wget installed. If you’re using Linux, it is probably installed already. OS X users can install it with Homebrew, and there is a Windows version available. To follow this example you’ll also need Dancer2, which you can install from CPAN:
$ cpan Dancer2
Create the application
We’ll use Dancer2 to create a basic skeleton app:
$ dancer2 -a MyApp
Let’s start the app:
$ ./MyApp/bin/app.pl
>> Dancer2 v0.143000 server 435 listening on http://0.0.0.0:3000
Create the static site
We’ll point Wget at the site in recursive mode, so that it pulls all the files we need (up to a depth of 5 by default).
$ wget -r --page-requisites 0:3000
Here we pass Wget the following options:
- “-r” to recursively follow links, starting from 0:3000
- “--page-requisites” to pull all the files a page needs to display properly (images, stylesheets and so on), even if they are beyond our depth limit
During a recursive download Wget does not follow links to other hosts unless you add “-H”, so only files from the local site are saved.
By default, Wget creates a directory named after the host (“0:3000”) and places all the downloaded files there. And that’s it: all the files for our static site have been generated.
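Starting the app and crawling it can also be run from one script instead of two terminals. The sketch below is a minimal bash example, not part of Dancer2 or Wget: the function names wait_for_port and build_static_site are hypothetical, and it assumes the MyApp skeleton sits in the current directory, the app listens on port 3000, and your bash supports /dev/tcp redirections. It runs a recursive crawl with “-r” and “--page-requisites”.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Dancer2 or Wget.

# Try to open a TCP connection to HOST:PORT once per second, giving up
# after TRIES attempts. Relies on bash's /dev/tcp redirection.
wait_for_port() {
    local host=$1 port=$2 tries=${3:-30}
    local i
    for ((i = 0; i < tries; i++)); do
        # The connection is opened in a subshell, so fd 3 closes when it exits.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    return 1
}

# Start the skeleton app in the background, crawl it, then stop the server.
# Assumes it is run from the directory that contains MyApp.
build_static_site() {
    local server_pid status=1
    ./MyApp/bin/app.pl &
    server_pid=$!
    if wait_for_port 127.0.0.1 3000 30; then
        wget -r --page-requisites 0:3000
        status=$?
    else
        echo "the app did not start listening on port 3000" >&2
    fi
    kill "$server_pid"
    return "$status"
}
```

Calling build_static_site from the directory that contains MyApp should leave the crawled files in 0:3000, the same as running the two commands by hand. Polling the port avoids guessing how long the development server takes to start.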
Wget vs Wallflower
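So if both tools can generate static sites, which one is better? If you’re working with a non-Perl site, then Wget is obviously the way to go. In terms of speed, Wget is faster if you feed it a list of URLs with xargs and make the requests in parallel:
$ cat urls.txt | xargs -n 1 -P 16 wget -x
Here “-n 1” hands each Wget process a single URL, “-P 16” runs up to 16 of them at once, and “-x” makes Wget recreate the site’s directory structure instead of saving every file into the current directory.
To take advantage of the parallel GET requests, though, you’ll need to serve the application with a web server that can handle concurrent requests.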
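The urls.txt file has to come from somewhere. One option, assuming you have already mirrored the site once with a recursive crawl, is to rebuild the list from the directory Wget created. The helper below is a hypothetical sketch: urls_from_mirror is not a Wget feature, it assumes the mirror directory is named after the host (such as 0:3000), and it treats each index.html as the URL of its directory. The mapping is approximate, so check the list before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one URL per file in a Wget mirror directory.
# Assumes the directory is named after the host, e.g. "0:3000".
urls_from_mirror() {
    local dir=${1%/}        # strip a trailing slash, if any
    local host=${dir##*/}   # last path component is the host:port name
    local file rel
    find "$dir" -type f | LC_ALL=C sort | while IFS= read -r file; do
        rel=${file#"$dir"/}
        # Assume index.html stands in for its directory's URL.
        if [[ $rel == index.html ]]; then
            rel=""
        elif [[ $rel == */index.html ]]; then
            rel=${rel%index.html}
        fi
        printf 'http://%s/%s\n' "$host" "$rel"
    done
}
```

For example, urls_from_mirror 0:3000 > urls.txt writes the list, which can then be fed to the xargs pipeline or fetched sequentially with wget -x -i urls.txt.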
Wallflower has a nice option (“-F”) that takes a list of URLs to download, which is useful if the entire site cannot be reached by following links from the application’s root page. App::Wallflower is the library behind Wallflower and is extensible through Perl code, so you can tailor the process to your needs. This can be used for post-processing actions like generating a sitemap.xml, or for advanced setups like a hybrid application, where the public pages of the site are static but the secure parts remain dynamic. With Wallflower all of this can be scripted in Perl; with Wget you’d need a combination of shell scripts and Perl, which is less convenient.
As recommended in last week’s article, make sure you’re using absolute URLs in your template code to avoid deployment issues with your static files.
Thanks to Steve Schnepp for contacting us with this tip. Thanks to Philippe Bruhat for creating Wallflower and providing additional technical guidance.
*Correction: technical comparison of Wallflower and Wget updated following clarification from module author. 2014-08-02*