Over the weekend I found time to rebuild the rest of my missing planets. I’ve resurrected Planet Balham (Atom), Planet Westminster (Atom) and Planet Doctor Who (Atom). They all have Atom feeds available as well.
This has been an interesting test of Perlanet (my simple planet-building program). When building planet davorg, I was only using feeds that I had some kind of control over. It was therefore pretty simple to ensure that the web page created was valid HTML (though, due to some bugs in the Perl modules I’m using, the same can’t be said of the Atom feed). But with these new planets, I’m aggregating feeds from all sorts of places and am seeing problems that I hadn’t seen before. In particular I’ve changed Perlanet to deal with the cases where the feed can’t be downloaded for some reason (I think that some of the MPs on my list have stopped blogging) and where the feed isn’t valid.
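As a rough illustration of that "skip the feeds that fail" behaviour, here is a minimal sketch in Python rather than Perl. It is not Perlanet's actual code: the names `fetch_url` and `load_feeds` are hypothetical, and "valid" here only means "parses as XML", which is a weaker check than real feed validation.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_url(url, timeout=10):
    """Download a feed and return its raw bytes (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def load_feeds(urls, fetch=fetch_url):
    """Return (parsed, skipped).

    parsed maps each usable URL to its parsed XML root element;
    skipped maps each unusable URL to a short reason string.
    """
    parsed, skipped = {}, {}
    for url in urls:
        try:
            body = fetch(url)
        except (OSError, ValueError) as err:
            # Network errors (URLError is an OSError) and malformed URLs:
            # note the failure and carry on with the remaining feeds.
            skipped[url] = f"download failed: {err}"
            continue
        try:
            parsed[url] = ET.fromstring(body)
        except ET.ParseError as err:
            # The feed downloaded but is not well-formed XML.
            skipped[url] = f"invalid XML: {err}"
    return parsed, skipped
```

The point of the design is that one dead or broken feed ends up in `skipped` with a reason attached, instead of stopping the whole planet from being built.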
There are also plenty of examples of feeds that have some pretty mad HTML in them, which is breaking the layout of the output pages. On Planet Balham there seems to be some broken HTML that is badly affecting the <div>s on the page, moving the Google AdSense block halfway down the page. Also, the second half of the page is currently in italics due, I suspect, to an unclosed <i> tag. On Planet Westminster there’s also some kind of problem which means that the names of the feeds change size halfway down the page.
So it’s clear that I need to add something to clean up the feeds. I’ll probably look at using HTML::Tidy or HTML::Scrubber (perhaps both). Expect some better-looking pages in the next few days.
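To make the unclosed-tag problem concrete, here is a small Python sketch of the kind of clean-up involved. It is only an illustration of the idea, not what HTML::Tidy or HTML::Scrubber actually do, and the names `TagBalancer` and `balance_tags` are made up for this example. It re-emits each entry's HTML, closes any tags left open, and drops closing tags that were never opened, so that one entry's markup can't leak into the rest of the page. It also discards comments and normalises attribute quoting.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagBalancer(HTMLParser):
    """Re-emit an HTML fragment with every opened tag closed."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)
            else:
                parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag: drop it
        # Close everything opened since the matching start tag.
        while self.open_tags:
            current = self.open_tags.pop()
            self.out.append(f"</{current}>")
            if current == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def result(self):
        # Close whatever is still open at the end of the fragment.
        closers = [f"</{tag}>" for tag in reversed(self.open_tags)]
        return "".join(self.out + closers)


def balance_tags(fragment):
    parser = TagBalancer()
    parser.feed(fragment)
    parser.close()
    return parser.result()


print(balance_tags("<p>Hello <i>world</p> more"))
# prints: <p>Hello <i>world</i></p> more
```

Run over each entry before it goes into the page template, something like this would stop a single unclosed <i> from turning the second half of the page italic. A real tidying library handles far more cases (nesting rules, unsafe tags, broken attributes), which is the reason to reach for one rather than hand-rolling this.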