Martin’s article today about the Daily Express web site reminded me that it’s been some months since I looked at my list of Newspaper RSS feeds. As the list is created by screen-scraping the individual papers’ web sites, it’s no surprise that it all goes out of date as the sites are redesigned and updated.
And sure enough, it was a real mess. When I ran the program that generates the pages, about half of them were broken. But it wasn’t too serious, and after half an hour or so of tinkering with regular expressions, it all seems to be working again.
But all in all, it’s a good lesson in why screen-scraping is a really bad idea. Keeping the list current would be far easier (in fact the scraping would be pretty much unnecessary) if the papers took the next step and published OPML files of their feeds, rather than listing them on free-form web pages.
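To show how little work the OPML route would be, here is a rough sketch. It is in Python and is not the program that generates my pages. The OPML document, the paper name and the example.com URLs are all made up for illustration. The only thing it relies on is the standard OPML convention that each feed is an `outline` element carrying an `xmlUrl` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML file: the paper and the feed URLs are invented for this example.
SAMPLE_OPML = """<opml version="2.0">
  <head><title>Example Gazette feeds</title></head>
  <body>
    <outline text="News">
      <outline type="rss" text="UK News" xmlUrl="https://example.com/rss/uk.xml"/>
      <outline type="rss" text="World News" xmlUrl="https://example.com/rss/world.xml"/>
    </outline>
    <outline type="rss" text="Sport" xmlUrl="https://example.com/rss/sport.xml"/>
  </body>
</opml>
"""


def extract_feeds(opml_text):
    """Return (title, url) pairs for every outline element that has an xmlUrl."""
    root = ET.fromstring(opml_text)
    feeds = []
    # iter() walks nested outlines too, so category groupings are handled.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:
            # Fall back to the title attribute, then the URL, if text is missing.
            title = outline.get("text") or outline.get("title") or url
            feeds.append((title, url))
    return feeds


if __name__ == "__main__":
    for title, url in extract_feeds(SAMPLE_OPML):
        print(f"{title}: {url}")
```

There are no regular expressions to retune here, because the structure is defined by the format and not by whatever the page layout happens to be this month.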
Anyway, it’s all back again now. Please take a look and let me know if I’m missing anything obvious.