Categories
tech

MPs’ Web Sites

When I set up Planet Westminster in 2006 I thought it would be a relatively simple project to maintain. Over the years, more and more MPs would start blogs. Every couple of months I’d add the new ones and everything would be great.

It hasn’t worked out like that at all. MPs’ web sites have proved to be really difficult to keep track of.

The problem is, of course, that the vast majority of MPs have absolutely no idea how web sites, blogs or web feeds work. That’s to be expected. What’s less expected is that many of them seem to get round that problem by delegating the work to people who also have no idea how web sites, blogs or web feeds work.

I’ve just done a clean-up of the feeds I’m currently monitoring. Here are some of the problems I’ve dealt with.

A few MPs (including Douglas Carswell and Caroline Lucas) changed the address of their web feed. Just changed it. No notification as far as I can see. No attempt to redirect the old address to the new one. Just an old address returning a 404 error. Anyone who was subscribed to the old address would have just stopped getting updates. It’s almost like they don’t want people to follow what they have to say.

Ed Miliband’s web site has just ceased to exist. It now redirects you to the main Labour Party web site. Because the leader of the party obviously has no constituency responsibilities. Or something like that.

John McDonnell seems very confused. In 2007 he had a web site at john4leader.org.uk. In 2010, he was at john-mcdonnell.net. Both of these sites are now dead and he’s at john-mcdonnell.net. It’s like no-one has told him that you can reuse web site addresses. I wonder what he’ll do once he’s run out of variations of his name on different top-level domains.

Eric Joyce has just lost control of his domain. His ericjoyce.co.uk address currently goes to an unfinished web site campaigning for “John Smith for State Senator”. It doesn’t look as though Joyce realises this as he’s still promoting the web site on his Twitter profile.

Then there’s Rory Stewart. His web feed was returning data that my RSS parser couldn’t parse. Taking a closer look, it turned out that it was an HTML page rather than RSS or Atom. And it was an HTML page that advertised an online Canadian pharmacy pushing Cialis. Not really what an MP should be promoting.
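For anyone monitoring feeds who wants to catch this kind of thing automatically, here is a minimal sketch in Python of the check I'm describing. It is not the parser Planet Westminster uses. The function name `looks_like_feed` and the set of accepted root elements are my own assumptions for illustration. It only asks whether a response body is well-formed XML whose root element is `rss`, `feed` (Atom) or `RDF` (RSS 1.0).

```python
import xml.etree.ElementTree as ET

# Root element names for RSS 2.0, Atom and RSS 1.0 (RDF).
# This set is an assumption chosen for the example.
FEED_ROOTS = {"rss", "feed", "RDF"}


def looks_like_feed(body: str) -> bool:
    """Return True if body is well-formed XML with a feed-like root element."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Tag-soup HTML usually fails here because it is not well-formed XML.
        return False
    # ElementTree writes namespaced tags as "{namespace}name"; keep the name.
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in FEED_ROOTS
```

A well-formed HTML page still gets rejected because its root element is `html`, so a spam page served at the feed address fails either way.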

Stuff like this happens all the time. MPs need to take more notice of this. And they need help from people who know what they are talking about. My theory (and it’s one that I’ve written about before) is that MPs’ web sites and blogs are often overcomplicated because they are developed by companies who come from a corporate IT background and who dismiss the possibility of using something free like WordPress and over-engineer something using tools that they are comfortable with. It can’t be a coincidence that many of the worst MP web sites I’ve seen serve pages with a .aspx extension (sorry – only geeks will understand that).

I’m going to repeat an offer I’ve made before. If any MP wants a blog set up for them, then I’m happy to help them or to put them in touch with someone who can help them. It needn’t be expensive. It needn’t be complex. But it can be very effective. And it will work.

Update: Eric Joyce replied to me on Twitter. He said:

Thanks. It’s being worked on and they seem to have pointed it at an obvious specimen page.

Categories
tech

RSS Failure

Oops. Busted.

Earlier this year, I wrote a mild rant about web sites that change their RSS feeds without redirecting them, thereby losing a number of readers.

Last night mou commented on that entry pointing out that I’d done something very much like that myself. For the last two months, I haven’t been publishing a new index.rdf feed.

I strongly suspect that the date of the last new version of that file coincides with the date that I installed a new version of Movable Type and reset all of the templates to the defaults. By default, current versions of MT don’t seem to publish RSS feeds. They just publish an Atom version (atom.xml).

That’s no excuse though. I knew about that problem. Previously I’d worked around it by installing an RSS template from an older version of MT. I might do that again when I have some spare time to think about it. But in the meantime I’ve taken the easiest option and created a symbolic link from atom.xml to index.rdf. Hopefully that’ll work in the short term.
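For the record, the stopgap amounts to something like the following Python sketch. I actually did it by hand, so the `alias_feed` helper is hypothetical. It makes `index.rdf` a symbolic link that points at `atom.xml`, which means that anyone fetching the old name gets the Atom content. It assumes a filesystem that supports symbolic links.

```python
import os


def alias_feed(directory, existing="atom.xml", alias="index.rdf"):
    """Make `alias` in `directory` a symbolic link to `existing`."""
    link_path = os.path.join(directory, alias)
    # Remove a stale file or link so the call can be repeated safely.
    if os.path.lexists(link_path):
        os.remove(link_path)
    # A relative target keeps the link valid if the directory is moved.
    os.symlink(existing, link_path)
    return link_path
```

Readers asking for RSS will receive Atom under the old file name. Most feed readers cope with that, but it is a workaround rather than a proper RSS feed.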

Apologies to anyone who was subscribed to the RSS feed and who, no doubt, thinks that I’ve dropped off the face of the world. I’m sorry that you’ll suddenly have two months’ worth of my nonsense to plough through this morning.

It might be a good time to mention the other feeds that I set up recently. There’s one that contains all of my long-form writing from this and other blogs, one that has shorter items from various microblogging platforms and then there’s the feed from planet davorg which contains everything.

Categories
web

Redirecting RSS

I’ve harped on about this before, but I firmly believe that when you publish a URL on the web then it should be permanent. Of course you might want to change the way that your site is set up at some point in the future, but when you do that you should do everything you can to ensure that visitors using the old URLs are seamlessly redirected to the new URL.

And this is true of any kind of URL. It’s not just web pages. The same is true of the URLs of your web feeds. Many people who read your web feeds won’t check that they’re still reading the correct address. They’ll usually just assume that you’re still publishing the feed to the same place. Perhaps I’m not typical, but I subscribe to almost 200 feeds in Bloglines. If one of those feeds goes quiet, it could be weeks before I notice the problem and investigate what has happened.
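To show how little is involved, here is a rough Python sketch of a permanent redirect for moved feed addresses. In practice you would more likely do this in your web server's configuration. The paths in `MOVED` and the names `redirect_target` and `FeedRedirectHandler` are invented for the example and are not taken from any real site.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical old-to-new feed paths, made up for this example.
MOVED = {
    "/index.rdf": "/feeds/atom.xml",
    "/rss.xml": "/feeds/atom.xml",
}


def redirect_target(path):
    """Return the new location for a moved path, or None if it has not moved."""
    # Ignore any query string when looking the path up.
    return MOVED.get(path.split("?", 1)[0])


class FeedRedirectHandler(BaseHTTPRequestHandler):
    """Answer requests for old feed paths with a permanent redirect."""

    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404)
            return
        # 301 tells clients that the move is permanent.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()
```

To try it locally you could pass the handler to `http.server.HTTPServer(("localhost", 8000), FeedRedirectHandler)` and call `serve_forever()`. A subscriber's feed reader that follows redirects then carries on without the subscriber having to do anything.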

When I was talking about the problems with the new Sun RSS feeds last year, I mentioned in passing that they had lost a lot of subscribers by just moving them to new URLs, but Martin covered it in more detail.

In the last few days, I’ve seen three instances of the same thing happening. Three places where a web feed just stopped working. Only one of them bothered to tell their users what was going on.

Firstly, I noticed that I was no longer getting updates from my MP’s web site. When I investigated further I found that they had redesigned the site and the URL for the feed had changed. Now I don’t expect my MP or his staff to understand stuff like this. But I expect they paid a lot of money to the people who redesigned the site. It would have been nice to think they were getting their money’s worth.

Secondly, this morning the BBC Doctor Who news site told me that it was moving (again, due to a redesign and change of technology). In this case they told their readers to resubscribe to the new feed, but a simple web redirection could have made it seamless. As a big Doctor Who fan, Martin has also covered this in some detail. I expect the BBC’s web department to have the experience to know that this is a really bad way to handle the move.

And finally, this afternoon I noticed that I wasn’t getting any news from BoingBoing. I only noticed this because I had submitted a story to them and was looking to see if it had been published. Like the BBC web group, the people behind BoingBoing should really know what they are doing and shouldn’t make such basic mistakes.

I think that web feeds are a great tool. They enable me to regularly read far more data from the web than I did before I used them. But it’s clear that many web site owners are publishing them because everyone else is doing it and they don’t really understand how important they are.

Update: Another one. Today (May 1st) I see that the Telegraph have moved all of their RSS feeds. At least they dropped a message about it into the old feed. But haven’t these people heard of URL redirection?

Categories
tech

Sun RSS Fixed

It looks like The Sun have finally fixed their RSS feeds. It’s only three months since I first noticed the problem. In that time I’ve emailed them a number of times on the subject. They haven’t bothered to reply to my mails, so I don’t know if they even read them, or if someone in their web department finally noticed the basic error that they’d made.

I say that they’ve fixed the error. All I know for sure is that they’ve corrected the URLs in the feed so you now get absolute URLs that work outside of the Sun’s web site. As Martin has pointed out, they’ve still done this in a spectacularly half-hearted manner and they have probably lost most of the RSS traffic that they had built up over the previous years.

And whilst the most pressing problems in their feeds seem to be fixed, it’s worth noting that the feeds they’re publishing now are far from perfect. Running their latest news feed through a feed validator (something that any sensible feed publisher will do) shows that there are still quite a few interesting errors. At the current rate, I expect them all to be fixed sometime around the middle of 2012.

Categories
tech

RSS in Firefox

Firefox (and, I assume, most other modern browsers) does some clever magic when viewing RSS feeds. It doesn’t show the raw XML, but instead shows a neatly formatted version of the page along with a button allowing you to subscribe to the feed in your favourite feed reader.

That, at least, is how it’s supposed to work. It doesn’t always work quite like that.

I’ve received an email about my newsfeeds page which points out that the feeds for The Times don’t seem to work correctly. If you go to a Times RSS feed (here’s their top stories feed as an example) you get the raw RSS XML instead of the standard Firefox RSS viewer.

So there’s something about the Times RSS feeds that means that Firefox doesn’t recognise them as RSS feeds. I’ve had a quick look, comparing the HTTP headers returned by a Times feed to the headers returned by a Guardian feed, but I can’t see anything obvious. But there must be something that is controlling this behaviour.

Does anyone know how Firefox recognises an RSS feed? Or is there anyone from The Times reading who would like to investigate why their feeds aren’t working as expected?

Update: As pointed out in the comments, the problem is (rather obviously) that the Times feeds are being served with the incorrect Content-Type. There’s a whole can of worms about what the correct Content-Type should be, but changing it to text/xml should tell Firefox to do the right thing. I’ve emailed the Times pointing out the issue. Let’s hope they’re more on the ball than the Sun’s web team.
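A feed publisher could check for this with a few lines of Python. The sketch below is only an illustration. The set `FEED_TYPES` is my assumption about which media types are reasonable for a feed (that is exactly the can of worms mentioned above), and the function names are invented. It says nothing about how Firefox itself decides what counts as a feed.

```python
# Media types commonly used for feeds. This list is an assumption for the
# example; there is no single agreed correct value.
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/rdf+xml",
    "application/xml",
    "text/xml",
}


def media_type(content_type_header):
    """Strip parameters such as charset and normalise the case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_feed_content_type(content_type_header):
    """Return True if the header names one of the assumed feed media types."""
    return media_type(content_type_header) in FEED_TYPES
```

For example, `is_feed_content_type("text/xml; charset=UTF-8")` is true, while a feed served as `text/plain` would be flagged.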

Update: Here we go again. The Times top stories RSS feed contains the following information:

<webMaster>support@timesonline.co.uk</webMaster>

But the mail I sent to that address bounced back as undeliverable.

<support@timesonline.co.uk>:
143.252.81.140 does not like recipient.
Remote host said: 550 Mailbox unavailable or access denied – <support@timesonline.co.uk>
Giving up on 143.252.81.140.

What’s the point of advertising an undeliverable address?

Categories
tech

Sun RSS – Still Broken

Two weeks ago I wrote about how the new, “improved” RSS feeds from The Sun are, in fact, completely broken. In that two weeks nothing has changed and the feeds are still broken.

Much as I enjoy seeing the Sun web team making fools of themselves like this, I would actually like it even more if they would acknowledge their mistake and fix the problem. To that end, I’ve emailed them about it three times in the last two weeks. But all my mail seems to have been ignored. I’ve had no response and the problems haven’t been fixed.

I’m pretty sure that the most critical problem (the broken URLs that are included in the feeds) is something that could be fixed in two minutes by someone who has access to the right template files on the Sun’s web server.

So, given that I can’t get any response from the contact address advertised on their web site, does anyone have any other suggestions? Do any of you know anyone who works in the Sun’s web team or do you have any other avenues that I could try?

Or shall we just all sit back and laugh at the Sun for getting it so wrong?

Categories
media

Newspaper RSS Feeds Updated

As I mentioned yesterday, I’ve made some fixes to my UK newspaper RSS feeds page. The fixes include

  • Adding Daily Star links
  • Fixing the Sun links (tho’ as I said yesterday, the Sun RSS feeds are still completely broken)
  • Some tweaks to the Times and GU parsers

The Independent section is currently broken. They have re-organised their RSS links page into a hierarchy and I need to put a bit more work into parsing it. Hopefully I’ll do that tomorrow.

I also need to revisit all of the other papers’ sections to ensure that all of the feeds are being extracted.

When I started this project a couple of years ago, each paper published a handful of RSS feeds. And only the broadsheets even bothered. Now the tabloids are in on the act too and with typical tabloid fervour they have gone completely over the top and are publishing huge numbers of feeds. It’s clear to me that my single page format is only barely manageable at this point and I need to rethink how this site is going to work.

But anyway, today’s version is an improvement on yesterday’s. Hope you find it useful.

Categories
tech

Sun RSS Feeds Broken

It’s been a while since the “sensational soaraway” Sun started publishing RSS feeds of their stories. I’m subscribed to a couple of their feeds (it keeps my blood pressure up) but I noticed a couple of hours ago that the feeds I was subscribed to no longer exist.

It seems that at some point in the last few weeks they have completely revamped all of their feeds. The details of the new feeds are on their site. Unfortunately the new feeds have been designed by someone who apparently knows very little about how RSS is supposed to work. The best example is that the links within the feeds are all relative instead of absolute – by which I mean that they don’t include the server address. For example, one story in the current news feed contains the URL:

  • /sol/homepage/sport/article420662.ece

where it should be the full URL

  • http://www.thesun.co.uk/sol/homepage/sport/article420662.ece

Relative links only work for links within the same site. RSS feeds are (almost by definition) supposed to be displayed on other sites and therefore relative links won’t work.
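The fix on the publishing side is tiny. As a sketch (in Python, which is my choice and not necessarily what the Sun's templates use), the standard library's `urljoin` resolves a relative link against the site's base address. The helper name `absolutise` is made up for the example, and the base URL is the one from the full link shown above.

```python
from urllib.parse import urljoin

# Base address taken from the full URL quoted above.
SITE_BASE = "http://www.thesun.co.uk/"


def absolutise(link, base=SITE_BASE):
    """Resolve a possibly relative link against the site's base address."""
    # Links that are already absolute are returned unchanged by urljoin.
    return urljoin(base, link)
```

Calling `absolutise("/sol/homepage/sport/article420662.ece")` gives the full URL, so every link in the feed can be passed through it before publishing.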

Having discovered this, I decided to check the feed with the online web feed validation tool (something that the developers really should have done for themselves) only to find that they really haven’t done very well at all.

Earlier this week, Martin pointed out that the Daily Star have also started to publish RSS feeds, so I was planning to do some work on my newspaper feeds page this week. Looks like I’ll have to do some work on the Sun section of that site as well.

Update: I was just looking at Martin’s post about the top 100 UK newspaper web feeds and I noticed that the most popular Sun feed (their news feed) had 12,000 subscribers (and that’s just in Google Reader). The figures are for the old feed. As the old Sun feed now just returns a 404 error, the Sun have just potentially lost 12,000 readers. RSS feed addresses are as important as any other URL on your web site. They should be as permanent as you can possibly make them. If you change feed URLs for some reason then you should put redirections in place so that your old readers can still find you.

This change gives every impression of being carried out by a complete amateur. I hope the Sun didn’t pay too much for it.

Categories
tech

Breaking (And Then Fixing) Planets

I’ll write more about Hack Day over the next few days. But I should point out that most of yesterday was spent updating the version of Plagger on this web server. This had the unfortunate side-effect of breaking all of the Plagger-run planets on this server. So today was largely spent fixing them again.

Everything should just about be how it was. In fact, in many cases, the newer version seems to be a vast improvement over the older version that I was running previously.

There have, however, been a few small changes in the names of some of the files that are generated. The RSS and Atom feeds have been renamed, as have the OPML files. And each planet also now has an associated FOAF file.

So here are the new URLs.

If you’re using any of those links, then you should probably update them. I’ll put some redirections in place tomorrow but, for now, I’m tired and I’m going to bed.

Categories
tech

The Futility of Screen-Scraping

Martin’s article today about the Daily Express web site reminded me that it’s been some months since I looked at my list of Newspaper RSS feeds. As the list is created by screen-scraping the individual papers’ web sites, it’s no surprise that it all goes out of date as the sites are redesigned and updated.

And sure enough, it was a real mess. When I ran the program that generates the pages, about half of them were broken. But it wasn’t too serious, and after half an hour or so of tinkering with regular expressions, it all seems to be working again.

But all in all, it’s a good lesson in why screen-scraping is a really bad idea. This would be far easier (in fact it would pretty much be unnecessary) if the papers took the next step and released OPML files of their feeds, rather than free-form web pages.
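To give an idea of how much simpler that would be, here is a short Python sketch that reads feed names and addresses out of an OPML document. The sample document, its example.com addresses and the function name `feeds_from_opml` are all invented for illustration. No newspaper publishes this file.

```python
import xml.etree.ElementTree as ET

# A made-up OPML document; the titles and URLs are placeholders.
SAMPLE_OPML = """<opml version="2.0">
  <head><title>Example Paper feeds</title></head>
  <body>
    <outline text="News" type="rss" xmlUrl="http://example.com/news.xml"/>
    <outline text="Sport">
      <outline text="Football" type="rss" xmlUrl="http://example.com/football.xml"/>
    </outline>
  </body>
</opml>"""


def feeds_from_opml(opml_text):
    """Return (title, feed URL) pairs for every outline that has an xmlUrl."""
    root = ET.fromstring(opml_text)
    return [
        (outline.get("text"), outline.get("xmlUrl"))
        # iter() walks nested outlines too, so feed hierarchies are covered.
        for outline in root.iter("outline")
        if outline.get("xmlUrl")
    ]
```

There are no regular expressions to maintain, and a site redesign wouldn't break anything as long as the OPML file stayed at the same address.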

Anyway, it’s all back again now. Please take a look and let me know if I’m missing anything obvious.