Checking Copyright

There’s a lot of material out there on the internet. And the nature of the internet means that it’s easy to reuse that material without paying any attention to copyright. If my browser can display an image, then I can save that image to my local disk and then, perhaps, use it on my own web site or in some other publication.

But just because it’s easy from a practical perspective, that doesn’t mean that it’s legal. Much of the material on the web is subject to various copyright restrictions. And if you’re going to be a responsible internet citizen then you’ll take care not to use any material in ways that are contrary to its copyright.

If you are, say, a national newspaper then you’re going to want to be really sure that you’re being careful about copyright. I’m sure that someone like (to pick a paper at random) the Daily Mail would get very upset if they found someone using one of their photos without permission or without giving correct attribution. It’s therefore reasonable to expect them to offer the same courtesy to others.

Take a look at this story about Philip Schofield and Twitter. Don’t bother to read it. It’s the usual Mail nonsense. They’re complaining that Schofield shares too many details of his life on Twitter. But they do it (ironically, I’m sure) by poring over every detail of a meal in the Fat Duck. No, don’t read the words. Take a look at the pictures. Schofield has illustrated his evening by posting photos to TwitPic. TwitPic is a Twitter “add-on” that allows you to share photos as easily as Twitter allows you to share text.

Notice that the Mail have put a copyright attribution on each of Schofield’s photos. They all say “© Twitpic”, implying that TwitPic own the copyright on the photos. But if you take a few seconds to read TwitPic’s terms and conditions, you find that they say:

All images uploaded are copyright © their respective owners

TwitPic lay no claim at all to copyright on the pictures, so the Daily Mail are attributing copyright to the wrong people. It’s not at all hard to find this out (it’s a link labelled “terms” at the bottom of the page – exactly the same, in fact, as it is on the Mail site), but the lazy Daily Mail picture editor couldn’t be bothered to do that and just guessed at the copyright situation.

And whilst we’re talking about the Mail not understanding copyright, it’s worth reminding ourselves of the nonsense in their terms and conditions.

  • 3.2. You agree not to:
  • 3.2.1. use any part of the materials on this Site for commercial
    purposes without obtaining a licence to do so from us or our licensors;
  • 3.2.2. copy, reproduce, distribute, republish, download, display,
    post or transmit in any form or by any means any content of this Site,
    except as permitted above;
  • 3.2.3. provide a link to this Site from any other website without obtaining our prior written consent.

Under clause 3.2.3, I’ve broken their terms at least twice in this article. But clause 3.2.2 is the really interesting one. You’re not allowed to download or display the content of the site. Which makes it rather hard to view it in a browser. Idiots.

Update: They have now changed the copyright on the photos to “© Philip Schofield/Twitter”. So that’s one less piece of stupidity in the world. The struggle continues.


Redirecting RSS

I’ve harped on about this before, but I firmly believe that when you publish a URL on the web then it should be permanent. Of course you might want to change the way that your site is set up at some point in the future, but when you do that you should do everything you can to ensure that visitors using the old URLs are seamlessly redirected to the new URL.

And this is true of any kind of URL. It’s not just web pages. The same is true of the URLs of your web feeds. Many people who read your web feeds won’t check that they’re still reading the correct address. They’ll usually just assume that you’re still publishing the feed to the same place. Perhaps I’m not typical, but I subscribe to almost 200 feeds in Bloglines. If one of those feeds goes quiet, it could be weeks before I notice the problem and investigate what has happened.

When I was talking about the problems with the new Sun RSS feeds last year, I mentioned in passing that they had lost a lot of subscribers by just moving them to new URLs, but Martin covered it in more detail.

In the last few days, I’ve seen three instances of the same thing happening. Three places where a web feed just stopped working. Only one of them bothered to tell their users what was going on.

Firstly, I noticed that I was no longer getting updates from my MP’s web site. When I investigated further I found that they had redesigned the site and the URL for the feed had changed. Now I don’t expect my MP or his staff to understand stuff like this. But I expect they paid a lot of money to the people who redesigned the site. It would have been nice to think they were getting their money’s worth.

Secondly, this morning the BBC Doctor Who news site told me that it was moving (again, due to a redesign and change of technology). In this case they told their readers to resubscribe to the new feed, but a simple web redirection could have made it seamless. As a big Doctor Who fan, Martin has also covered this in some detail. I expect the BBC’s web department to have the experience to know that this is a really bad way to handle the move.
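The “simple web redirection” mentioned above is all a feed publisher needs: a 301 Moved Permanently pointing at the new address. Here’s a sketch of how a feed reader could honour that and silently update its stored subscription (the URLs and function are my own illustration, not any real reader’s API):

```python
# Sketch: how a feed reader could honour a permanent redirect,
# updating its stored subscription URL when a feed moves.

def updated_subscription(stored_url, status_code, location=None):
    """Return the URL the reader should store after a fetch.

    A 301 (Moved Permanently) with a Location header means the feed
    has moved for good, so the reader follows it and remembers the
    new address. Anything else leaves the subscription untouched.
    """
    if status_code == 301 and location:
        return location
    return stored_url

# A feed that moved in a redesign (hypothetical URLs):
old = "http://example.com/feed.rss"
new = "http://example.com/blog/atom.xml"

print(updated_subscription(old, 301, new))  # reader updates itself
print(updated_subscription(old, 200))       # nothing changed
```

If every publisher served a 301 from the old feed URL, none of the three breakages above would have lost a single subscriber.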

And finally, this afternoon I noticed that I wasn’t getting any news from BoingBoing. I only noticed this because I had submitted a story to them and was looking to see if it had been published. Like the BBC web group, the people behind BoingBoing should really know what they are doing and shouldn’t make such basic mistakes.

I think that web feeds are a great tool. They enable me to regularly read far more data from the web than I did before I used them. But it’s clear that many web site owners are publishing them because everyone else is doing it and they don’t really understand how important they are.

Update: Another one. Today (May 1st) I see that the Telegraph have moved all of their RSS feeds. At least they dropped a message about it into the old feed. But haven’t these people heard of URL redirection?


Credit Where Credit Is Due

I spend a lot of time here complaining about broken web sites, so it’s nice to be able to praise something that worked better than expected. And I’m slightly surprised to be able to report an impressive experience with a UK government web site.

One thing that I found whilst sorting through my study over the weekend was my driving licence. It’s a provisional licence. I’ve never passed a driving test. I got a provisional licence when I was seventeen and over the next year I took many lessons and failed three tests. Back then (this was the late 70s) provisional licences were only valid for a year so once I gave up learning to drive I let my licence lapse and thought nothing more about it.

But then in 1996 I thought perhaps I would have another go so I applied for another provisional licence. By the time the licence arrived I’d lost any enthusiasm that I had and the new licence was just filed away and forgotten about. One thing had, however, changed in the intervening period. Provisional licences were no longer valid for just one year. This one was valid until (I think) my 70th birthday.

So on Saturday I found this long-forgotten, but still valid licence. The first thing I noticed about it was that it was still registered to my last address (it’s been ten years since we moved). The second thing that I noticed was a threat of a £1000 fine for failing to inform them of a change of address. Of course that really means that you’ll be fined if you get caught driving with a licence that has an out of date address, so there’s not much chance of me ever being fined. But I decided that it was worth getting it updated and put it aside in a small but growing pile of things to be addressed later.

Late on Sunday I was going through that pile and came to the licence. The instructions were to fill in your new address on the back of the licence and to send it back to the DVLA who would then issue a corrected replacement. Before doing that I decided to check if I could do this online.

I found the DVLA web site which quickly led me to the Driving Licensing Online site where I found the link that I was looking for. As I was going through the process I realised that there might be a problem. My licence was of a pretty ancient vintage and new licences have a photo on them. I could see disaster looming. I was sure that I was going to end up with a form to print off and send in along with a photo. But that’s not what happened. What happened was a lot cleverer than that.

The system realised that I was a registered driver (albeit a provisional driver) and that it didn’t have a photo of me. It then asked if I had a passport and when I said yes, it offered to use the photo from my passport on my new licence. Not only does this demonstrate a level of technical ability and standardisation that is rarely seen in organisations of this size, but (far more importantly in my opinion) by asking for my permission before doing this, it shows an awareness of privacy issues that is, in my experience, even rarer.

I assume that had I said no, then I would have still ended up with a form to print off and instructions to send it in with a photo. But because I was happy for them to link these two records, I was able to do it all online. And it was free too. I half expected them to try and charge me fifteen or twenty quid.

So I’m now expecting my new driving licence to arrive in the next few days. I don’t know whether or not I’ll actually use it to start learning to drive again, but it’ll be a useful piece of ID to carry around. All in all, I was very happy with the way it all worked out.


Google Calendar Spam

Is anyone else getting Google Calendar spam? About half a dozen times in the last month I’ve got an SMS message telling me that I’ve received an invitation to an event on my Google Calendar and when I check the calendar it’s actually some kind of 419 spam.

I suppose that it was inevitable that the spammers would eventually find this new way of annoying people, but I’m not happy that it’s apparently so easy for them. Is this going to go the way of email and blog comments, with people becoming reluctant to accept event invitations from random people? Will you need to join some kind of whitelist before you can invite me to an event? I really hope it doesn’t come to that.

I’ve been just deleting the invitations, but it’s times like these that I wish there was some way to transmit a poke in the eye over HTTP. Is there something else that I can be doing? Should I report them to Google in some way? Would that help at all? I assume the invitations are being sent from disposable accounts.

I’ve only been using Google Calendar for a month or two. It would be a shame to see it become unusable. I’d love to hear any suggestions you have (as, I’m sure, would Google).


Human Dinosaurs

Having just been saying how much I like the new Guardian URL scheme, it was interesting to see the URL for this article from today’s paper. The article is about some early hominina[1] remains that have been found in northern Spain. The URL is

I can obviously see why it’s in the science section. And of course it’s about archaeology. But “dinosaurs”? What connection do hominina have with dinosaurs? They are separated in time by about sixty million years. URLs like these only work if the person assigning them has an understanding of the subject area.

And, of course, it’s too late to correct it now as URLs are permanent :-)

[1] I originally put “hominid” there believing it to be the correct word. But according to Wikipedia, the definition of hominid has gradually changed to encompass all the great apes. Humans and their closely related species are now apparently described as hominina. That’s something new I’ve learned today.


Guardian URLs

I’m a great believer in the idea that URLs should be permanent. When I publish something on the web then (hopefully) people link to it, and it would be nice to think that those links still work in five, ten or fifty years time. A few months ago I changed the URL scheme for davblog, but I ensured that the old-style URLs would redirect to the new ones.

Of course, this is a relatively small site. It has a couple of thousand entries. My fix to ensure that the old URLs still worked simply consisted of a few pages of Apache RedirectPermanent directives. If you’re dealing with a site that is larger and more important than this one, then the problems become far harder.
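For anyone who hasn’t used it, a page of RedirectPermanent directives looks something like this (the paths below are made up for illustration, not davblog’s real ones); each line tells Apache to answer requests for an old-style URL with a 301 pointing at its new home, so old links and feed subscriptions keep working:

```apache
# Map each old-style entry URL to its new-style address with a
# 301 Moved Permanently response. (Illustrative paths only.)
RedirectPermanent /archives/000123.html /2005/03/some-entry/
RedirectPermanent /archives/000124.html /2005/03/another-entry/
RedirectPermanent /index.rdf            /atom.xml
```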

So it was nice to see Simon’s post pointing out that the Guardian had taken this problem seriously and had put some work into making sure that their old URLs still work correctly now they are in the process of switching to a new URL scheme. As an example, he links to an old blog entry which contains a link to,2763,1382899,00.html

No prizes for guessing which CMS generated that nasty URL. Clicking on that URL now redirects you to the (far saner)

And all is well with the world.

Well, almost. Digging around on some old (and rather embarrassing) web sites that I haven’t got round to taking down yet (because URLs are permanent!) I find this page (love that 1997 web design) which contains a number of links to Guardian web pages. Here’s an example:

Clicking on that link leads to a shiny new “URL not found” page.

Which, I think, demonstrates a couple of interesting things. Firstly, at some point when the Guardian were moving from one CMS to another the permanence of the URLs wasn’t considered a high priority. There is no chain of redirection in place which converts this old URL to a newer style one. It looks like when the Guardian moved from this URL scheme, they broke all incoming links to their site. I wonder if that problem was even considered ten years ago.

Secondly, look at that really old URL. It’s not perfect by a long way but, to me, it looks a lot easier to understand than the first URL example above (the one generated by the CMS they are currently moving away from). There’s one “magic number” in it – 29440 is probably the article ID in some database – but you can work out the date that the article was published (24th July 1997) and the section it was in (Politics News). The other URL tells you that it points to a religious story, but those four numbers at the end make most of the URL completely meaningless.
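A readable scheme of the kind the Guardian are moving to can be built from nothing more than the section, the date and the headline. Here’s a rough sketch of the idea (the function and slug rules are my own illustration, not the Guardian’s actual code):

```python
# Sketch: building a human-readable URL of the form
# /<section>/<year>/<month>/<day>/<slug> -- no magic database
# numbers in sight. (Illustrative only.)
import re
from datetime import date

def article_path(section, published, title):
    # Turn the headline into a lowercase, hyphen-separated slug.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return "/{}/{}/{}/{}/{}".format(
        section,
        published.year,
        published.strftime("%b").lower(),  # e.g. "jul"
        published.day,
        slug,
    )

print(article_path("politics", date(1997, 7, 24), "Student fees row"))
```

Every component of a URL like that means something to a human reader, which is exactly what the old `,2763,1382899,00.html` style got wrong.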

Working out a good URL scheme isn’t a trivial task. That’s particularly true for a complex site like the Guardian. I’m really glad to see that they are making great progress in this area. But it’s interesting to see that at some point in the history of their site their URL scheme took what seems to be a big step backwards. Presumably, switching to the CMS which produced those nasty URLs was seen as giving them many other advantages that outweighed the URL damage.

I wonder if there’s anyone around who remembers this change.

Update: Searching the Guardian site finds only one article that was published on July 24th 1997. And that doesn’t look at all like the one that I was trying to link to, which was apparently about student fees. So it appears that not only are the links broken, but that some of the content from that era is no longer available on the site.

Oh, and thanks to Robin for adding his comments. I was hoping that someone like him might drop by and chip in.



It’s nine years since I registered the domain and set up a web site there. And I’ve never really known what to do with it. Since I started blogging, it’s seemed even less useful. The blog front page was where all the interesting stuff happened. The main page just contained links to a few bad jokes and a couple of useful sub-sites. For years I just tinkered with the design a bit, but I was never really happy with it. Sometime early in 2005 I rewrote it so that it took a lot of its content from various RSS feeds that I published. But the code to do that was a really nasty hack which I’ve wanted to rewrite since the day I first wrote it.

A few weeks ago, I wrote Perlanet which is a simple program for aggregating web feeds and republishing the results. As I had some spare time yesterday, I rewrote the front page using Perlanet to do most of the heavy lifting. It now contains the full text of the most recent entries from my various blogs, together with examples of my latest flickr uploads and list of recent twitters and delicious links. It’ll be simple to add other feeds to the mix in the future.

I realise that this isn’t exactly new. People have had sites like this for years. But I’m happy at how quickly I managed to build this and happier that it shows that Perlanet is as flexible as I wanted it to be. I’m also pretty happy with the way that it looks (although that is, I suspect, more to do with the Boilerplate CSS framework than my design skills).

I’ve also started to publish a number of Atom feeds. As you’ll see from the top right of the new page, there is one feed containing the blog entries, one containing the shorter stuff, one for photos (that’s just the original flickr feed but it might be expanded in the future) and one that contains everything (that’s the planet davorg feed). That allows readers a bit more flexibility over what content they subscribe to.

Oh, and I’ve also taken the opportunity to remove the links to all the old jokes. The pages are still there if you know where to look, but Google Analytics tells me that they won’t be missed.


Google Sees All

An interesting story in today’s Telegraph. Apparently the photo that proved that John and Anne Darwin were together in Panama was found by someone searching for “John Anne Panama” in Google.

I’ve just tried it and it still works. Searching for “John Anne Panama” in Google image search brings back a picture of them from the “Our Customers” page on

The picture has been removed from the page now and even the direct link no longer works. But it will stay in Google’s cache for a while.

Here’s a handy tip. If you want to pretend you’ve died so that your wife can claim your life insurance and you can both start a new life in Central America, then it’s a really bad idea to allow a picture of you both to be put on a web site. The world wide web is international, you know. The clue is in the name.


Joined Up Web Sites

Spread Shirt are a company that allows you to design and print your own t-shirts. They also allow people to set up online shops selling t-shirts that they have designed. There are two Spreadshirt sites, one for the US and one for Europe.

A couple of months ago I saw a shirt in a Spread Shirt shop that I wanted to buy. Unfortunately, it was in the US store, which means it would be shipped from the US. And that would cost about $10. Shipping from the European site is £2.20. I decided I didn’t want the shirt that much and forgot about it.

Yesterday I decided to dig a bit further. I couldn’t really understand why I couldn’t order the shirt from the US but have it printed in and shipped from Europe. Isn’t that the advantage of print-on-demand shirts? They can be printed anywhere. So I emailed Spread Shirt customer support (in the US) to find out how I could do that.

I got a reply in a few hours. They explained that the two Spread Shirt sites were run completely separately and that the two systems weren’t joined up in a way that allowed the process that I was suggesting. It wasn’t the most helpful of replies. I could have guessed what they were going to say.

But at the bottom of the reply was an invitation to fill in a survey rating the help I had got from customer service. And as an incentive to fill in the survey they promised a coupon worth $5 in any Spread Shirt (US) shop.

So I filled in the survey and then ordered the shirt using the coupon, which effectively dropped the cost of shipping to $5 – close enough to the European cost to make it not worth worrying about the difference.

I still think that the differentiation between their two systems is crazy. If you’re an international company, then you should really try and act like one. But it’s nice that they give you a backdoor method to get round the problem.


Getting Web Sites Wrong

It’s a constant refrain round these parts, I know, but here’s another example of a web site that has a couple of nasty errors that could have been avoided with a little thought.

The site in question is Knight Frank the estate agents. I was drawn to their site as they are selling a house on my road. The house is a little bigger than ours and I wanted to know what it was selling for. Yes, I’m using the web to feed my middle-class obsession with house prices.

Anyway, the search functionality on the site was easy enough to use and I quickly found the property that I was looking for. So I clicked the link in the search results to see the details.

That’s when I noticed the first problem. When you have details of objects (in this case houses) on your web site, then it makes a lot of sense to give each object a unique web address so that it’s easy for people to pass details of a given object to their friends. As we’ll see later, Knight Frank’s site does have unique addresses for each property, but they do their best to keep them hidden.

When I was looking at the list of search results, the location bar in my browser said:

And when I clicked through to the property details, it said:

Nothing changed. There was no address that I could have used to pass on to a friend. To my mind, that’s a fundamental misunderstanding of how the web works.

However, looking at the details of the property, I saw a “send to a friend” link. I realised that if you can send a link to the property to a friend then the email that is sent must contain the unique link that the web site hides from you. I decided to send the details to myself.

And here’s the next problem. The “send to a friend” page asks for your email address, your friend’s email address and a message to include in the email. The web site then sends an email containing the message and the link from you to your friend.

Can you see the problem?

The problem is that this mail claims to come from you. But it doesn’t really. It really comes from the Knight Frank web server. A common spam technique is to send mail that doesn’t originate from the site that it claims to come from. For this reason, a number of people have implemented a system called SPF. In SPF a domain publishes a list of mail servers that are allowed to send mail for that domain. At the other end of a transaction, when a mail server receives a mail, it can check against these published lists to ensure that the mail comes from a mail server that is allowed to send mail for the domain that it claims to come from. Any mail that doesn’t match these requirements can be discarded as spam. I publish a set of SPF records for and I also check SPF records for any incoming mail and discard any that don’t match.

So we have the situation where the Knight Frank site is trying to send mail that claims to come from, but that server isn’t on the list of servers that send genuine mail. And that means that my mail to myself is rejected by my incoming mail server as spam. Luckily I got a bounce message that contained the original message so I finally managed to work out what the address of the property details is. If the Knight Frank web site had been honest and sent the mail from itself, then the mail would have got through without any problems.

This system probably works for them currently because SPF isn’t widely implemented yet. But as spam gets worse, SPF will become far more common. And less and less of the Knight Frank web site’s mail will get through to the intended recipients.

The Knight Frank web site designers probably thought they were being really clever. They probably think that hiding the details of the web site address makes things look simpler. They almost certainly think that people are more likely to read mail that comes from a friend. But I think that in both of these cases they are failing to understand how the internet should work.

If you’re interested, the property details are here. It’s on sale for £795,000 and over the weekend the “for sale” sign changed to an “under offer” sign.