ActiveRecord Does It Wrong

I’m listening to the keynotes on the final day of EuroOSCON and David Heinemeier Hansson is talking about the secrets of Ruby on Rails. He talks about how they promote “convention over configuration” and how that means that you don’t have to describe the same object attribute multiple times. This is a great idea, but one of the key parts of Rails (ActiveRecord, the part that talks to your database) forces developers to break this rule. When defining objects in ActiveRecord you need to describe all of the relationships that the object has with other objects (tables) in the database. This is repeating information that should be stored in the database. All databases (well, all databases that you should consider for serious development) allow you to define these relationships in the database metadata. Why doesn’t ActiveRecord pull back this information from there? I suspect that it might be because most Rails users will be using MySQL as their database. And MySQL doesn’t exactly make it easy to extract this data.

I’ve written about this before. This post is really just here as a pointer so that new readers can find my previous articles.

5 comments

  1. From my limited experience, there are two problems with automagically finding out foreign keys:

    1. SGBDs differ wildly in how they allow you to query them for metadata.
    2. You can’t tell the difference between an has_one and a has_many relationship by just looking at table descriptions.

    The first one is only a matter of implementation.
    The second one would require either some arcane SQL or some hint in the model’s class — and then you might aswell describe the whole relationship in the model.

  2. Michel,

    Thanks for your comments.

    As you say, the first problem is simply a matter of implementation. The ORM is going to need to have pluggable modules for dealing with different database systems. This is just one more thing that will need to go into those modules. In many cases, the code might already exist. For example, in Perl all database access goes through the DBI module and underneath the DBI module you have a DBD (database driver module) which is specific to the database that you are using. Recent versions of the DBI specification include a subroutine called ‘foreign_key_info’ which the DBD modules should implement. Given a table name, this function will return details of the foreign keys. Any ORM built in Perl can therefore simply reuse this functionality.

    I think that your second problem is wrong. If you have a foreign key that is also marked as only containing unique values then it must represent a one to one relationship. In fact if your database system supports it, you should probably think about setting up your tables so that the foreign key in the dependent table is also its primary key. That way the two tables will share the same primary key.

  3. Please tell me how to implement the following associations using database introspection:

    While database introspection in some aspects can handle the superficial cases, it can’t handle no where near the bulk of the real-life cases.

    This means that you’d have to investigate multiple sources to identify all the collections available. So if you want to be DRY in this aspect, you can do as many Rails developers already do: Don’t respecify this information in the database (that is, don’t use foreign key constraints or constraints of any kind). All the 37signals applications behave like this.

    So whether or not to be DRY in this case is fully determined by your own actions.

  4. David,

    You’re talking Ruby which is a language that I’m only just starting to learn so I’m not sure that I understand exactly what you mean, but here’s my attempt anyway.

    You have three tables called project, person, and milestone. The project table has a non-null column called leader_id which is a foreign key that links back to the person table (a project always has exactly one leader, but a person can lead more than one project). The milestone table has a non-null column called project_id which is a foreign key to the project table (a milestone belongs to exactly one project but a project has many milestones).

    Assuming that your latest_milestone attribute is the latest completed milestone and that the milestone table has a date_completed column, then there’s no need for a latest_milestone attribute as you can always get that information without it and adding it would denormalise the data model. However, it’s possible that you might want to deliberately denormalise the database by adding the latest_milestone column and I assume that is the situation that you are suggesting. In that case I would add the latest_milestone column to the project table as a foreign key to the milestone table and I would have this column updated in the update trigger on the milestone table. In my opinion using triggers like this is the only sensible way to allow denormalised data in your database.

    So in this case I don’t agree that database introspection isn’t up to the job. I do, however, agree that there may be some (rare) cases where you can’t store all of the required metadata in the database. In those cases, you will need to put some of the the metadata in the ORM but that should be your last resort and you shouldn’t use it as an excuse to move all of the metadata out of the database. As I’ve written previously, I think that’s a terrible idea. The metadata should always be stored at the lowest level possible – i.e. in the database whenever possible. If your database system of choice prevents you from doing that then you need to get a better database system.

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.