Following on from Monday’s post, I thought it might be interesting to go into a bit more detail about why I won’t ever use Microsoft Word (or, indeed, any word processor) unless I have to. I have three reasons – one is personal experience, one is technical and the third is largely philosophical.
Firstly, the personal experience. I consider (or, more accurately, considered) myself a Word power-user. Many years ago I used to write Word macros to automate document creation. I know what I’m doing with Word. So when I wrote my first book, six or so years ago, Word was my first choice as the tool to use. I was using Word 97 and some formatting templates that had been supplied by my publishers. I remember reading some stories about problems that people had experienced writing a whole book in one Word document, so I followed what seemed to be considered best practice and created a separate document for each chapter. This was all some time ago, so my memory is a bit hazy, but I distinctly remember often having to spend time reapplying formatting that had been scrambled when I opened a chapter. That could have been the version of Word, a problem with the templates, or some other technical glitch. All I know is that it left a bad taste in my mouth, and over the months that I spent writing the book I went from being a Word fan to hating it with a passion.
My second reason is more technical. A word processor will typically store your document in a proprietary binary file format. If you were to open a Word document in a text editor like Notepad, you wouldn’t be able to make much sense of what you see. That’s because all of the formatting information is stored in a manner that only a word processing program can understand. On the other hand, Unix (and therefore Linux) has a long tradition of dealing with plain text files. The Unix tool set has a large number of interoperable tools which can be used to manipulate text files in various ways. For example, it’s simple to use “find” and “grep” to recursively search a directory and all of its subdirectories for every file that contains a particular phrase. Another good example is getting a word count for a set of documents. With Word you would need to open each file individually, get the word count and add the numbers up manually to get a total. With Unix tools, it’s a simple process to get the word count for each individual file and the total across all the files. It’s probably just what I’m used to, but I find it far easier to deal with plain text files.
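On a Unix box both of those jobs are one-liners (“grep -rl” for the search, and “wc -w”, which prints a total line when you give it several files). Because the files are plain text, the same operations take only a few lines in any general-purpose language, too. Here is a minimal Python sketch, assuming UTF-8 files with a .txt suffix; the function names files_containing and word_counts are made up for this example, and the word count only approximates what wc does.

```python
import tempfile
from pathlib import Path


def files_containing(root, phrase, pattern="*.txt"):
    """Return the files under root (searched recursively) whose text contains
    phrase; roughly what "grep -rl" does, limited to names matching pattern."""
    return [
        path
        for path in sorted(Path(root).rglob(pattern))
        if path.is_file() and phrase in path.read_text(encoding="utf-8")
    ]


def word_counts(paths):
    """Count whitespace-separated words in each file (roughly like "wc -w")
    and return the per-file counts together with the grand total."""
    counts = {Path(p): len(Path(p).read_text(encoding="utf-8").split()) for p in paths}
    return counts, sum(counts.values())


if __name__ == "__main__":
    # Build a throwaway "book" with one plain-text file per chapter.
    with tempfile.TemporaryDirectory() as book:
        root = Path(book)
        (root / "part1").mkdir()
        (root / "part1" / "chapter1.txt").write_text(
            "It was a dark and stormy night.\n", encoding="utf-8"
        )
        (root / "chapter2.txt").write_text("The night was long.\n", encoding="utf-8")

        print([p.name for p in files_containing(root, "stormy")])  # ['chapter1.txt']
        counts, total = word_counts(root.rglob("*.txt"))
        for path, n in sorted(counts.items()):
            print(path.relative_to(root), n)
        print("total", total)  # total 11
```

The point isn’t the Python; it’s that none of this needs a word processor to decode the files first.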
My final reason is, as I said above, more philosophical. I don’t think that WYSIWYG tools are a good way to produce documents. Think about it. How often do you spend almost as much time fiddling with the formatting of a Word document as you do actually writing? A WYSIWYG program encourages you to see the presentation of your document as intrinsically linked to the content. We used to see web publishing the same way – the presentation of a web page (lots of <font> tags and too many nested tables) was completely intertwined with the actual content, making it hard to change one without changing the other. Now, of course, we laugh at the old days as we all produce semantically meaningful markup which will be formatted using an external stylesheet. And it should be the same with documents. Write what you have to write, only pausing to add extra information to define the various parts of the document (this is the title, this is a subsection header, this is a bullet list, and so on). Once you’ve created the document that way, you can start to think about how it should look and apply styles appropriately. I realise that Word can be used that way (the default document styles allow you to define the various parts of your document) but I don’t think that a WYSIWYG program encourages you to think about your writing that way – the presentation always gets in the way.
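As a toy illustration of that separation (in Python, and nothing like as capable as LaTeX or a DocBook stylesheet), the content below is just a list of blocks tagged with what they are. Two interchangeable style mappings, TEXT_STYLE and HTML_STYLE, both invented for this sketch, then turn the same content into plain text or HTML without the content itself ever changing.

```python
from html import escape

# The content, tagged only with what each block *is*, never how it looks.
DOCUMENT = [
    ("title", "Why I avoid word processors"),
    ("section", "The philosophical reason"),
    ("para", "Write first; worry about presentation later."),
    ("bullets", ["title", "subsection header", "bullet list"]),
]

# Two interchangeable "stylesheets": each maps a semantic role to a renderer.
TEXT_STYLE = {
    "title": lambda t: t.upper(),
    "section": lambda t: f"{t}\n{'-' * len(t)}",
    "para": lambda t: t,
    "bullets": lambda items: "\n".join(f"* {i}" for i in items),
}

HTML_STYLE = {
    "title": lambda t: f"<h1>{escape(t)}</h1>",
    "section": lambda t: f"<h2>{escape(t)}</h2>",
    "para": lambda t: f"<p>{escape(t)}</p>",
    "bullets": lambda items: "<ul>" + "".join(f"<li>{escape(i)}</li>" for i in items) + "</ul>",
}


def render(document, style):
    """Render semantically tagged content using the given style mapping."""
    return "\n\n".join(style[role](body) for role, body in document)


if __name__ == "__main__":
    # Same content, two presentations; neither required touching DOCUMENT.
    print(render(DOCUMENT, TEXT_STYLE))
    print()
    print(render(DOCUMENT, HTML_STYLE))
```

Changing how every section header looks means editing one entry in a style mapping, not hunting through the text.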
I’m not saying anything new or radical here. People have been producing documents this way for years (ask your neighbourhood Unix geek about LaTeX). It’s just a shame that the most popular end-user tools for document creation don’t encourage this mode of working.
So that’s why I prefer to work in a plain text format (or at least something that is stored as text, like POD or DocBook) and why I’ll never use a word processor unless a client insists on it for some reason.
 And yes, I realise that text is (strictly speaking) another binary format. The point is that it is a simple and well-understood format. Of course Unicode encodings complicate that somewhat.
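To make the Unicode point concrete, here is a small Python illustration; the sample word and the three encodings are just my choices. The same four characters come out as different byte sequences depending on the encoding, and reading the bytes back with the wrong encoding quietly garbles the text.

```python
# The same four characters, stored as bytes under three common encodings.
TEXT = "café"


def encoded_sizes(text, encodings=("utf-8", "utf-16-le", "latin-1")):
    """Map each encoding name to the number of bytes it uses for text."""
    return {enc: len(text.encode(enc)) for enc in encodings}


if __name__ == "__main__":
    for enc in ("utf-8", "utf-16-le", "latin-1"):
        print(f"{enc}: {TEXT.encode(enc).hex(' ')}")
    # utf-8: 63 61 66 c3 a9
    # utf-16-le: 63 00 61 00 66 00 e9 00
    # latin-1: 63 61 66 e9

    # Reading UTF-8 bytes as if they were Latin-1 silently garbles the text.
    print(TEXT.encode("utf-8").decode("latin-1"))  # cafÃ©
```

For pure ASCII text, UTF-8 and Latin-1 produce identical bytes, which is a large part of why plain text has always felt so simple.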