Script for turning messy texts into well-structured, -outlined and -formatted Word documents 16.6.10
Some interesting pieces of software have been developed in recent years that aim at replacing the venerable Word as an authoring tool for large and complex writing projects. On the Mac side, two humbly named applications, Ulysses and Scrivener, have most notably emerged as popular writing tools. While everything is nice and fine as long as you write, sharing your output and delivering well-structured (in a technical sense) and formatted documents is a bit cumbersome and usually requires dreary manual intervention. As I had written a script for Word for Windows back in my, well, teens that did some of the things I had so far been doing manually on the Mac, I figured it should be fairly easy to update and extend that thing and write some code.
It turned out that scripting rich formatted documents on the Mac is a bit trickier than I would have preferred. Anyhow, it's done now. The purpose of the script is to turn a text document with in-text footnotes, in-text comments and distinct rich-text formatting for headings at distinct outline levels into a nicely formatted document that uses Word’s built-in footnotes, comments, styles and ToC features.
For now, the script does the following:
- Space/new line doublets are replaced by single space/new line.
- Outline levels of all paragraphs are set to 0, which means: no more garbage in Word’s Document Map.
- Text with certain formatting is assigned to the paragraph styles “Heading 1”, “Heading 2” or “Heading 3”.
- In-text comments, i.e. text like “[AN: this is an in-text comment]”, are replaced by Word’s colourful comment bubbles.
- In-text footnotes, i.e. text like “[FN: this is an in-text footnote]”, are replaced by real footnotes.
- A table of contents is created at a position marked by a certain string.
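To make the text-level steps a bit more tangible, here is a minimal sketch in Python. It is not the actual script, which drives Word itself, and it works on plain text only. The function names `collapse_doublets` and `extract_markers` and the `{1}`-style placeholders are assumptions made up for this illustration. It collapses repeated spaces and new lines, then pulls the “[FN: …]” and “[AN: …]” markers out of the text so that a word processor could later turn them into real footnotes and comments.

```python
import re

# Matches in-text markers such as "[FN: some footnote]" or "[AN: some comment]".
MARKER = re.compile(r"\[(FN|AN):\s*([^\]]*)\]")


def collapse_doublets(text):
    """Replace runs of spaces with one space and runs of new lines with one new line."""
    text = re.sub(r" {2,}", " ", text)
    text = re.sub(r"\n{2,}", "\n", text)
    return text


def extract_markers(text):
    """Swap each marker for a numbered placeholder and collect its content.

    Returns the cleaned body and a list of (kind, content) tuples,
    where kind is "footnote" or "comment".
    """
    notes = []

    def replace(match):
        kind = "footnote" if match.group(1) == "FN" else "comment"
        notes.append((kind, match.group(2).strip()))
        # Hypothetical placeholder format, e.g. "{1}", "{2}", ...
        return f"{{{len(notes)}}}"

    body = MARKER.sub(replace, text)
    return body, notes


if __name__ == "__main__":
    raw = "A  claim [FN: see Smith] here.\n\nNext [AN: check this] line."
    body, notes = extract_markers(collapse_doublets(raw))
    print(body)
    print(notes)
```

In a real run, each collected note would be handed to Word's footnote or comment feature at the position of its placeholder. That is the part where the scripting interface matters and where this sketch deliberately stops.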
For those interested in too much technical background information: Scrivener’s RTF export is somewhat insufficient for academic writing (cf. discussions in their forum), and Scrivener’s MultiMarkdown support is weak when it comes to exporting footnotes, comments and styles. Pages (part of Apple iWork ’09) has an insufficient API, which provides no access to footnotes and comments. Adobe InDesign CS4 likewise doesn’t provide APIs for comments, and neither does Nisus Writer Pro. Microsoft killed VBA with Word 2008, but it will be back later this year with Office for Mac 2011. So I considered reusing my 1990s VBA code by running Word 2003 on Windows under Parallels. It turns out, however, that my aging, crash-happy MacBook doesn’t like running Parallels 5. So, back to the Mac and Word 2008 using AppleScript. It turns out, though (surprise, surprise), that the APIs of Word for Mac differ slightly but critically from those of Word for Windows.