
DocBook Roundtripping

Explain's Steve Ball, in conjunction with Bob Stayton, has developed a set of XSL stylesheets that convert Microsoft's WordML into DocBook and back again. These stylesheets are intended to allow "roundtripping" of DocBook documents, i.e. converting a DocBook document into Word format and back into DocBook with no loss of data or structure. The aim is to allow a word processor to be used to edit DocBook XML documents.

All of the XSL stylesheets are part of the DocBook XSL project.

More than one word processor is supported: at present, Microsoft Office 2003 and Apple Pages. To use MS Word, documents must be saved in XML format (i.e. as WordML). To use Apple Pages, copy the index.xml.gz file out of the document bundle and uncompress it.

Using the XSL Stylesheets

There are two sets of XSL stylesheets: one set converts DocBook into Word (WordML) or Pages, and the other converts WordML or Pages into DocBook.

DocBook to WordML

The XSL stylesheet dbk2wordml.xsl is used to transform DocBook documents into WordML documents. An additional file, the document template, is needed to provide definitions of style formatting properties. This is specified as the wordml.template stylesheet parameter.

Example usage:

xsltproc -o my-word.xml --stringparam wordml.template template.xml dbk2wordml.xsl my-docbook.xml

The document template is itself a WordML document, but its body text is not used; only its style definitions matter. To change the formatting properties used by the various styles, open the template in Word and edit the styles using Word's menus and dialogs.

WordML to DocBook

Transforming a WordML document into DocBook involves "chaining" the XML document through a "pipeline" of XSL stylesheets. There are four XSL stylesheets involved: wordml2normalise.xsl, normalise2sections.xsl, sections2blocks.xsl and blocks2dbk.xsl.

Example usage:

xsltproc -o my-word.norm wordml2normalise.xsl my-word.xml
xsltproc -o my-word.sects normalise2sections.xsl my-word.norm
xsltproc -o my-word.blks sections2blocks.xsl my-word.sects
xsltproc -o my-docbook.xml blocks2dbk.xsl my-word.blks
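
The four commands above can be wrapped in a shell function so that a failed stage stops the chain and the intermediate files stay out of the working directory. The following is only a sketch: the function name wordml_to_docbook and the XSL_DIR variable are assumptions made for illustration and are not part of the DocBook XSL distribution.

```shell
#!/bin/sh
# Sketch: chain the four WordML-to-DocBook stylesheets.
# wordml_to_docbook and XSL_DIR are hypothetical names, not part of DocBook XSL.
# XSL_DIR is the directory holding the stylesheets (default: current directory).
wordml_to_docbook() {
    input=$1     # WordML file saved from Word, e.g. my-word.xml
    output=$2    # DocBook file to write, e.g. my-docbook.xml
    xsl_dir=${XSL_DIR:-.}

    # Intermediate results go in a scratch directory.
    work=$(mktemp -d) || return 1

    # "&&" stops the chain at the first stage that fails.
    xsltproc -o "$work/doc.norm"  "$xsl_dir/wordml2normalise.xsl"   "$input" &&
    xsltproc -o "$work/doc.sects" "$xsl_dir/normalise2sections.xsl" "$work/doc.norm" &&
    xsltproc -o "$work/doc.blks"  "$xsl_dir/sections2blocks.xsl"    "$work/doc.sects" &&
    xsltproc -o "$output"         "$xsl_dir/blocks2dbk.xsl"         "$work/doc.blks"
    status=$?

    # Remove the scratch directory whether or not the chain succeeded.
    rm -rf "$work"
    return $status
}
```

After sourcing the file, a call such as "wordml_to_docbook my-word.xml my-docbook.xml" would stand in for the four separate commands.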

DocBook to Pages

The XSL stylesheet dbk2pages.xsl is used to transform DocBook documents into Pages XML index documents. An additional file, the document template, is needed to provide definitions of style formatting properties. This is specified as the pages.template stylesheet parameter.

Example usage:

xsltproc -o index.xml --stringparam pages.template template-pages.xml dbk2pages.xsl my-docbook.xml

The result document, index.xml, must be placed in a bundle before the Pages application can open it. The simplest way to do this is to create a directory whose name ends in ".pages" and then copy or move index.xml into that directory. The index file does not need to be compressed before opening with Pages.
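
The bundling step described above might be scripted as follows. This is a minimal sketch; make_pages_bundle is a hypothetical helper name, and the check on the ".pages" suffix is an added safeguard rather than something the stylesheets require.

```shell
#!/bin/sh
# Sketch: wrap a generated index.xml in a Pages bundle directory.
# make_pages_bundle is a hypothetical helper name.
make_pages_bundle() {
    index=$1     # e.g. index.xml produced by dbk2pages.xsl
    bundle=$2    # e.g. my-docbook.pages

    # The bundle is an ordinary directory whose name ends in ".pages".
    case $bundle in
        *.pages) ;;
        *) echo "bundle name must end in .pages: $bundle" >&2; return 1 ;;
    esac

    # Create the directory and place the index file inside it, uncompressed.
    mkdir -p "$bundle" &&
    cp "$index" "$bundle/index.xml"
}
```

For example, "make_pages_bundle index.xml my-docbook.pages" creates the my-docbook.pages directory with index.xml inside it.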

The document template is itself a Pages index document, but its body text is not used; only its style definitions matter. To change the formatting properties used by the various styles, open the template in Pages and edit the styles using Pages' menus and dialogs.

Pages to DocBook

A Pages document is actually a "bundle", i.e. although it appears as a single icon, it is really a directory that contains all of the files needed for the document. To look inside, Control-click on the Pages document icon and select "Show Package Contents". Inside the bundle is an index file, named either index.xml.gz or index.xml. If the index file is compressed (index.xml.gz), it must be uncompressed before being used.
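
Because the bundle is just a directory, the index file can also be extracted from the command line. The sketch below handles both the compressed and the uncompressed case; extract_pages_index is a hypothetical helper name, and the bundle itself is left unmodified.

```shell
#!/bin/sh
# Sketch: copy the index file out of a Pages bundle, uncompressing if needed.
# extract_pages_index is a hypothetical helper name.
extract_pages_index() {
    bundle=$1    # e.g. my-document.pages
    output=$2    # e.g. index.xml

    if [ -f "$bundle/index.xml.gz" ]; then
        # gunzip -c writes the uncompressed data to stdout,
        # so the file inside the bundle is not changed.
        gunzip -c "$bundle/index.xml.gz" > "$output"
    elif [ -f "$bundle/index.xml" ]; then
        cp "$bundle/index.xml" "$output"
    else
        echo "no index file found in $bundle" >&2
        return 1
    fi
}
```

For example, "extract_pages_index my-document.pages index.xml" leaves an uncompressed index.xml in the current directory.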

Transforming the Pages index document into DocBook involves "chaining" the XML document through a "pipeline" of XSL stylesheets. There are four XSL stylesheets involved: pages2normalise.xsl, normalise2sections.xsl, sections2blocks.xsl and blocks2dbk.xsl. Note that the last three stylesheets are the same ones used for transforming WordML into DocBook; only the first, normalising stage differs.

Example usage:

xsltproc -o my-word.norm pages2normalise.xsl index.xml
xsltproc -o my-word.sects normalise2sections.xsl my-word.norm
xsltproc -o my-word.blks sections2blocks.xsl my-word.sects
xsltproc -o my-docbook.xml blocks2dbk.xsl my-word.blks
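
Since only the first stylesheet differs between the Pages and WordML chains, one function can serve both by taking the normalising stylesheet as an argument. This is a sketch under stated assumptions: to_docbook and XSL_DIR are hypothetical names, and the stylesheets are assumed to live in a single directory.

```shell
#!/bin/sh
# Sketch: run the shared conversion chain with a chosen first stage.
# to_docbook and XSL_DIR are hypothetical names, not part of DocBook XSL.
to_docbook() {
    normaliser=$1    # pages2normalise.xsl or wordml2normalise.xsl
    input=$2         # index.xml from a Pages bundle, or a WordML file
    output=$3        # DocBook file to write
    xsl_dir=${XSL_DIR:-.}

    work=$(mktemp -d) || return 1
    current=$input
    step=0

    # Each stage reads the previous stage's output.
    for xsl in "$normaliser" normalise2sections.xsl sections2blocks.xsl blocks2dbk.xsl; do
        step=$((step + 1))
        next=$work/stage$step.xml
        if ! xsltproc -o "$next" "$xsl_dir/$xsl" "$current"; then
            echo "stage failed: $xsl" >&2
            rm -rf "$work"
            return 1
        fi
        current=$next
    done

    # Copy the final stage to the requested location, then clean up.
    cp "$current" "$output"
    status=$?
    rm -rf "$work"
    return $status
}
```

With this in place, "to_docbook pages2normalise.xsl index.xml my-docbook.xml" would cover the Pages case, and passing wordml2normalise.xsl with a WordML file would cover the Word case.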

Supported Elements

The roundtripping system does not support all of the DocBook elements. See Supported Elements for the current status of support of DocBook elements.

Getting Help

Contact Explain for support. Explain offers commercial support for organisations that need it.

Packaged Press

An easy way to handle the chaining of XSL stylesheets is to use Packaged Press Desktop Edition (PPDE). PPDE is an XProc pipeline controller, and it provides a ready-made pipeline for the DocBook roundtripping system.


Copyright © 2005-2010 Explain. All rights reserved.