Compiling DOCX from ADOC

This article describes how to build a docx file from a git tree of AsciiDoc (adoc) files.

Note that the procedure was tested on Windows, and there is no guarantee that it will work as described in every case.


Required utilities

Preliminary steps:

  • Use any git terminal that lets you run git pull against your repository. Don't forget to run git fetch first.

  • Get familiar with the docToolchain framework and install it. The framework converts AsciiDoc to various other formats. Installation is described online.

There is no need to use the arc42 templates; it is better to initialize your own empty directory.

  • Install two additional plugins: reveal.js and asciidoctor-reveal.js.

  • Install Pandoc, another universal text converter. docToolchain uses Pandoc in its conversion chain. Pandoc installation is described in detail online.

  • Now install Gradle, our build automation tool. You can read how to install Gradle online.

Do not install Gradle version 8.0 or higher! Something does not work correctly with it.

  • You will also need Groovy. Installation is covered in the developers' documentation on the Groovy website.

  • Download the stream editor sed. We will need it to find and replace blocks of text in our docx.

  • The last utility is the UnZip archiver.
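
Before going further, it can help to confirm that everything from the list above is reachable from your terminal. The following is only a minimal sketch and not part of docToolchain: the function names missing_tools and gradle_major_ok are made up for this example, and the executable names in the usage comment (including zip, which the repacking script later in the article calls) are assumed to match how the tools are installed on your system.

```shell
#!/bin/sh
# Print the name of every tool from the argument list that is NOT on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Succeed only if the given Gradle version string has a major version below 8.
gradle_major_ok() {
    major=${1%%.*}          # "7.6.1" -> "7"
    [ "$major" -lt 8 ]
}

# Example usage (uncomment to run):
# missing_tools git pandoc gradle groovy sed unzip zip
# gradle_major_ok "$(gradle --version | sed -n 's/^Gradle //p')" \
#     || echo "Gradle 8.0 or higher detected"
```

If missing_tools prints nothing, all listed executables were found.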

Structure of docToolchain

After deploying the framework and initializing the project directory, the project folder structure will look something like this (unnecessary directories and files have been removed here!):

Structure for our docToolchain project

What is important to us in this directory structure:

  1. The top-level directory where the docToolchain framework is installed.

  2. This directory stores the startup files for docToolchain itself. If the framework is registered in the system environment (the system %PATH%), it can be launched from any directory.

  3. The main directory of our project. We initialize this directory with the command:

    ./gradlew -b init.gradle initExisting -PnewDocDir=<your directory>

  4. Directory with the conversion scripts.

  5. An important file for us at this stage. In this file we write the transformation script from adoc to DocBook, and then to docx.

    For reasons unknown to me, configuring a separate file created just for the project does not work, so we configure the global file instead.

  6. Global file with Groovy settings.

Project structure

Our project directory structure:

Our directory structure with documentation

Let's look at the structure:

  1. The directory of our project. It contains the master files that will be converted.

  2. A directory that contains build folders for the different output formats. Built files are placed here: for example, when we build docx, the build script creates a docx folder and puts the generated ${filename}.docx in it.

  3. The resulting file in docx format. Its name is taken from the name of the file being converted.

  4. Storage for reference docx files. Here we put the docx files that serve as templates for the transformation.

  5. The docx file used as a template.

  6. In the docToolchain concept, this directory holds the content included by the master files, for example as AsciiDoc files. In practice we can move away from this concept and follow the directory structure of our git tree instead. This is discussed in more detail below.

  7. A directory taken directly from the git tree that contains adoc files with content.

  8. Groovy configuration file.

  9. Master file containing inclusions, which will be transformed into docx.

Preparing files for transformation

Let's look at the structure of our git tree, which we cloned locally:

Typical git structure for adoc based projects

It is worth noting that this structure is more typical of those documentation projects that fully comply with the docs-as-code paradigm and use asciidoc as a markup language.

Let's take a quick look at what I think are the main tree directories.

  1. A project branch containing documentation types.

  2. The type of manual we will transform into docx.

  3. The contents of this directory and everything below it need to be copied to our project directory.

  4. Directory with images. It is required for a correct transformation.

  5. A directory containing the content of our guide. All content is declared as inclusions in the master file.

  6. The master file itself. Contains inclusions, visual transformation logic, etc.

  7. We copy items 4, 5 and 6 to the root of our project.
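
The copy step from the list above can be scripted. This is a rough sketch under assumptions: copy_sources is a hypothetical helper, and the directory names in the usage comment (images, chapters, the repository path) are placeholders, since the real names depend on your git tree. Only the master file name API.adoc is taken from the configuration shown later in the article.

```shell
#!/bin/sh
# Copy the images directory, the content directory and the master file
# from a cloned git tree into the docToolchain project directory.
# Usage: copy_sources <source dir> <project dir> <images dir> <content dir> <master file>
copy_sources() {
    src=$1; dest=$2; images=$3; content=$4; master=$5
    mkdir -p "$dest" || return 1
    cp -R "$src/$images" "$dest/" || return 1
    cp -R "$src/$content" "$dest/" || return 1
    cp "$src/$master" "$dest/" || return 1
}

# Example usage (all paths are placeholders):
# copy_sources ./repo/docs/guide ./docs images chapters API.adoc
```

Keeping the relative layout of the three items unchanged matters, because the inclusions in the master file refer to the content and images by relative path.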

Setting up pandoc.gradle

In this script we override some default attributes.

I recommend making a backup copy of this file.

        if(referenceDocFile?.trim()) {
            args = ["-r","docbook",
                    "-s",                                   // (1)
                    "-t","docx",
                    "-o","../docx/$targetFile",
                    "--reference-doc=../docx/gost.docx",    // (2)
                    sourceFile,
                    "--toc"                                 // (3)
                    ]

Where:

  1. The -s key (--standalone) makes Pandoc produce a complete standalone document for ${input}.docx.

  2. After the --reference-doc= key we specify the path to our docx template.

  3. The --toc key forces the generation of a table of contents for the docx.
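
For reference, the args list above corresponds to roughly the following pandoc call. This is a sketch only: the helper pandoc_docx_cmd is a made-up name, and API.xml as the name of the intermediate DocBook file is an assumption based on the master file API.adoc. The function only prints the command, so you can inspect it before running it by hand.

```shell
#!/bin/sh
# Print the pandoc command that corresponds to the args list in pandoc.gradle.
# $1 = DocBook source file, $2 = target docx name, $3 = reference docx template
pandoc_docx_cmd() {
    printf 'pandoc -r docbook -s -t docx -o ../docx/%s --reference-doc=%s %s --toc\n' \
        "$2" "$3" "$1"
}

# Example usage:
# pandoc_docx_cmd API.xml API.docx ../docx/gost.docx
```

Running the printed command manually from the directory that holds the DocBook file is a quick way to check the template path and the keys without going through Gradle.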

Setting up config.groovy

This file is generated automatically in the project folder after the initialization procedure is completed.

inputPath = "."                 // (1)

referenceDocFile = "gost.docx"  // (2)

inputFiles = [
       [file: 'API.adoc',                            // (3)
        formats: ['html','pdf','docbook','docx']],   // (4)
]

Where:

  1. Sets inputPath for the conversion from DocBook to docx. It must be set exactly as in the example.

  2. Our docx template.

  3. The master file that we will convert.

  4. Types of files into which we will convert our asciidoc file.

We can exclude html and pdf formats, but docbook and docx must be specified explicitly.
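
Since a missing docbook or docx entry is easy to overlook, a small pre-flight check can catch it before the build. The sketch below is an illustration, not a docToolchain feature: has_format is a hypothetical helper that does a plain text search for the quoted format name, so it does not parse Groovy and would also match a commented-out line.

```shell
#!/bin/sh
# Succeed if the config file mentions the given format in single quotes,
# e.g. 'docx'. This is a plain text search, not a Groovy parser.
# $1 = path to config.groovy, $2 = format name
has_format() {
    grep -q "'$2'" "$1"
}

# Example usage:
# has_format config.groovy docbook && has_format config.groovy docx \
#     || echo "add docbook and docx to formats in config.groovy"
```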

Running the script

To run the script, run the command from the terminal:

doctoolchain docs convertToDocx

As a result, Gradle executes the task that we configured in the pandoc.gradle file.

Replacing Table of Contents

After conversion from one format to another, our resulting docx file contains the title for the table of contents in English.

To change the language automatically:

  1. Place the resulting docx file in any temporary directory.

  2. Copy the file with the script to the same directory. If necessary, make changes to the contents of this script.

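#!/bin/sh

    unzip API.docx -d tmp    # (1) unzip
    # (2) what to replace and with what, (3) the file inside the unpacked docx
    # Note: sed -i '' is the BSD/macOS form; with GNU sed use: sed -i -e "..."
    sed -i '' -e "s/Table of Contents/Оглавление/g" tmp/word/document.xml    # find/replace
    cd tmp && zip -r ../API1.docx * && cd ..    # (4) zip
    rm -rf tmp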

Where:

  1. The script unpacks the specified docx file.

  2. Here we specify what to replace (Table of Contents) and what to replace it with (Оглавление, used here as an example).

  3. Where the replacement is made. As you can see, sed edits the data in the already unpacked docx.

  4. The archive is repacked under a new file name. I recommend always creating a new file.

Open the copied docx file. Word will display a prompt asking for permission to generate the table of contents. Allow it and save the file.

This step is very important!

Then run the script from the terminal:

bash find.sh
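
If you need the same replacement for several documents, the script can be turned into a reusable function. The version below is a sketch under stated assumptions, not the script used in this article: replace_text and retitle_docx are made-up names, the search and replacement strings are assumed to contain no "/" or "&" and no regular-expression metacharacters, and mktemp, zip and unzip are assumed to be available. It writes through a temporary file instead of using sed -i, whose syntax differs between GNU and BSD sed.

```shell
#!/bin/sh
# Replace every occurrence of $2 with $3 in file $1.
# Assumption: $2 and $3 contain no "/" or "&" and no regex metacharacters.
replace_text() {
    sed -e "s/$2/$3/g" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Unpack docx $1, replace $3 with $4 in word/document.xml, repack as $2.
retitle_docx() {
    work=$(mktemp -d) || return 1
    unzip -q "$1" -d "$work" || return 1
    replace_text "$work/word/document.xml" "$3" "$4" || return 1
    out_dir=$(cd "$(dirname "$2")" && pwd) || return 1
    out="$out_dir/$(basename "$2")"
    rm -f "$out"                       # avoid updating a stale archive
    (cd "$work" && zip -q -r "$out" .) || return 1
    rm -rf "$work"
}

# Example usage:
# retitle_docx API.docx API1.docx "Table of Contents" "Оглавление"
```

As in the original script, the result is written to a new file, so the source docx stays untouched.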

Thus, we produced a docx file from a predefined template using modern documentation storage systems.
Along the way, we touched on the structure of a docx file and automated routine actions with simple scripts.
