Compiling DOCX from ADOC
An article about how to build a docx file from a git tree of AsciiDoc (adoc) files.
Note that the process was tested on Windows, but there is no guarantee that everything will work as expected in every case.
Required utilities
Preliminary steps:
Use any Git terminal that lets you run git pull against your repository. Don't forget to run git fetch first.
Get familiar with the docToolchain framework and install it. The framework converts AsciiDoc to various other formats. Installation is described online.
You don't need the arc42 templates; it is better to initialize your own empty directory.
Install two additional plugins: reveal.js and asciidoctor-reveal.js.
Install Pandoc, another universal text converter. docToolchain uses Pandoc in its conversion chain. Pandoc installation is described in detail online.
Now install Gradle, the build automation tool. You can read how to install Gradle online.
Do not install Gradle 8.0 or later! It did not work correctly in my setup.
You'll also need Groovy. Its installation is described in the official documentation on the developers' website.
Download sed, the stream editor. We will need it to find and replace text in our docx.
The last utility is the UnZip archiver. The script at the end of this article also repacks the file with zip, so make sure zip is installed as well.
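Before going further, it can help to check that everything from the list above is actually reachable from your terminal. The following is a minimal sketch, not part of docToolchain: the command names (git, pandoc, gradle, groovy, sed, unzip, zip) are assumptions about what each installer puts on PATH, and the helper names missing_tools and gradle_version_ok are hypothetical.

```shell
#!/bin/sh
# Sketch of a prerequisite check for the utilities listed above.
# Command names are assumptions about what each installer puts on PATH.

# Print every command name from the arguments that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeed only if the version string (for example "7.6.1") is below 8.0.
gradle_version_ok() {
  major=${1%%.*}
  [ "$major" -lt 8 ] 2>/dev/null
}

missing=$(missing_tools git pandoc gradle groovy sed unzip zip)
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing"
fi

if command -v gradle >/dev/null 2>&1; then
  # "gradle --version" prints a line such as "Gradle 7.6.1"
  version=$(gradle --version | awk '/^Gradle /{print $2; exit}')
  gradle_version_ok "$version" ||
    echo "Gradle $version found; use a version below 8.0"
fi
```

The script only reports problems and does not install anything, so it is safe to run repeatedly while you work through the list.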
Structure of docToolchain
After deploying the framework and initializing the project directory, the project folder structure will look something like this (irrelevant directories and files are omitted):
What is important to us in this directory structure:
The top-level directory where the docToolchain framework is installed.
This directory stores the startup files of docToolchain itself. If the framework is added to the system PATH (the %PATH% environment variable), it can be launched from any directory.
The main directory of our project. We initialize this directory with the command:
./gradlew -b init.gradle initExisting -PnewDocDir=<your directory>
Directory with the conversion scripts.
An important file at this stage. In this file we define the conversion from adoc to DocBook and then to docx.
For reasons unknown to me, a separate copy of this file created exclusively for the project does not work. So we configure the global file instead.
Global file with Groovy settings.
Project structure
Our project directory structure:
Let's look at the structure:
Our project directory. It contains the master files that will be converted.
A directory with build folders for the different formats. This is where the built files go: if we build docx, the build script creates a docx folder and places the generated ${filename}.docx in it.
The resulting docx file. Its name is taken from the name of the file we convert.
Storage for reference docx files. Here we place the docx files used as templates for the conversion.
The docx file used as a template.
In the docToolchain concept, this directory should hold the content referenced from the master files, for example as AsciiDoc files. In practice, we can depart from this convention and follow the directory structure of our git tree. This is discussed in more detail below.
A directory taken directly from the git tree that contains adoc files with content.
Groovy configuration file.
Master file containing inclusions, which will be transformed into docx.
Preparing files for transformation
Let's look at the structure of our git tree, which we cloned locally:
It is worth noting that this structure is more typical of those documentation projects that fully comply with the docs-as-code paradigm and use asciidoc as a markup language.
Let's take a quick look at what I think are the main tree directories.
A project branch containing documentation types.
The type of manual we will transform into docx.
Starting from this directory, its contents need to be copied to our project directory.
Directory with images. It is required for a correct conversion.
A directory containing the content of our guide. All content is declared as inclusions in the master file.
The master file itself. Contains inclusions, visual transformation logic, etc.
We copy items 4, 5 and 6 (the images directory, the content directory and the master file) to the root of our project.
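The copy step can be scripted so it is easy to repeat after every git pull. This is a minimal sketch: the function name copy_manual and all paths and entry names (images, content, API.adoc) are hypothetical placeholders for your own git tree and project root.

```shell
#!/bin/sh
# Sketch: copy the images directory, the content directory and the
# master file (items 4, 5 and 6) from the cloned git tree to the
# docToolchain project root. All paths and names are placeholders.

# $1 - manual directory in the git tree, $2 - project root,
# remaining arguments - entries to copy
copy_manual() {
  src=$1
  dest=$2
  shift 2
  for item in "$@"; do
    # cp -R copies directories recursively; "$dest/" keeps the entry name
    cp -R "$src/$item" "$dest/" || return 1
  done
}
```

For example: copy_manual path/to/git-tree/manual path/to/project images content API.adoc. The function stops at the first entry that fails to copy.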
Setting up pandoc.gradle
In this script we override some of the default Pandoc arguments.
I recommend making a backup copy of this file.
if(referenceDocFile?.trim()) {
    args = ["-r", "docbook",
            "-s",                                  // (1)
            "-t", "docx",
            "-o", "../docx/$targetFile",
            "--reference-doc=../docx/gost.docx",   // (2)
            sourceFile,
            "--toc"                                // (3)
    ]
}
Where:
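(1) The -s key (--standalone) tells Pandoc to produce a complete standalone document. Pandoc already enables this automatically for docx output, so the key is harmless but optional.
(2) After the --reference-doc= key we specify the path to our docx template.
(3) The --toc key makes Pandoc generate a table of contents in the docx.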
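To try a template without going through Gradle, the same Pandoc call can be run by hand. The sketch below wraps the argument list from pandoc.gradle in a shell function; the function name docbook_to_docx and the PANDOC variable are hypothetical, and the DocBook file name API.xml in the usage example is an assumption about what the DocBook build step produces.

```shell
#!/bin/sh
# Sketch: the Pandoc call built by the edited pandoc.gradle, as a shell
# function. Set PANDOC=echo to print the command line instead of running it.
PANDOC=${PANDOC:-pandoc}

# $1 - DocBook source file, $2 - output docx, $3 - reference docx template
docbook_to_docx() {
  "$PANDOC" -r docbook \
    -s \
    -t docx \
    -o "$2" \
    --reference-doc="$3" \
    "$1" \
    --toc
}
```

Run from the DocBook build directory, a call might look like: docbook_to_docx API.xml ../docx/API.docx ../docx/gost.docx. The argument order mirrors the Gradle list, and Pandoc accepts options after the input file.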
Setting up config.groovy
This file is generated automatically in the project folder after the initialization procedure is completed.
inputPath = "."                  // (1)
referenceDocFile = "gost.docx"   // (2)
inputFiles = [
    [file: 'API.adoc',           // (3)
     formats: ['html', 'pdf', 'docbook', 'docx']],   // (4)
]
Where:
(1) Sets inputPath for the conversion from DocBook to docx. It must be set exactly as in the example.
(2) Our docx template.
(3) The master file that we will convert.
(4) The formats into which our AsciiDoc file will be converted.
We can exclude html and pdf formats, but docbook and docx must be specified explicitly.
Running the script
To run the conversion, execute the following command from the terminal, where docs is the project directory:
doctoolchain docs convertToDocx
Gradle then runs the convertToDocx task that we configured in pandoc.gradle.
Replacing Table of Contents
After the conversion, the table of contents in the resulting docx file has an English title (Table of Contents).
To replace it automatically:
Place the resulting docx file in any temporary directory.
Copy the script file (find.sh) to the same directory. If necessary, adjust the contents of the script.
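#!/bin/sh
# (1) Unpack the docx into the tmp directory
unzip API.docx -d tmp
# (2), (3) Find/replace in the unpacked document.xml
# (GNU sed syntax; on BSD/macOS sed use: sed -i '' -e ...)
sed -i -e "s/Table of Contents/Оглавление/g" tmp/word/document.xml
# (4) Repack into a new file
cd tmp && zip -r ../API1.docx * && cd ..
rm -rf tmp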
Where:
(1) The script unpacks the specified docx file.
(2) Here we specify the text to replace (Table of Contents) and the replacement (Оглавление, Russian for "Table of Contents", used as an example).
(3) Where to replace: sed edits word/document.xml inside the already unpacked docx.
(4) The files are repacked under a new file name. I recommend always creating a new file rather than overwriting the original.
Open the copied docx file. Word will ask whether to update the fields in the document; this is what generates the TOC. Confirm, then save the file.
This step is very important!
Run the script from the terminal:
bash find.sh
To run a bash script on Windows, you can use tools such as BusyBox 64 or MSYS2.
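If you need to repeat this replacement for several documents or titles, find.sh can be generalized. The sketch below is a hypothetical parameterized variant: the script name replace_toc.sh and the function name replace_in_docx are illustrative. It avoids sed -i altogether, because the -i syntax differs between GNU sed and BSD/macOS sed. It assumes unzip, zip and sed are on PATH and that the texts contain no "/" or other sed special characters.

```shell
#!/bin/sh
# Sketch: a parameterized variant of find.sh.
# Usage: sh replace_toc.sh IN.docx OUT.docx "Old text" "New text"
set -eu

replace_in_docx() {
  src_docx=$1
  dst_docx=$2
  old_text=$3
  new_text=$4
  work=$(mktemp -d)
  unzip -q "$src_docx" -d "$work"
  # Redirect to a new file instead of "sed -i" (GNU and BSD syntax differ)
  sed "s/$old_text/$new_text/g" "$work/word/document.xml" > "$work/document.xml.new"
  mv "$work/document.xml.new" "$work/word/document.xml"
  # zip runs inside $work, so turn a relative output path into an absolute one
  case $dst_docx in
    /*) out=$dst_docx ;;
    *) out=$(pwd)/$dst_docx ;;
  esac
  # zip would add to an existing archive, so remove any old output first
  rm -f "$out"
  (cd "$work" && zip -q -r "$out" .)
  rm -rf "$work"
}

# Only run when called with exactly four arguments
if [ $# -eq 4 ]; then
  replace_in_docx "$@"
fi
```

For example: sh replace_toc.sh API.docx API1.docx "Table of Contents" "Оглавление". You can then check the result with unzip -p API1.docx word/document.xml | grep -c "Оглавление", which should print a non-zero count.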
Thus, we obtained a docx file based on a predefined template, built straight from documentation sources stored in a git repository.
We also briefly touched on the internal structure of a docx file and automated routine steps with simple scripts.