Automate the creation of reports in Word with data from Excel in Python

Ridiculously simple code that could save you a lot of time in the future. And as a fan of automation who tries to avoid any monotonous and routine work, I am simply delighted with such solutions. By the way, I’m talking about projects in this topic here.

For the first time I faced such a problem many years ago, when my landlord, from whom I rented an apartment in St. Petersburg, came to me with a request from the “what a programmer” series (although at that time I was still working as an engineer and I didn’t have any super knowledge in development, so I can assure you that almost any novice developer can handle this code). She organized children’s dance competitions at the regional (or higher) level and for each competition hired a separate person who printed diplomas within two days. There were hundreds of children, diplomas too, and purely due to the human factor, naturally, there were quite a lot of marriages. And she asked me to automate it somehow (you’re a programmer).

Technically, the task looked like this: based on the results of each stage of the competition, an Excel table was generated, each line of which contained information about the child/dance group, director, nomination, program and place (1st, 2nd or 3rd degree diploma). Accordingly, according to the data from each line, it was necessary to generate a separate diploma according to the template. The template was a regular Word document, where you need to insert data from the table in certain places.

The task is typical for many, and when I created my course on automating routine tasks in python, I remembered this story and devoted one practical work to this, which you, in fact, will see below. Of course, I won’t share the data of talented children, but for my listeners I spent the whole evening coming up with a database with imaginary employees who need to generate vacation applications. The task is completely similar: an Excel table with information about employees and vacation dates, an application template in Word. Demo below.

Vacation application template in Word

Vacation application template in Word

Table with employee data xlsx

Table with employee data xlsx

All data was made up from my head, the matches with real persons are completely random. I also came up with email addresses; many were entered incorrectly on purpose for practice on regular expressions in my course.

Preparing the template

The library is used to substitute data into the .docx template python-docx-template.

Library docxtpl used to work with Microsoft Word (.docx) documents in Python, which allows you to create and edit documents based on templates. It allows you to automate the document creation process by inserting variable data into pre-created templates.

How to work with the library docxtpl:

  1. Document templates: You create a document template in .docx format using Microsoft Word or another editor. In the template you can define places where variable data will be inserted.

  2. Using Variables: In a template, you define variables that will be populated with data from your application or data source. For example, in the template you can specify “{{ username }}” and then insert the real username in that space.

  3. Document generation: Once the template is created and populated with variables, you can use the library docxtpl to generate a document based on this template. It will automatically replace the variables in the template with the appropriate data.

Now let’s look at all this with an example. First you need to prepare the template, that is indicate the places in the document where we will insert the data.

How to do this: in those places in our template where some data will be substituted, we insert double curly braces and the name of the variable that we will substitute next. Example below.

We set formatting not in the code, but in the template itself. That is, everything related to the font, size, indents, etc. – we indicate all this manually in the document template. Naturally, this can be done in code, but in this article we will not do this, we will only substitute the data.

Ready template

Ready template

The code for substituting some data into a template is ridiculously simple.

Library installation:

pip install docxtpl

Code:

# Импортируем нужный объект из библиотеки 
from docxtpl import DocxTemplate

# Загрузка шаблона
doc = DocxTemplate("шаблон.docx")

# Данные для заполнения шаблона
context = {
    'company': 'Моя любимая работа',
    'name': 'Иван',
    'last_name': 'Иванов',
    'start_data': '10.03.2024',
    'end_data': '24.03.2024'
}

# Заполнение шаблона данными
doc.render(context)

# Сохранение документа
doc.save("новый_документ.docx")

Let’s take a step-by-step look at what’s happening here:

  1. from docxtpl import DocxTemplate: Import the class DocxTemplate from the library docxtplwhich allows you to work with .docx files and fill them with data from the dictionary.

  2. doc = DocxTemplate("шаблон.docx"): Load the document template from the file “template.docx” and create an object docwhich this document represents.

  3. context = { ... }: Create a dictionary context, containing data to fill out the template. The dictionary keys correspond to the variables in the template, which will be replaced with the corresponding values. Pay attention to this.

  4. doc.render(context): Filling the template with data from the dictionary context. This will replace the variables in the template with the appropriate values.

  5. doc.save("новый_документ.docx"): Save the completed document into a new file named “new_document.docx”.

Thus, this code takes a document template, fills it with data from the dictionary context and saves the result in a new .docx file.

Run the code and see the result

Connecting the table

OpenPyXL is a library for working with Microsoft Excel (.xlsx) files

Installation

 pip install openpyxl

Key Features

  • Reading data: OpenPyXL allows you to read data from existing Excel files.

  • Data recording: The library allows you to create new Excel files and write data to these files.

  • Data modification: You can modify existing data in Excel tables, add new rows, change formatting, etc.

  • Working with formulas: OpenPyXL supports the use of Excel formulas, allowing you to calculate values ​​based on other cells.

What functions of this library do we need now:

wb = openpyxl.load_workbook(filename="people_data_vacation.xlsx")
  1. load_workbook: This is a function from the library openpyxlwhich loads a workbook from an Excel file.

  2. filename="people_data_vacation.xlsx": This function argument load_workbook specifies the path to the file to be downloaded. In this case, this is the file “people_data_vacation.xlsx”.

After executing this line of code, the variable wb will contain a workbook object that represents the contents of the file “people_data_vacation.xlsx”. We can then use this object to access the contents of an Excel file, such as worksheets and their contents. What will we do next?

sheet = wb['vacation']

This line of code retrieves the worksheet from the workbook wb and assigns it to a variable sheet. In this case, we are retrieving a leaf named ‘vacation’. If such a sheet does not exist, an exception will be thrown.

Next, we can extract data from this sheet (from specific cells).

Let’s look closely at one row from the table. Let’s come up with an algorithm for one employee and then it won’t be difficult to apply it to the rest in a cycle.

For example, we need to fill out an application for the first employee – the second line of the table. Accordingly, you need to get data from cells A2 (last name), B2 (first name), D2 (company), F2 (start of vacation) and G2 (end of vacation).

You can get data from a specific cell with the following command

sheet['A2'].value

After executing this line of code, the value that is contained in cell A2 of the worksheet will be returned sheet.

Then for the remaining cells it will look like this:

name = sheet['B2'].value
last_name = sheet['A2'].value
company = sheet['D2'].value
start_data = sheet['F2'].value.date()
end_data = sheet['G2'].value.date()

I also add the .date() function to the dates so that it returns not in the format ‘2022-05-01 00:00:00’, but at least 2022-05-01.

We supplement the code written in the last part to generate a statement with data from the table for the first employee

import openpyxl
from docxtpl import DocxTemplate

doc = DocxTemplate("шаблон.docx")

wb = openpyxl.load_workbook(filename="people_data_vacation.xlsx")
sheet = wb['vacation']

name = sheet['B2'].value
last_name = sheet['A2'].value
company = sheet['D2'].value
start_data = sheet['F2'].value.date()
end_data = sheet['G2'].value.date()

# Данные для заполнения шаблона
context = {
    'company': company,
    'name': name,
    'last_name': last_name,
    'start_data': start_data,
    'end_data': end_data  
    }

# Заполнение шаблона данными
doc.render(context)

# Сохранение документа
doc.save("новый_документ.docx")

Please note that I replaced the data in the context dictionary with variables. It would be possible to do this without intermediate variables and immediately create a dictionary, but in the future you may need to add some checks, which cannot be done in a dictionary.

It turns out that the code for the first employee is ready, all that remains is to add a cycle and generate applications for everyone else. Essentially, the only thing that will change in the code for each employee is the cell index, so let’s bind it to the counter in the loop.

for num in range(2,6):
    
    name = sheet['B'+str(num)].value
    last_name = sheet['A'+str(num)].value
    company = sheet['D'+str(num)].value
    start_data = sheet['F'+str(num)].value.date()
    end_data = sheet['G'+str(num)].value.date()
    
    context = {
    'company': company,
    'name': name,
    'last_name': last_name,
    'start_data': start_data,
    'end_data': end_data  
    }
    
    doc.render(context)
    doc.save(last_name+' заявление на отпуск.docx')

Design [‘B’+str(num)] will return B2, B3, B4, etc. in turn.

Please also note that for each new document a new name must be formed; for convenience, I immediately add the last name doc.save(last_name+’ application for leave.docx’) so that in the end the documents look like this

range(2,6) - we start with index 2, since the first row of the table is the headers, and up to 6 is purely for testing the code, because if you make a mistake, it will be easier to delete a few test statements than a couple hundred. If several test documents were generated correctly, then you can replace the final number with the required number (either manually or use len(list(sheet.rows)+1)

Final code

import openpyxl
from docxtpl import DocxTemplate

wb = openpyxl.load_workbook(filename="people_data_vacation.xlsx")
sheet = wb['vacation']

doc = DocxTemplate('шаблон.docx')

for num in range(2,len(list(sheet.rows))+1):
    
    name = sheet['B'+str(num)].value
    last_name = sheet['A'+str(num)].value
    company = sheet['D'+str(num)].value
    start_data = sheet['F'+str(num)].value.date()
    end_data = sheet['G'+str(num)].value.date()
    
    context = {
    'company': company,
    'name': name,
    'last_name': last_name,
    'start_data': start_data,
    'end_data': end_data  
    }
    
    doc.render(context)
    doc.save(last_name+' заявление на отпуск.docx')

Let’s run the code and enjoy the result.

In your channel IT as digital creativity I write about automation, various interesting training, creating courses, talk about my experience and how I went from an engineer at a factory to creating an IT startup, give recommendations and announce the launch of interesting development projects.

I hope that the material was useful to you! And good luck in your endeavors!

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *