Why internationalization and localization matter

Hello, Habr! The “Python Web-Developer” course starts soon at OTUS. We invite you to the free demo lesson “Page Controller and Front Controller Patterns: Implementation in Django”, and we are publishing a translation of an article by Nicolle Cysneiros, Full Stack Developer at Labcodes.

According to the ever-truthful Wikipedia, there are about 360 million native English speakers in the world. As developers, we are so used to writing code and documentation in English that we don’t realize this number is only 4.67% of the world’s population. A common language for communication between developers is, of course, a good thing, but it doesn’t mean that users should feel uncomfortable when using your product.

In this article, we will start by discussing the concepts of internationalization and localization and outlining their importance to your application. Then we’ll look at some of the internationalization tools available to developers working on Python and Django projects. Finally, we’ll talk about how we changed our development process to accommodate internationalization.

Localization or internationalization

Localization is the process of adapting an application, product or document for the convenience of users of different countries and cultures.

Internationalization, in turn, is the process of making it possible to localize a product: implementing the software so that it knows when and how to display different content depending on the user’s cultural or linguistic affiliation (locale).

As the Django documentation puts it: localization is done by translators and internationalization is done by developers.

However, this simplified definition of internationalization and localization can give the false impression that they are only about translation. The process also includes several kinds of adaptation that make users more comfortable with your product, for example:

  • Date and currency format;

  • Currency conversion;

  • Unit conversion;

  • Unicode characters and bidirectional text (see below);

  • Time zones, calendar and special holidays.

Wikipedia home page in English
Wikipedia home page in Arabic

Through such adaptations, we can improve the user experience of our application.
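Bidirectional text is one of the less obvious items in the list above. As a minimal sketch (the helper name base_direction is hypothetical, and looking only at the first strongly directional character is a simplification of what the Unicode Bidirectional Algorithm does), Python's standard unicodedata module can be used to guess whether a string should be laid out left-to-right or right-to-left:

```python
import unicodedata


def base_direction(text):
    """Guess the layout direction from the first strongly directional character."""
    for char in text:
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class in ("R", "AL"):  # Hebrew, Arabic and similar scripts
            return "rtl"
        if bidi_class == "L":  # Latin, Cyrillic, CJK and similar scripts
            return "ltr"
    return "ltr"  # no strong character found: assume left-to-right


print(base_direction("Hello, world!"))
print(base_direction("مرحبا بالعالم"))
```

Digits, spaces, and punctuation have no strong direction, so the loop skips them and keeps looking.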

How is this done in Python?

GNU gettext

There are several tools that can help you localize your Python application. Let’s start with the GNU gettext package, which is part of the Translation Project. This package contains:

  • a runtime library that supports the retrieval of translated messages;

  • a set of conventions on how to write code to support message catalogs;

  • a library that supports parsing and creating files containing translated messages.

The following code snippet is a simple Hello World in a file app.py, where Python’s gettext module is used to create a translation object (gettext.translation) for the app domain, specifying the locale directory and the language into which we want to translate the strings. We then assign the gettext function to the underscore character (a common practice that reduces the overhead of typing gettext for every translatable string) and, finally, mark the string “Hello, world!” as translatable.

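import gettext

# Use the relative "locale" directory, matching localedir below.
gettext.bindtextdomain("app", "locale")
gettext.textdomain("app")
# Raises FileNotFoundError until a compiled app.mo exists for en_US.
t = gettext.translation("app", localedir="locale", languages=['en_US'])
t.install()
_ = t.gettext

# Mark the string as translatable.
greeting = _("Hello, world!")
print(greeting)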
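Note that gettext.translation raises an error if no compiled catalog is found for the requested language. As a small sketch, assuming you want the script to run before any translations exist, the fallback parameter makes it return a NullTranslations object that simply echoes the original strings:

```python
import gettext

# fallback=True returns a NullTranslations object when no compiled catalog
# (.mo file) is found, instead of raising FileNotFoundError.
t = gettext.translation(
    "app", localedir="locale", languages=["en_US"], fallback=True
)
_ = t.gettext

greeting = _("Hello, world!")
print(greeting)
```

Whether a silent fallback is desirable depends on the project; during development it can hide a missing or misplaced catalog.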

After marking translatable strings in our code, we can collect them with the GNU xgettext command-line tool. This tool generates a PO file containing all the strings we marked.

xgettext -d app app.py

A PO file (or Portable Object file) contains a list of records, and the record structure looks like this:

#  translator-comments
#. extracted-comments
#: reference…
#, flag…
#| msgid previous-untranslated-string
msgid untranslated-string
msgstr translated-string

We can add comments for translators, references, and flags to an entry. Then comes the message ID (msgid), which is the untranslated string marked in the code, and the message string (msgstr), which is the translated version of that string.

When we run xgettext on the command line by passing app.py as an input file, the resulting PO file is:

"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2019-05-03 13:23-0300\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: app.py:7
msgid "Hello, world!"
msgstr ""

At the beginning of the file there is metadata about the file, the project, and the translation process. Then comes the untranslated string “Hello, world!” as the message ID, with an empty string as the message string. If no translation is specified for an entry, the message ID itself is used as the translation.

After generating the PO file, you can start translating the terms into different languages. It is important to note that the gettext library looks for translation files under a path of a specific form, <localedir>/<language_code>/LC_MESSAGES/<domain>.po for the sources and <domain>.mo for the compiled catalogs that are actually loaded at runtime, so there must be one PO file for each language you want to support.

|-- app.py
|-- locale
    |-- en_US
    |   |-- LC_MESSAGES
    |       |-- app.po
    |-- pt_BR
        |-- LC_MESSAGES
            |-- app.po
Here is an example PO file with Portuguese translation:

"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2019-05-03 13:23-0300\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: app.py:7
msgid "Hello, world!"
msgstr "Olá, mundo!"

To use the translated strings in your code, you need to compile the PO file into an MO file with the msgfmt command:

msgfmt -o app.mo app.po

When the MO file is ready, you can change the program’s language to Portuguese by passing it to the translation function. If we run the following code, the marked string is translated as “Olá, mundo!”:

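import gettext

# Use the relative "locale" directory, matching localedir below.
gettext.bindtextdomain("app", "locale")
gettext.textdomain("app")
# Loads locale/pt_BR/LC_MESSAGES/app.mo, compiled with msgfmt.
t = gettext.translation("app", localedir="locale", languages=['pt_BR'])
t.install()
_ = t.gettext

greeting = _("Hello, world!")
print(greeting)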

Locale module

This module gives access to the POSIX locale database and is especially useful for handling date, number, and currency formats. The example below shows how to use the locale module:

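import datetime
import locale

# The locale name must be installed on the system (it may be spelled
# "en_US.UTF-8" on some platforms); otherwise setlocale raises locale.Error.
locale.setlocale(locale.LC_ALL, locale="en_US")
local_conv = locale.localeconv()
now = datetime.datetime.now()
some_price = 1234567.89
# locale.format was removed in Python 3.12; format_string is its replacement.
formatted_price = locale.format_string('%1.2f', some_price, grouping=True)
currency_symbol = local_conv['currency_symbol']

print(now.strftime('%x'))
print(f'{currency_symbol}{formatted_price}')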

In this example, we import the module, change all locale settings to US English, and retrieve the locale conventions. With locale.format_string (the replacement for locale.format, which was removed in Python 3.12), we can format the number without worrying about decimal and thousands separators. With the %x directive for date formatting, the day, month, and year appear in the correct order for the current locale. From the locale conventions we also get the correct currency symbol.

Below is the output of that Python code. We see that the date matches the format Month/Day/Year, the decimal separator is a period and the thousands separator is a comma, and there is also a dollar sign for the US currency.

$ python format_example.py
05/03/2019
$1,234,567.89

Now, with the same code but with the locale changed to Brazilian Portuguese, we get different output based on Brazilian formatting conventions: the date is displayed in Day/Month/Year format, the comma is the decimal separator and the period is the thousands separator, and the R$ symbol indicates that the amount is in Brazilian reais.

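import datetime
import locale

# The locale name must be installed on the system (it may be spelled
# "pt_BR.UTF-8" on some platforms); otherwise setlocale raises locale.Error.
locale.setlocale(locale.LC_ALL, locale="pt_BR")
local_conv = locale.localeconv()
now = datetime.datetime.now()
some_price = 1234567.89
# locale.format was removed in Python 3.12; format_string is its replacement.
formatted_price = locale.format_string('%1.2f', some_price, grouping=True)
currency_symbol = local_conv['currency_symbol']

print(now.strftime('%x'))
print(f'{currency_symbol}{formatted_price}')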
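Two practical caveats apply to the examples above: a locale can only be selected if it is installed on the host, and setlocale changes state for the whole process. The sketch below, built around a hypothetical helper format_number, tries a locale, returns None when it is unavailable, and restores the previous setting afterwards. The locale names in the loop are assumptions about what a typical Linux system provides.

```python
import locale


def format_number(value, locale_name):
    """Format value with locale_name's separators, or return None if that
    locale is not installed (hypothetical helper)."""
    previous = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
    except locale.Error:
        return None
    try:
        return locale.format_string("%.2f", value, grouping=True)
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)  # restore the old setting


for name in ("en_US.UTF-8", "pt_BR.UTF-8", "C"):
    print(name, format_number(1234567.89, name))
```

Because the setting is global, this pattern is not safe in multithreaded code, which is one reason web frameworks tend to carry their own formatting data instead of relying on setlocale.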

Is it easier in Django?

Translations and formatting

Internationalization is enabled by default when you create a Django project. The translation module encapsulates the GNU gettext library and provides gettext functionality with translation settings based on the language derived from the Accept-Language header that the browser passes in the request object. All the Python code we saw earlier is therefore encapsulated in the translation module from django.utils, so we can skip ahead and just use the gettext function:

from django.http import HttpResponse
from django.utils.translation import gettext as _

def my_view(request):
    greetings = _('Hello, World!')
    return HttpResponse(greetings)
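For the language to actually be picked from the Accept-Language header, Django's LocaleMiddleware has to be active. A minimal settings.py fragment might look like the following; the language list is an assumption for illustration, and a real project will have more middleware entries than shown here.

```python
# settings.py (fragment); the values below are illustrative assumptions.
USE_I18N = True

LANGUAGE_CODE = "en-us"  # default language when no match is found

LANGUAGES = [
    ("en", "English"),
    ("pt-br", "Brazilian Portuguese"),
]

MIDDLEWARE = [
    "django.contrib.sessions.middleware.SessionMiddleware",
    # LocaleMiddleware goes after SessionMiddleware and before CommonMiddleware.
    "django.middleware.locale.LocaleMiddleware",
    "django.middleware.common.CommonMiddleware",
]
```

Restricting LANGUAGES to the locales you have translated keeps Django from selecting a language for which your project has no catalog.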

For translations, we can mark translatable strings both in Python code and in templates (after loading the internationalization tags). The trans template tag translates a single string, whereas the blocktrans tag can mark a whole block of text as translatable, including variable content.

{% load i18n %}
<p>{% trans "Hello, World!" %}</p>
<p>{% blocktrans %}This string will have {{ value }} inside.{% endblocktrans %}</p>

Besides the standard gettext function, Django has lazy translations: the marked string is translated only when its value is actually used in a string context, such as when rendering a template. This is especially useful for translating attributes such as help_text and verbose_name in Django models.
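In Django this is provided by gettext_lazy from django.utils.translation. To show why laziness matters without requiring a Django project, here is a plain-Python sketch of the idea; LazyText, CATALOGS, and current_language are hypothetical stand-ins, not Django's implementation.

```python
# Hypothetical in-memory catalogs and "active language" for the illustration.
CATALOGS = {"pt_BR": {"Price": "Preço"}}
current_language = "en_US"


def translate(message):
    """Look the message up in the catalog of the currently active language."""
    return CATALOGS.get(current_language, {}).get(message, message)


class LazyText:
    """Defers translation until the object is converted to a string."""

    def __init__(self, message):
        self._message = message

    def __str__(self):
        return translate(self._message)


# Both values are created at import time, before any request is handled.
eager_label = translate("Price")   # translated immediately
lazy_label = LazyText("Price")     # translated only when rendered

current_language = "pt_BR"         # later, a Brazilian user's request arrives

print(eager_label)
print(str(lazy_label))
```

Model attributes are evaluated once, when the module is imported, so an eagerly translated label would be frozen in whatever language was active at that moment.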

Similar to the GNU command-line tools, django-admin provides equivalent commands for the tasks most common during development. To collect all the strings marked as translatable in your code, you just need to run django-admin makemessages for each locale you want to support (for example, django-admin makemessages -l pt_BR). Once you create the locale folder in the project workspace, this command automatically creates the correct folder structure for each language’s PO file.

To compile all PO files, you just need to run django-admin compilemessages. If you need to compile the PO file for a specific locale, you can pass it as an argument: django-admin compilemessages --locale=pt_BR. For a more complete understanding of how translations work in Django, you can check out the documentation.

Django also uses the Accept-Language header to determine the user’s locale and format dates, times, and numbers correctly. In the example below, we see a simple form with a DateField and a DecimalField. To indicate that we want to receive this input in a format consistent with the user’s locale, we just need to pass the localize parameter set to True when instantiating the form field.

from django import forms

class DatePriceForm(forms.Form):
    date = forms.DateField(localize=True)
    price = forms.DecimalField(max_digits=10, decimal_places=2, localize=True)
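To make concrete what accepting localized input involves, here is a plain-Python sketch of the normalization step for numbers. The helper parse_localized_decimal is hypothetical and much simpler than what Django does internally; it only strips the thousands separator and swaps the decimal separator before building a Decimal.

```python
from decimal import Decimal


def parse_localized_decimal(text, decimal_sep, thousands_sep):
    """Convert locale-formatted digits into a Decimal (hypothetical helper)."""
    normalized = text.replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(normalized)


# Brazilian conventions: period for thousands, comma for decimals.
print(parse_localized_decimal("1.234.567,89", decimal_sep=",", thousands_sep="."))
# US conventions: comma for thousands, period for decimals.
print(parse_localized_decimal("1,234,567.89", decimal_sep=".", thousands_sep=","))
```

Without this step, the same characters typed by users in different countries would be read as different amounts, or rejected outright.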

How is the development process changing?

Once the application is internationalized, the deployment process must be adapted to the translation process. In our project, we submit any new terms for translation as soon as we deploy to the staging environment. Deployment to production is approved only after all terms have been translated and the PO files compiled.

Another important change to our development flow was the addition of integration tests for different locales at the QA stage. The QA team simulates the locales supported by our application and checks that all texts have been translated correctly and that currencies and units have been converted.
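Part of such a check can be automated. The sketch below, which is an illustration and not the process described by the author, scans PO text for entries whose msgstr is still empty. The helper find_untranslated and the sample catalog are hypothetical, and the regular expression handles only single-line msgid and msgstr values.

```python
import re

# Matches a single-line msgid immediately followed by a single-line msgstr.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def find_untranslated(po_text):
    """Return the msgids whose msgstr is empty (hypothetical helper)."""
    return [
        msgid
        for msgid, msgstr in ENTRY.findall(po_text)
        if msgid and not msgstr  # the empty msgid is the header entry
    ]


sample = '''\
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

#: app.py:7
msgid "Hello, world!"
msgstr "Olá, mundo!"

#: app.py:9
msgid "Goodbye!"
msgstr ""
'''

print(find_untranslated(sample))
```

A script like this could run in continuous integration and block a release while the list is non-empty; a production version would need a real PO parser to cope with multi-line strings and plural forms.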

Our main takeaway from the whole internationalization process is that all these steps should have been built into the design phase at the very beginning of the project. Halting all development to internationalize is not the best solution. If your project is already past the early stage, I recommend following the boy scout rule: start marking the strings that need to be translated whenever you implement a new feature or fix a non-critical bug. That way, you keep delivering new features alongside a gradual internationalization process.


Interested in developing in this direction? Learn more about the “Python Web-Developer” course and sign up for the free demo lessons at OTUS!
