Classification of combinatorial objects using the example of Latin squares

I would like to share some experience in writing programs for enumerating combinatorial objects from a given class (e.g., Latin squares, Steiner systems, error-correcting codes, orthogonal arrays). Usually, it is necessary to enumerate all objects with given parameters, for example, tables of a given size filled with numbers according to some rule. The word “all” can mean both “all different” and “fundamentally different” in the sense specific to a particular task, for example, tables can be considered fundamentally identical (equivalent) if one is obtained from another by rearranging the rows. The number of objects obtained as a result of such an enumeration can vary from 0 (this also makes sense, since as a result we establish that objects with these parameters do not exist) to tens of billions. Usually, calculations of this kind are part of a theoretical study, but similar approaches can be used in practical problems.

The programming language in the examples under consideration is Sageessentially this is an extension Python, the text of the programs in the examples from this note does not differ from Python, with the exception of the use of some library functions. Examples can be run on an online calculator https://sagecell.sagemath.org/.

Description of the task and approach

Let's start with a description of a particular problem, using the example of which we will consider the general enumeration algorithm. We will try to count all Latin squares of order 7. Latin square order n is a table of size n-by-n, filled with n characters so that in each row and in each column all symbols appear once. We will use symbols from 0 to n−1, although Euler used letters of the Latin alphabet for this purpose (hence the name). Example of a Latin square of order 5:

$\begin{array}{|c|c|c|c|c|}\hline0&1&2&3&4\\\hline1&0&3&4&2\\\hline2&4&1&0&3\\\hline3&2&4&1&0\\\hline4&3&0&2&1\\\hline\end{array}$

The number of all Latin squares of order 7 is a fourteen-digit number, and although in principle it is possible to generate such a number and even store it somewhere, we will simplify the task somewhat. We will list squares “up to equivalence,” that is, if several different squares are obtained from each other by some natural transformations, we will combine them into one group, an equivalence class, and from each equivalence class we will retain one representative. As such natural equivalence transformations, we will take all possible combinations of permutations of the rows of the square, permutations of the columns, and permutations of the characters of the alphabet, which is used to fill the cells. Obviously, applying any of these operations to a Latin square, we will again obtain a Latin square. Operations of the type under consideration are called isotopiesand Latin squares obtained from each other by isotopies are called isotopic; we will call the corresponding equivalence classes isotopic classes. In principle, we can expand the equivalence transformations under consideration by adding, for example, the ability to reflect a square diagonally (changing the roles of rows and columns; similarly, you can change the roles of columns, rows and symbols), then the number of equivalence classes will be even smaller, but we will start with isotopies.

For example, it is easy to check that all Latin squares of order 3 are isotopic. Latin squares of order 4 are already divided into two isotopic classes, with representatives

$\begin{array}{|c|c|c|c|}\hline0&1&2&3\\\hline1&0&3&2\\\hline2&3&0&1\\\hline3&2&1&0\\\hline\end{array}$ And $\begin{array}{|c|c|c|c|}\hline0&1&2&3\\\hline1&2&3&0\\\hline2&3&0&1\\\hline3&0&1&2\\\hline\end{array}$ .

The number of isotopic classes of Latin squares is of the order of seven – 564, and this is quite suitable for our exercises. For order 8, this number is 1676267, which can no longer be done using an online calculator, but perhaps you can still carry out the calculations on your home computer or in the cloud. To count Latin squares of order greater than 8, more advanced algorithms are needed.

The general transfer scheme consists of the following steps:

I. Creation of objects.
II. Checking for equivalence (in our case, isotopicity) and preserving exactly one representative from each equivalence class.
III. Numerical verification of the correctness of calculations.

You can write a program that will immediately generate all Latin squares of small order, perhaps even 6, their number is “only” 812851200. But the 14-digit number of all Latin squares of order 7 …, even just recalculating such a number of objects in a reasonable time is problematic, not not to mention generating and comparing for isotopicity. Therefore, we will divide the task into several steps, at each step listing some intermediate objects, like “unfinished” Latin squares. Latin rectangles are ideal for such intermediate objects. A Latin rectangle is an unfinished Latin square that has several lower rows left unfilled. Formally, a table of size m by n filled with numbers from 0 to n−1 is called Latin rectangle, if no row or column contains duplicate numbers. Hall's theorem states that any Latin rectangle can be expanded to a Latin square, but we won't need this wonderful fact.

So, we will list all the non-isotopic Latin rectangles in order, first 1 by n, then 2 by n, 3 by n, etc., up to the n by n rectangles that are Latin squares. At each step we will repeat stages I, II and III, but at stage I of step m we will build not all Latin rectangles, but those that are continuations of pairwise non-isotopic representatives of rectangles m−1 by n obtained at the previous step of calculations (essentially , we need to assign one line to each of the available m−1 by n rectangles in all possible ways that do not contradict the definition).

For example, suppose for n=4 and m=2 we found two non-isotopic Latin rectangles

$\begin{array}{|c|c|c|c|}\hline0&1&2&3\\\hline1&0&3&2\\\hline\end{array}$ And $\begin{array}{|c|c|c|c|}\hline0&1&2&3\\\hline1&2&3&0\\\hline\end{array}$ .

At the next step m=3, adding at stage I the third line in all possible ways to each of them, we get

$\begin{array}{|c|c|c|c|} \hline0&1&2&3\\ \hline1&0&3&2\\ \hline2&3&0&1\\ \hline\end{array}$ , $\begin{array}{|c|c|c|c|} \hline0&1&2&3\\ \hline1&0&3&2\\ \hline2&3&1&0\\ \hline\end{array}$ , $\begin{array}{|c|c|c|c|} \hline0&1&2&3\\ \hline1&0&3&2\\ \hline3&2&0&1\\ \hline\end{array}$ , $\begin{array}{|c|c|c|c|} \hline0&1&2&3\\ \hline1&0&3&2\\ \hline3&2&1&0\\ \hline\end{array}$

And $\begin{array}{|c|c|c|c|} \hline0&1&2&3\\ \hline1&2&3&0\\ \hline2&3&0&1\\ \hline\end{array}$ , $\begin{array}{|c|c|c|c|} \hline0&1&2&3\\ \hline1&2&3&0\\ \hline3&0&1&2\\ \hline\end{array}$ .

It can be verified that the first and fourth rectangles are isotopic, and the second, third, fifth and sixth are also pairwise isotopic. Thus, after stage II we will have two non-isotopic representatives, which we will continue to 4 by 4 squares.

Next, we will describe all three stages, I, II, and III, each in its own section, but first we will define graphs and the basic concepts associated with them that we will need. The point is that checking combinatorial objects for equivalence most often comes down to checking the corresponding specially constructed graphs for isomorphism (this is also a type of equivalence), which is carried out using the appropriate libraries.

Graph is a pair of sets, a set of vertices and a set of edges that can be interpreted as pairwise connections between vertices. An arbitrary set can be considered as a set of vertices, and each edge is a set of two vertices. Two vertices belonging to the same edge are called adjacentor neighboring. Several vertices of a graph, among which any two are adjacent, are called clique. Isomorphism between two graphs with the same number of vertices is a one-to-one mapping of the set of vertices of the first graph to the set of vertices of the second graph, in which adjacent vertices are mapped to adjacent vertices, and non-adjacent to non-adjacent. Automorphism of a graph is an isomorphism of the graph onto itself. Two Counts are isomorphicif there is an isomorphism between them.

Example: consider two graphs on the same vertex set {0,1,2,3}. The first graph has many edges {{0,1},{0,2},{1,2},{1,3}}, the second has {{0,2},{1,2},{1, 3},{2,3}}; the mapping 0→1, 1→2, 2→3, 3→0 is an isomorphism of the first graph onto the second, and the mapping 0→2, 1→1, 2→0, 3→3 is an automorphism of the first graph (but not the second).

Step I: Generating m by n Latin rectangles that extend a given m−1 by n Latin rectangle

The example with Latin rectangles is excellent because the generation of objects can be reduced to both the problem of finding cliques in a graph and the problem of exact coverage. Both of these problems are solved using appropriate libraries. We will look at both methods.

Let's start with a reduction to the problem of finding cliques in a graph. Consider, for example, a 2 by 5 Latin rectangle

$\begin{array}{|c|c|c|c|c|}\hline 0&1&2&3&4\\\hline 1&0&3&4&2\\\hline \phantom{1}&&&&\\\hline \end{array}$ .

We want to extend it in every possible way to a 3 by 5 Latin rectangle. Consider column number 0. It already contains the numbers 0 and 1, so the third row can only contain 2, 3, or 4. We write these possibilities as (0 ,2), (0,3) and (0,4), where 0 in the first place is the column number, and in the second place is a symbol that can be entered in this column. For column number 1 we have (1,2), (1,3), (1,4), then (2,0), (2,1), (2,4), then (3,0), ( 3,1), (3,2) and finally (4,0), (4,1), (4,3). In total we have 15 pairs (0.2), (0.3), (0.4), (1.2), (1.3), (1.4), (2.0), (2.1 ), (2,4), (3,0), (3,1), (3,2), (4,0), (4,1), (4,3). To add another line to this Latin rectangle, you need to select from these 15 pairs five pairs in which all the first components are different (this means that we will enter one new character in 5 different columns) and all second components (this means that in the new line all values are different). To do this, on these 15 pairs we will build a “compatibility graph” in which two pairs will form an edge if both their first and second components are different. Clicks of size 5 in this column will exactly correspond to the opportunities to fill in the third line. Let us now write a generator that will do the same in the general case.

def addRow(LR,algorithm=0):
    """Генератор всех латинских прямоугольников m-на-n,
       продолжающих данный латинский прямоугольник (m-1)-на-n"""
    n = len(LR[0]) # ширина прямоугольника
    N = set(range(n))
    # для каждого столбца, записываем не использованные символы:
    available = ( N-set(col) for col in zip(*LR) )
   # список доступных пар (столбец,символ):
    candidates = [ (i,j) for i,avail in enumerate(available) \
                         for j in avail ]
    if algorithm==0: # при помощи поиска клик 
        # построение графа совместимости
        compatible_pairs = [ [(i,j),(ii,jj)] \
                             for i,j in candidates \
                             for ii,jj in candidates \
                             if i<ii and j!=jj ]
        G = Graph(compatible_pairs) # Sage граф
        if G.clique_number()==n: # если есть клики размера n
            for C in G.cliques_maximum():
                # каждую такую клику преобразуем в строку
                yield LR + [[j for i,j in sorted(C)]]

This code uses Sage's built-in graph type and its two methods clique_number (returns the maximum clique size in the graph) and cliques_maximum (returns all clicks of the maximum size).

Now the second way. Similar to the previous one, for the example under consideration we construct the available 15 pairs (column, symbol). Note that each pair “uses” one column and one character. We need to choose 5 pairs such that each column and each symbol is used exactly once. This is the classic exact cover problem. There is a set (in our example, a set of cardinality 10, five columns and five symbols) and a set of its subsets (in our case, subsets of cardinality 2, consisting of one column and one symbol); you need to choose a subset that covers the given set once, that is, so that each element of the given set occurs in exactly one subset of the subset. This problem is solved by Donald Knuth's Dancing Links algorithm, implemented for different platforms. It is also a special case of an integer linear programming problem, ILP, for which there are many solvers (however, many find only one solution, and we need them all). Let's add an alternative algorithm for the addRow generator using the exact coverage problem solver AllExactCoversbuilt into Sage.

    if algorithm!=0: # при помощи точного покрытия
        # строим exact-cover матрицу M
        M = []
        for i,j in candidates:
            M.append([0]*2*n)
            # первые n элементов нумеруют столбцы, последние n - символы
            M[-1][i] = M[-1][n+j] = 1
        for cover in AllExactCovers(matrix(M)):
            yield LR + [[list(line).index(1,n)-n \
                         for line in reversed(sorted(cover))]]

Step II: isotopic testing

As mentioned above, checking for the equivalence of combinatorial objects often comes down to checking for the isomorphism of the corresponding graphs (in the general case, how to determine the “corresponding graphs” for a given class of objects can be a non-trivial task, sometimes there are several solutions, and they differ greatly in the speed of comparison of graphs for isomorphism). For m by n Latin rectangles, we will build a graph as follows. Vershin mn+n+n+m, take mn as vertices Latin rectangle cells, m rows, n columns, and n characters (encoded in some way, for example with numbers from 0 to mn+n+n+m−1); Each vertex-cell is connected by one edge to the vertex-row in which this cell is located, by one edge to the vertex-column, and by one edge to the vertex-symbol, which is written in this cell.

Theorem 1: Two Latin rectangles are isotopic if and only if there is an isomorphism between the corresponding graphs that preserves the partition of the vertices into cell vertices, row vertices, column vertices and character vertices.

We will not prove the theorem; the fact is quite simple. It reduces the isotopy of Latin rectangles to the isomorphism of the corresponding graphs, but taking into account the partitioning of vertices by roles (isomorphism between graphs should associate cell vertices with cell vertices, row vertices with row vertices, etc.). Fortunately, programs for recognizing graph isomorphism can work with partitions. In practice, checking the isomorphism of graphs usually occurs as follows: for each graph, a canonical graph isomorphic to it is constructed. For isomorphic graphs, the canonical graphs are the same, so they can be compared simply for equality. The benefits of this approach are as follows. Suppose we have 449 graphs and want to check that they are all non-isomorphic. If we simply check each pair for isomorphy, we will need 449 · 448/2 ∼ 100500 such operations, each of which is quite labor-intensive. Instead, we can construct a canonical graph for each of these graphs only 449 times, and carry out 100,500 very easy comparisons for equality. So, let's write the subroutine nonIsotopicwhich selects non-isotopic representatives from a given set of Latin rectangles and returns a list of such representatives. As a separate function rectangleToGraph we will highlight the construction by the graph rectangle using the function built into Sage Graph.

def rectangleToGraph(L,m,n):
    """
    Возвращает характеристический граф 
    латинского m-на-n прямоугольника L
    и разбиение множества вершин по ролям
    """
    V = [ (i,j) for i in range(m) for j in range(n) ] # вершины-ячейки
    Vr = [ ("r",i) for i in range(m) ] # вершины-строки
    Vc = [ ("c",j) for j in range(n) ] # вершины-столбцы
    Vs = [ ("s",k) for k in range(n) ] # вершины-символы
    E  = [ [(i,j),Vr[i]] for i,j in V ] # ребра ячейка-строка
    E += [ [(i,j),Vc[j]] for i,j in V ] # ребра ячейка-столбец
    E += [ [(i,j),Vs[L[i][j]]] for i,j in V ] # ...ячейка-символ
    P = [V,Vr,Vc,Vs] # разбиение множества вершин
    # E += [ [U[i],U[j]] for U in [Vc,Vs] \
    #          for j in range(n) for i in range(j) ] # понадобится позже
    # P = [V,Vr,Vc+Vs] # понадобится позже
    return Graph(E), P # graph and vertex partition

def nonIsotopic(LRs):
    """
    На входе итератор, который выдает латинские прямоугольники LR.
    На выходе -- неизотопные представители 
    """
    GG = {} # словарь порядковых номеров найденных канонических графов
    Repres = [] # неизотопные представители латинских прямоугольников
    for L in LRs:
        m, n = len(L), len(L[0])
        G, P = rectangleToGraph(L,m,n) # граф и разбиение
        Gc = G.canonical_label(partition=P).copy(immutable=True) # канонический
        if Gc not in GG:
            GG.update({Gc:len(Repr)})
            Repres += [L]
    return Repres

Step III: checking the correctness of calculations

This additional step seems optional, but it greatly helps in checking the correctness of the program writing, and the theoretical understanding of what we are doing, and also protects against random calculation errors of the “neutrino flew by and aha” type, which occur in a huge number of calculations. If it is possible to check the search using some formulas, this should always be done. If not, they try to count the same thing in different ways. It’s really sad when you can’t do this either.

With the described enumerative approach, control of the correctness of calculations can be carried out by calculating the power in two ways each found equivalence class K (in our case, isotopic class) of m by n Latin rectangles. On the one hand, we know how many times we encountered Latin rectangles from this class when trying to extend each Latin rectangle m−1 by n. This gives us the formula $M(K) = \sum_{L} M_L \cdot S_L(K)$ where the summation occurs over representatives of L isotopic classes of Latin rectangles m−1 by n, found at the previous stage, M_L – power of the corresponding classes, and – the number of times a rectangle from class K was found among the extensions of the rectangle L.

On the other hand, power M(K) the isotopic class of Latin rectangles m by n can be calculated through the number of autotopies Aut(T) any of its representatives T, $T\in K$ . Namely, $M(K) = m! \cdot n! \cdot n! /Aut(T)$ . Indeed, the number of possible isotopies $m! \cdot n! \cdot n!$ (number of row permutations number of column permutations number of symbol permutations ), of them Aut(T) isotopies give the same rectangle.

Checking that the value M(K) , calculated in two ways, coincides with a very good degree of reliability, guaranteeing that the calculations are correct. But you should not lose your vigilance, since it may turn out that the error lies elsewhere, and you correctly think not what you wanted.

Let's add the necessary check right at the end of the function nonIsotopichaving previously calculated the necessary values (new lines – with variables Auts, Count, parentCount).

def nonIsotopic(LRs):
    """
    На входе итератор, который выдает латинские прямоугольники LR
    вместе с величиной parentCount -- мощностью изотопического
    класса прямоугольника меньшей высоты, продолжением которого
    получен LR.
    На выходе -- неизотопные представители 
    и мощности соответствующих изотопических классов
    """
    GG = {} # словарь порядковых номеров найденных канонических графов
    Repr = [] # неизотопные представители латинских прямоугольников
    Auts = [] # число автотопий для каждого представителя
    Count = [] # мощность каждого изотопического класса
    for LR,parentCount in LRs:
        m, n = len(LR), len(LR[0])
        G, P = rectangleToGraph(LR,m,n)
        Gc = G.canonical_label(partition=P).copy(immutable=True) # канонический
        if Gc not in GG:
            GG.update({Gc:len(Repr)})
            Repr += [LR]
            Auts += [G.automorphism_group(partition=P).order()]
            Count += [0]
        Count[GG[Gc]]+=parentCount # еще раз встретился
    for cnt,aut in zip(Count,Auts):
        cnt0 = factorial(n)**2 * factorial(m) / aut
        # cnt0 = factorial(n)**2 * factorial(m) * 2 / aut # понадобится позже
        assert cnt==cnt0, "error m="+str(m)+": "+str(cnt)+"!="+str(cnt0)
    return Repr, Count

Final Steps

All that remains is to collect everything. We substitute the representative of the Latin rectangles 1 by n manually, then add a line one by one using addRow() and find non-isotopic representatives using nonIsotopic()where we also transfer the powers of the parent isotopic classes to check the correctness of the calculations.

def findLRs(n,alg=0):
    # представители латинских m*n прямоугольников, m=0,1,2,...:
    LRs = [ [ [] ], [ [list(range(n))] ]  ]
    # мощность соответствующих изотопических классов:
    CNTs = [ [1], [factorial(n)] ]
    for m in range(1,n):
        LRs_nxt,CNTs_nxt = nonIsotopic( (LRnxt,cnt) \
                                        for LR,cnt in zip(LRs[-1],CNTs[-1]) \
                                        for LRnxt in addRow(LR,alg) )
        LRs += [LRs_nxt]
        CNTs += [CNTs_nxt]
        print("n:",n,"m:",m+1," classes:",len(LRs_nxt)," total:",sum(CNTs_nxt))
    return LRs

LRs = findLRs(7)

Copy everything (function definitions addRow, rectangleToGraph, nonIsotopic, findLRsand the line LRs=findLRs(7)) into the form of an online calculator https://sagecell.sagemath.org/launch by clicking the “Evaluate” button, and…

m: 2 classes: 4 total: 9344160
m: 3 classes: 56 total: 5411750400
m: 4 classes: 1398 total: 782137036800
m: 5 classes: 6941 total: 20449013760000

… Oh no! During the time allocated on the resource, the program does not have time to complete execution, counting only up to rectangles of height 5, sometimes even 6. Literally a few seconds are missing. But let's not get upset. First, by substituting the found values into the online encyclopedia of integer sequences
https://oeis.org/search?q=56%2C1398%2C6941&go=Search
https://oeis.org/search?q=782137036800+20449013760000&go=Search
we verify that our program produces correct values for both the number of isotopic classes and the number of all Latin rectangles. Secondly, the program terminates successfully for smaller orders, n=6, n=5, … ( for n in (3,4,5,6): findLRs(n) ). And for n=7 we will try to speed up the program somewhat by expanding the equivalence. In addition to isotopies, we will allow the roles of columns and symbols to be swapped. Namely, for a Latin rectangle A we can always construct a Latin rectangle B such that A(i,j)=k (this notation means that at the intersection of the i-th row and the j-th column of rectangle A there is a symbol k) then only when B(i,k)=j. Such rectangles are not necessarily isotopic, and if we did not combine them into one equivalence class before, now we will. This will reduce the number of equivalence classes and, hopefully, the total search time. We need to make the following two changes (you can simply uncomment the lines marked “needed later” in two places). First, the penultimate line of the function definition rectangleToGraph replace it with

    E += [ [U[i],U[j]] for U in [Vc,Vs] \
                      for j in range(len(U)) for i in range(j) ]
    P = [V,Vr,Vc+Vs] # разбиение множества вершин

Our new equivalence transformation swaps the sets of column vertices and character vertices, so we combine them in the partition P. On the other hand, we need to avoid transformations that mix up these two sets. This is why we add additional edges to the list of edges E, making each of these sets a clique. The second change is when calculating the power of a class through the number of autotopies at the end of the function definition nonIsotopic. Now we have twice as many equivalence transformations as autotopies, so we will insert this two into the formula:

        cnt0 = factorial(n)**2 * factorial(m) * 2 / aut

Let's launch.

n: 7 m: 2 classes: 4 total: 9344160
n: 7 m: 3 classes: 45 total: 5411750400
n: 7 m: 4 classes: 808 total: 782137036800
n: 7 m: 5 classes: 3712 total: 20449013760000
n: 7 m: 6 classes: 1895 total: 61479419904000
n: 7 m: 7 classes: 324 total: 61479419904000

Hooray. The program manages to complete the calculations a little faster than 2 minutes of the allotted time. The number of equivalence classes now does not coincide with the number of isotopy classes (we have a different equivalence), but the number of all Latin rectangles is correct. If desired, by carrying out a simple check, you can also calculate the number of isotopic classes.

Conclusions and Concluding Remarks

Using 7 by 7 Latin squares as an example, we examined one of the general principles of classification of combinatorial objects. To apply a similar algorithm for some other class, you need, first, to break the classification into several steps, at each of which partial objects of increasingly larger sizes are classified. Secondly, it is desirable to learn how to effectively represent objects by graphs, so that equivalent objects correspond to isomorphic graphs (analogous to Theorem 1 above). Thirdly, it is good when the generation of objects can be reduced to one of the well-known problems for which tools have been developed, often this is the search for cliques in a graph or the problem of exact coverage (sometimes with multiplicities greater than 1). Finally, don't neglect checking; if it is possible to calculate the number of all solutions in different ways.

When it really comes down to computing resources, you have to switch to low-level programming languages, and then, if necessary, try to find the key to a specific task. A couple of universal recipes: 1. To save space on storing canonical graphs, you can convert them back into Latin rectangles (or whatever we classify there) and store canonical Latin rectangles. 2. It is often useful to classify separately objects that contain some subconfiguration and objects that avoid it. For example, for Latin rectangles, such a subconfiguration could be a 2 by 2 subsquare containing only two characters. 3. Sometimes a large number of intermediate solutions have no continuation. It can be useful to try to cut off all or most of them in some faster way than the solver that we use to find all continuations in the general case (for example, you can experiment with linear programming problem solvers, MILP).

When writing programs in C/C++ I use the following libraries:

Of course, there are other approaches to classification. For those who want to delve deeper into the topic, I recommend the monograph

In conclusion, a couple of notes about Sage for “beginners”. Regular Python code may not work in Sage due to some differences (however, Python code can be imported correctly into Sage). The most noticeable differences: First, the symbol ^ stands for exponentiationbut not exclusive or; exclusive or – ^^. Secondly, integer constants are treated as class objects sage.rings.integer.Integer, for which operations are defined in their own way and take much longer to complete than for int. To denote a type constant intyou can use the suffix rFor example 100500r.

If you work on your own computer, it is better to install the Sage package bliss – with it it will be much faster to build a canonical graph (without this package the example with Latin squares can take hours, not minutes). If Sage is installed in a separate folder (recommended) and not from the repository, then in this folder type ./sage -i bliss. Distributions and instructions can be found at https://www.sagemath.org/