Subset sum problem

Subset sum problem in general terms it sounds like this:

There are many S numbers. The question is whether the sum of some subset of S will be equal to a given number T.

It is known that this problem is NP-complete.

We will solve an equivalent problem where all numbers are natural numbers.

A special case of the subset sum problem is problem of partitioning a set of numbers:

The set of numbers S must be divided into two subsets S1 and S2, where the sum of S1 is equal to the sum of S2.

(The current one differs from the sum of subsets problem only in that T = Sum(S1) / 2 = Sum(S2) / 2)

I want to offer you a simple and elegant way to solve both problems relatively quickly using the method integer linear programming (TsLP). We will receive not only an exact answer to the question, but also find the desired subset.

And so, we have many S natural numbers:

S = \left\{ S_{i} \right\};  S_{i} \in \mathbb{Z};

We need to answer the question: Is it possible, using the elements of this set, to obtain a sum equal to T?

To find the answer you need to solve the linear equation:

\sum_{i=1}^{n}S_{i}*X_{i}=T;  X_{i}\in\{0,1\};\ S_{i}\in \mathbb{Z} ;T\in \mathbb{Z};

If a solution is found, then the answer to the question is positive, but if a solution is impossible, then negative. If there are several solutions, we will be satisfied with any one.

Using Python and library PuLPLet's illustrate with an example.

import pulp as pl
src = [59, 63, 15, 43, 75, 72, 61, 48, 71, 55]
n = len(src)

# Искомое значение
val = sum(src) // 2
# Объявляем модель
model = pl.LpProblem(name="Subset sum problem", sense=pl.LpMinimize)       
# Подключаем решатель CBC
solver = pl.PULP_CBC_CMD(msg=True)
# Объявляем переменные
x = [pl.LpVariable(name=f'x{i:05}', cat="Binary") for i in range(n)]    
# Целевая функция
model += pl.lpSum([x[i] for i in range(n)])    
# Основное ограничение
model += pl.lpSum([x[i] * src[i] for i in range(n)]) == val

print(model)

As a result, we obtain the following model for solving a linear equation:

Subset sum problem:
MINIMIZE
1*x00000 + 1*x00001 + 1*x00002 + 1*x00003 + 1*x00004 + 1*x00005
 + 1*x00006 + 1*x00007 + 1*x00008 + 1*x00009 + 0
SUBJECT TO
_C1: 59 x00000 + 63 x00001 + 15 x00002 + 43 x00003 + 75 x00004 + 72 x00005
 + 61 x00006 + 48 x00007 + 71 x00008 + 55 x00009 = 281
VARIABLES
0 <= x00000 <= 1 Integer
0 <= x00001 <= 1 Integer
0 <= x00002 <= 1 Integer
0 <= x00003 <= 1 Integer
0 <= x00004 <= 1 Integer
0 <= x00005 <= 1 Integer
0 <= x00006 <= 1 Integer
0 <= x00007 <= 1 Integer
0 <= x00008 <= 1 Integer
0 <= x00009 <= 1 Integer

Run the solver and output the solution

# Находим решение
status = model.solve(solver)
res = []
if status == 1:    
    for j, v in enumerate(model.variables()):
        if v.value() > 0:
            res.append(src[j])
print('Результат:', sum(res), res)

Результат: 281 [63, 75, 72, 71]

That's what we needed to get.

However, the larger the initial set and the larger the values ​​of its components, the slower the search for a solution goes.

For example, it took about twenty seconds to find a solution to the following list of 24 elements.

source = [1485498, 1643002, 1391445, 1280110, 1145269, 1195187, 1908655, 1709512, 
          1006747, 1354760, 1527205, 1486247, 1941933, 1634072, 1084740, 1350249, 
          1581194, 1981871, 1646604, 1734106, 1042882, 1763564, 1397430, 1177643]

Is it possible to somehow speed up the process of finding a solution?

It was the answer to this question that prompted me to write this post. Since the CLP problem is NP-hard, it is believed that the algorithm that finds the solution is exponential in the number of sets.

Therefore, I suggest “eating the elephant piece by piece” (breaking a complex task into simpler and less voluminous parts).

We will divide the problem into a number of tasks of less complexity by introducing additional restrictions into the model and calculating them separately. And then we will choose one of the feasible solutions, which will be considered the required one.

An additional constraint for each of the subtasks will be as follows: To find the required number T we will need exactly Y sets. For each of the subtasks we will take the value Y from the list.

Y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]

Metaphor: for eating, it is easier to eat a loaf cut into slices than to chew it whole.

Metaphor: for eating, it is easier to eat a loaf cut into slices than to chew it whole.

Please note that, as in the loaf metaphor, the solutions at the edges will be smaller, so they are found faster. I suggest starting from the edges, because once you find a solution, you can stop searching for other possible solutions, saving time.

Y = [1, 24, 2, 23, 3, 22, 4, 21, 5, 20, 6, 19, 7, 18, 8, 17, 9, 16, 10, 15, 11, 14, 12, 13]

It should be read as follows: The sum of the values ​​of any Y elements of the set must be equal to the desired one TWhere Y – value from the list above.

By solving subproblems according to this scheme, we will find the answer faster than solving the original problem. For example, to find a solution for the series above, it will take a little more than two seconds.

The only difference from the code of the previous solution will be that we will be looking for not one solution, but several:

loop = list(range(1, n + 1))
loop = [loop.pop(0) if i % 2 == 0 else loop.pop(-1) for i in range(n)]
for val in loop:
    # Устанавливаем дополнительное ограничение шага цикла
    model.constraints['Y'] = pl.lpSum([x[j] for j in range(n)]) == val 
    print(model)
    
    # Находим решение
    status = model.solve(solver)  
    if status == 1:
        res = []
        for j, v in enumerate(model.variables()):
            if v.value() > 0:
                res.append(src[j])
            return res
return []

Thus, we will consistently build a series of models and try to find a solution.

Subset sum problem:
MINIMIZE
1*x00000 + 1*x00001 + 1*x00002 + 1*x00003 + 1*x00004 + 1*x00005 + 1*x00006
 + 1*x00007 + 1*x00008 + 1*x00009 + 1*x00010 + 1*x00011 + 1*x00012 + 1*x00013
 + 1*x00014 + 1*x00015 + 1*x00016 + 1*x00017 + 1*x00018 + 1*x00019 + 1*x00020
 + 1*x00021 + 1*x00022 + 1*x00023 + 0
SUBJECT TO
_C1: 1485498 x00000 + 1643002 x00001 + 1391445 x00002 + 1280110 x00003
 + 1145269 x00004 + 1195187 x00005 + 1908655 x00006 + 1709512 x00007
 + 1006747 x00008 + 1354760 x00009 + 1527205 x00010 + 1486247 x00011
 + 1941933 x00012 + 1634072 x00013 + 1084740 x00014 + 1350249 x00015
 + 1581194 x00016 + 1981871 x00017 + 1646604 x00018 + 1734106 x00019
 + 1042882 x00020 + 1763564 x00021 + 1397430 x00022 + 1177643 x00023
 = 17734962

Y: x00000 + x00001 + x00002 + x00003 + x00004 + x00005 + x00006 + x00007
 + x00008 + x00009 + x00010 + x00011 + x00012 + x00013 + x00014 + x00015
 + x00016 + x00017 + x00018 + x00019 + x00020 + x00021 + x00022 + x00023 = 1

VARIABLES
0 <= x00000 <= 1 Integer
0 <= x00001 <= 1 Integer
0 <= x00002 <= 1 Integer
0 <= x00003 <= 1 Integer
0 <= x00004 <= 1 Integer
0 <= x00005 <= 1 Integer
0 <= x00006 <= 1 Integer
0 <= x00007 <= 1 Integer
0 <= x00008 <= 1 Integer
0 <= x00009 <= 1 Integer
0 <= x00010 <= 1 Integer
0 <= x00011 <= 1 Integer
0 <= x00012 <= 1 Integer
0 <= x00013 <= 1 Integer
0 <= x00014 <= 1 Integer
0 <= x00015 <= 1 Integer
0 <= x00016 <= 1 Integer
0 <= x00017 <= 1 Integer
0 <= x00018 <= 1 Integer
0 <= x00019 <= 1 Integer
0 <= x00020 <= 1 Integer
0 <= x00021 <= 1 Integer
0 <= x00022 <= 1 Integer
0 <= x00023 <= 1 Integer

They will differ only in the value of the right side of the equality in line 18.

The approach described in the work, although not ideal, is quite working. Despite the fact that the problem itself is NP-hard, we can find a solution for sets from several tens to several hundred elements, depending on the initial data.

Note

It is also worth noting that not every integer linear equation solver is able to accurately solve an integer linear problem. Due to the implementation features, many solvers on the market have some calculation error, which slightly affects the final result in typical problems. But in this algorithm, since we are talking about exact solutionespecially with large input data values, the choice of solver can have a catastrophic effect on the result.

In my tests only CBC from COIN-OR Foundation and OR-Tools from Google, coped with this task correctly.

Conclusion:

Some integer linear programming problems can be solved more quickly if they are broken down into a series of simpler problems by introducing additional constraints.

Thank you for your attention!

Full code
import pulp as pl
import random as rnd
from datetime import datetime
# ==========================================================================
def ssp(src: list, val: int):
    """
    Задача о сумме подмножеств натуральных чисел (Subset sum problem - SSP)
    """
    n = len(src)
    # Объявляем модель
    model = pl.LpProblem(name="ssp", sense=pl.LpMinimize)       
    # Подключаем решатель CBC
    solver = pl.PULP_CBC_CMD(msg=True)
    # Объявляем переменные
    x = [pl.LpVariable(name=f'x{i:05}', cat="Binary") for i in range(n)]    
    # Целевая функция
    model += pl.lpSum([x[i] for i in range(n)])    
    # Основное ограничение
    model += pl.lpSum([x[i] * src[i] for i in range(n)]) == val   

    # Находим решение
    status = model.solve(solver)
    if status == 1:
        s = []
        for j, v in enumerate(model.variables()):
            if round(v.value()) > 0:
                s.append(src[j])
        return s
    return []
# ==========================================================================
def ssp_optimized(src: list, val: int):
    """
    Задача о сумме подмножеств натуральных чисел оптимизированный алгоритм
    (Subset sum problem optimized - SSP)
    """
    n = len(src)
    # Объявляем модель
    model = pl.LpProblem(name="ssp", sense=pl.LpMinimize)    
    # Подключаем решатель CBC
    solver = pl.PULP_CBC_CMD(msg=True)
    # Объявляем переменные
    x = [pl.LpVariable(name=f'x{i:05}', cat="Binary") for i in range(n)]
    # Целевая функция
    model += pl.lpSum([x[i] for i in range(n)])
    # Основное ограничение
    model += pl.lpSum([x[i] * src[i] for i in range(n)]) == val

    loop = list(range(1, n + 1))
    loop = [loop.pop(0) if i % 2 == 0 else loop.pop(-1) for i in range(n)]
    for val in loop:
        # Устанавливаем дополнительное ограничение шага цикла
        model.constraints['Y'] = pl.lpSum([x[j] for j in range(n)]) == val 

        # Находим решение
        status = model.solve(solver)  
        if status == 1:
            s = []
            for j, v in enumerate(model.variables()):
                if round(v.value()) > 0:
                    s.append(src[j])
            return s
    return []
# ==========================================================================
n = 23
rnd.seed(0)
# Генерируем случайный ряд чисел
src = [rnd.randint(1000000, 2000000) for i in range(n)]
t = sum(src) // 2
print(f'source = {src}')
print()

start_time = datetime.now()
res = ssp(src, t)
date_diff = (datetime.now() - start_time).total_seconds()
print('solution =', res)
print('time=", date_diff)
print()

start_time = datetime.now()
res = ssp_optimized(src, t)
date_diff = (datetime.now() - start_time).total_seconds()
print("solution optimized =', res)
print('time=", date_diff)

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *