# Multiple piecewise constant regression

An algorithm for constructing a piecewise constant dependence of the variable from the weighted sum minimizing the sum of squared deviations from the average values ​​on the ranges of the value .

## Motivation

The problem of restoring the dependence of a certain value on a number of features is common in data analysis. Piecewise constant regression will be useful if it is required to identify the characteristic ranges of changes in the integral feature (weighted sum of features) in order to consider separately the points belonging to these ranges. Thus, having received information about the distribution of points within each of them and examining the dependence on other factors, we can draw conclusions about the segmented data.

## Formal staging

Input data – set of points and number of ranges .

Define the objective function

depending on the ranges of change in the value where in -th range includes points with indices from the set And is the arithmetic mean of the values Where .

1. It is required to optimally split the points into ranges and find a partition such that minimum.

2. It is required to find such weight coefficients for which the indicated minimum, found for given will be the smallest.

## Algorithm

The first problem is solved by the dynamic programming method. Let’s order the points ascending coordinates . Select subtasks with parameters – split points with indices from before on ranges. Knowing the solution to the subproblem trying to improve the solution of subproblems For filling in the matrices – minimum objective function, – the size of the last range in the optimal partition. Finally will be equal to the minimum of the objective function, and the partition itself can be restored from the values .

The second problem is discrete, and since the value of the objective function depends only on the order of the points then depending on the arguments the objective function is piecewise constant, so the standard non-linear optimizers are not applicable here.

For selection of coefficients the following iterative method is proposed. The coefficients are chosen as the initial approximation calculated from multiple linear regression. Then, at each iteration of the algorithm, we will calculate the optimal partition for the found weight coefficients, after which we fix the sizes of the partition ranges and optimize the weight coefficients for the given range sizes.

Thus, we come to the next subtask:

1. It is required to find the coefficients for which the objective function will be minimal among all partitions into ranges of a given size.

This problem is proposed to be solved by the fastest coordinate-wise descent in each variable . So let’s define the functions . Need to find this at which the value of the objective function (given range sizes) computed for points will be the smallest.

The last problem belongs to computational geometry and can be solved in different ways. The naive method is to enumerate all the intersection points of the function graphs And and calculating for each point the value of the objective function followed by selection of the best . The complexity of the algorithm will be .

The other way is similar to the convex hull gift wrapping algorithm. This method plots the order statistics of the sets ascending . We are interested in the whole set of events when, with a gradual increase the contents of the ranges will change due to the permutation of a point on the border of the ranges with a point from the neighboring range. Since there are no more such events then the complexity of the algorithm will be .

And finally, it is possible, as in Graham’s algorithm, to arrange all points in ascending order and add points one by one, each time updating the solution to the subproblem of plotting order statistics depending on for the points already considered. Because the order statistics can be constructed in time then finding the minimum of the objective function will take the same time.

Thus, to optimize the objective function value for given sizes of ranges by coefficients we will optimize in turn the coefficient in each term in the set of points taking the remaining terms in this sum as a free term .