Multiple piecewise constant regression
An algorithm for constructing a piecewise constant dependence of the variable from the weighted sum
minimizing the sum of squared deviations
from the average values on the ranges of the value
.

Motivation
The problem of restoring the dependence of a certain value on a number of features is common in data analysis. Piecewise constant regression will be useful if it is required to identify the characteristic ranges of changes in the integral feature (weighted sum of features) in order to consider separately the points belonging to these ranges. Thus, having received information about the distribution of points within each of them and examining the dependence on other factors, we can draw conclusions about the segmented data.
Formal staging
Input data – set of points and number of ranges
.
Define the objective function
depending on the ranges of change in the value where in
-th range includes points with indices from the set
And
is the arithmetic mean of the values
Where
.
It is required to optimally split the points
into ranges and find a partition such that
minimum.
It is required to find such weight coefficients
for which the indicated minimum, found for given
will be the smallest.
Algorithm
The first problem is solved by the dynamic programming method. Let’s order the points ascending coordinates
. Select subtasks with parameters
– split points with indices from
before
on
ranges. Knowing the solution to the subproblem
trying to improve the solution of subproblems
For
filling in the matrices
– minimum objective function,
– the size of the last range in the optimal partition. Finally
will be equal to the minimum of the objective function, and the partition itself can be restored from the values
.
The second problem is discrete, and since the value of the objective function depends only on the order of the points then depending on the arguments
the objective function is piecewise constant, so the standard non-linear optimizers are not applicable here.
For selection of coefficients the following iterative method is proposed. The coefficients are chosen as the initial approximation
calculated from multiple linear regression. Then, at each iteration of the algorithm, we will calculate the optimal partition for the found weight coefficients, after which we fix the sizes of the partition ranges and optimize the weight coefficients for the given range sizes.
Thus, we come to the next subtask:
It is required to find the coefficients
for which the objective function will be minimal among all partitions into ranges of a given size.
This problem is proposed to be solved by the fastest coordinate-wise descent in each variable . So let’s define the functions
. Need to find this
at which the value of the objective function
(given range sizes) computed for points
will be the smallest.
The last problem belongs to computational geometry and can be solved in different ways. The naive method is to enumerate all the intersection points of the function graphs And
and calculating for each point the value of the objective function
followed by selection of the best
. The complexity of the algorithm will be
.
The other way is similar to the convex hull gift wrapping algorithm. This method plots the order statistics of the sets ascending
. We are interested in the whole set of events when, with a gradual increase
the contents of the ranges will change due to the permutation of a point on the border of the ranges with a point from the neighboring range. Since there are no more such events
then the complexity of the algorithm will be
.
And finally, it is possible, as in Graham’s algorithm, to arrange all points in ascending order and add points one by one, each time updating the solution to the subproblem of plotting order statistics depending on
for the points already considered. Because the
order statistics can be constructed in time
then finding the minimum of the objective function will take the same time.
Thus, to optimize the objective function value for given sizes of ranges by coefficients
we will optimize in turn the coefficient in each term in the set of points
taking the remaining terms in this sum as a free term
.