Multiple piecewise constant regression

An algorithm for constructing a piecewise constant dependence of a variable y on the weighted sum x = w_1x_1 + ... + w_px_p, minimizing the sum of squared deviations of y from its average values on the ranges of the value x.

[Figure: an example of dividing points into ranges]

Motivation

The problem of recovering the dependence of some quantity on a number of features is common in data analysis. Piecewise constant regression is useful when we need to identify characteristic ranges of an integral feature (a weighted sum of the features) in order to consider the points belonging to these ranges separately. Having obtained the distribution of points within each range and examined the dependence on other factors, we can then draw conclusions about the segmented data.

Formal statement

Input data: a set of points (x_i, y_i), i = 1..n, and the number of ranges m.

Define the objective function

J=\sum_{k=1}^m\sum_{i\in I_k}(y_i - c_k)^2,

which depends on the division of the values x into ranges: the k-th range contains the points with indices from the set I_k, and c_k is the arithmetic mean of the values y_i with i \in I_k.

  1. It is required to split the points x_i into ranges optimally, i.e. to find a partition for which J is minimal.

  2. It is required to find weight coefficients w_k, k = 1..p, for which this minimum, computed for the given w_k, is smallest.
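
For concreteness, here is a minimal sketch of evaluating J when the ranges I_k are contiguous blocks of the points sorted by x; the function name and the representation of a partition by its block sizes are illustrative assumptions.

```python
import numpy as np

def objective(y_sorted, sizes):
    """J for a given partition: y_sorted holds the values y_i ordered
    by x, and sizes[k] = |I_k| describes the contiguous ranges."""
    total, start = 0.0, 0
    for size in sizes:
        block = np.asarray(y_sorted[start:start + size], dtype=float)
        # squared deviations from the arithmetic mean c_k of the range
        total += float(np.sum((block - block.mean()) ** 2))
        start += size
    return total
```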

Algorithm

The first problem is solved by dynamic programming. Order the points (x_i, y_i) by ascending coordinate x. Consider subproblems with parameters (i, k): split the points with indices from 1 to i into k ranges. Knowing the solution of subproblem (i, k), we try to improve the solutions of subproblems (i + j, k + 1) for j = 0..n - i, filling in the matrices f_{ik} (the minimum of the objective function) and r_{ik} (the size of the last range in the optimal partition). In the end, f_{nm} equals the minimum of the objective function, and the partition itself can be restored from the values r_{ik}.
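
Below is a minimal Python sketch of this dynamic programme, written in the equivalent backward form (choosing the size of the last range among the first i points rather than extending forward); prefix sums give O(1) cost for any range, and the names optimal_partition, f, r are illustrative.

```python
import numpy as np

def optimal_partition(y, m):
    """Split y (already ordered by x) into m contiguous ranges,
    minimizing the within-range sum of squared deviations J."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # prefix sums: cost of any range [a, b) in O(1)
    s = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y * y)))

    def cost(a, b):  # sum over [a, b) of (y_i - mean)^2
        mean = (s[b] - s[a]) / (b - a)
        return s2[b] - s2[a] - (b - a) * mean * mean

    f = np.full((n + 1, m + 1), np.inf)      # f[i][k]: best J, first i points, k ranges
    r = np.zeros((n + 1, m + 1), dtype=int)  # size of the last range in the optimum
    f[0][0] = 0.0
    for k in range(1, m + 1):
        for i in range(k, n + 1):
            for j in range(1, i - k + 2):    # last range is y[i-j : i]
                cand = f[i - j][k - 1] + cost(i - j, i)
                if cand < f[i][k]:
                    f[i][k], r[i][k] = cand, j
    sizes, i = [], n                         # restore the partition from r
    for k in range(m, 0, -1):
        sizes.append(r[i][k])
        i -= r[i][k]
    return f[n][m], sizes[::-1]
```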

The second problem is discrete: since the value of the objective function depends only on the order of the points x_i, it is piecewise constant as a function of the arguments w_k, so standard nonlinear optimizers are not applicable here.

To select the coefficients w_k, the following iterative method is proposed. As the initial approximation we take the coefficients w_k computed by multiple linear regression. Then, at each iteration of the algorithm, we compute the optimal partition for the current weight coefficients, after which we fix the sizes of the partition ranges and optimize the weight coefficients for the given range sizes (a sketch of this outer loop is given after the subtask below).

Thus, we arrive at the following subtask:

  1. It is required to find the coefficients w_k for which the objective function is minimal among all partitions into ranges of the given sizes.
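
Returning to the iterative method: one possible shape of the outer loop is sketched below. It reuses optimal_partition from the sketch above and a coordinate_descent routine sketched at the end of this post; the least-squares initialization follows the text, while the name fit, the iteration cap, and the stopping rule are assumptions.

```python
import numpy as np

def fit(X, y, m, n_iter=20):
    """Alternate between the optimal partition for fixed weights w and
    coordinate-wise optimization of w for the fixed range sizes."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    # initial approximation: multiple linear regression coefficients
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    best_J = np.inf
    for _ in range(n_iter):
        order = np.argsort(X @ w)            # sort points by x_i = w_1 x_{i1} + ... + w_p x_{ip}
        J, sizes = optimal_partition(y[order], m)
        if J >= best_J:                      # no further improvement
            break
        best_J = J
        w = coordinate_descent(X[order], y[order], w, sizes)
    return w, best_J
```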

It is proposed to solve this problem by steepest coordinate-wise descent in each variable w_k. Define the functions z_i(w) = a_i + x_iw. We need to find the w at which the value of the objective function J (for the given range sizes), computed for the points (z_i(w), y_i), is smallest.

The last problem belongs to computational geometry and can be solved in different ways. The naive method is to enumerate all intersection points of the graphs of the functions z_i(w) and z_j(w), compute the value of the objective function J at each of them, and select the best w. The complexity of this algorithm is O(n^3).
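
A sketch of the naive method, assuming the points are given by arrays a and x (so that z_i(w) = a_i + x_i w) and the range sizes are fixed; the name best_w_naive and the eps trick for sampling both sides of each event point are illustrative.

```python
import numpy as np

def best_w_naive(a, x, y, sizes):
    """Enumerate intersections of the lines z_i(w) = a_i + x_i * w and
    pick the w minimizing J for the given range sizes."""
    a, x, y = (np.asarray(v, dtype=float) for v in (a, x, y))
    n = len(y)
    # candidate w values: solutions of a_i + x_i w = a_j + x_j w,
    # the only places where the order of the points can change
    candidates = [0.0]
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] != x[j]:
                candidates.append((a[j] - a[i]) / (x[i] - x[j]))

    def J(w):
        ys = y[np.argsort(a + x * w)]        # points sorted by z_i(w)
        total, start = 0.0, 0
        for size in sizes:
            block = ys[start:start + size]
            total += float(np.sum((block - block.mean()) ** 2))
            start += size
        return total

    # J is piecewise constant in w, so sample both sides of each event
    eps = 1e-9
    best = min((w + d for w in candidates for d in (-eps, eps)), key=J)
    return best, J(best)
```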

Another way is similar to the gift wrapping algorithm for convex hulls. This method traces the order statistics of the set z_i(w) as w increases. We are interested in the set of events at which, as w gradually increases, the contents of the ranges change because a point on the border of a range swaps places with a point from the neighboring range. Since there are at most O(nm) such events, the complexity of the algorithm is O(n^2m).

Finally, it is possible, as in Graham's algorithm, to arrange all points in ascending order of x_i and add them one by one, each time updating the solution of the subproblem of tracing the order statistics as functions of w for the points considered so far. Since the n order statistics can be constructed in time O(n^2), finding the minimum of the objective function takes the same time.

Thus, to optimize the value of the objective function J for given range sizes over the coefficients w_k, we optimize in turn the coefficient of each term of the sum x_i = w_1x_{i1}+\ldots+w_px_{ip}, treating the remaining terms of this sum as the free term a_i.
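
A minimal sketch of this coordinate-wise pass, reusing the illustrative best_w_naive above; the number of sweeps and the update scheme are assumptions.

```python
import numpy as np

def coordinate_descent(X, y, w, sizes, n_sweeps=10):
    """Optimize the weights coordinate by coordinate for fixed range
    sizes; X is the n-by-p feature matrix."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    w = np.array(w, dtype=float)             # work on a copy
    for _ in range(n_sweeps):
        for k in range(X.shape[1]):
            # free term a_i: all terms of the sum except w_k * x_{ik}
            a = X @ w - w[k] * X[:, k]
            w[k], _ = best_w_naive(a, X[:, k], y, sizes)
    return w
```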
