# $L_{1}$ linear regression

I read an article on errors in visualization. The example of forcing a relationship by cherry-picking scales is delightful. I recommend reading it.

I am interested in misleading people while being completely honest. The article inspires the following problem. Given two vectors $x, y \in \mathbb{R}^n$, let $\mathbf{1}$ be the all-ones vector in $\mathbb{R}^n$. We are interested in finding $a, b \in \mathbb{R}$ such that $\|y - (ax + b\mathbf{1})\|_p$ is minimized. Here $p$ is either $1$, $2$, or $\infty$.

Note the problem is precisely the linear regression problem. There, we are given a point set $S \subset \mathbb{R}^2$ of size $n$ and we are interested in finding a line $f(x) = ax + b$ that minimizes the *error*, defined as

$$\Big(\sum_{(x,y)\in S} |y - f(x)|^p\Big)^{1/p}$$

(interpreted as $\max_{(x,y)\in S} |y - f(x)|$ when $p = \infty$).

For $p=2$, there is an $O(n)$ time algorithm because there is a closed formula. For $p=\infty$, the problem can be rewritten as a linear program with $3$ variables and $O(n)$ constraints. Using Megiddo's result [1], there is an $O(n)$ time algorithm to solve this problem.
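For $p=2$, the closed formula can be evaluated in one pass over the data. A minimal sketch (the function name `fit_l2` is my own):

```python
def fit_l2(xs, ys):
    """Return (a, b) minimizing sum((y - (a*x + b))**2), via the closed formula."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # assumes not all x equal
    b = (sy - a * sx) / n
    return a, b
```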

The worst-case complexity for $p=1$ is harder to pin down. This case is called *least absolute deviations*. Statisticians just don't care about worst-case running time the way CS people do.

There are a few methods I found. One is to write it as a linear program with $n+2$ variables and $2n$ constraints and solve it using the simplex method. The linear program is as follows.

$$\begin{aligned}
\min_{a,\,b,\,t_1,\dots,t_n}\quad & \sum_{i=1}^{n} t_i\\
\text{s.t.}\quad & t_i \ge (a x_i + b) - y_i && \forall\, 1\le i\le n\\
& t_i \ge y_i - (a x_i + b) && \forall\, 1\le i\le n
\end{aligned}$$
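A minimal sketch of assembling this LP in the standard form $\min c^\top z$ subject to $Az \le u$, with variable order $z = (a, b, t_1, \dots, t_n)$; the function name `build_l1_lp` is my own, and handing the result to an off-the-shelf LP solver is then mechanical:

```python
def build_l1_lp(xs, ys):
    """Build (c, A, u) for: min c.z  s.t.  A z <= u,  z = (a, b, t_1, ..., t_n)."""
    n = len(xs)
    c = [0.0, 0.0] + [1.0] * n                  # objective: sum of t_i
    A, u = [], []
    for i, (x, y) in enumerate(zip(xs, ys)):
        t = [0.0] * n
        t[i] = -1.0
        A.append([x, 1.0] + t)                  #  a*x + b - t_i <= y_i
        u.append(y)
        A.append([-x, -1.0] + t)                # -a*x - b - t_i <= -y_i
        u.append(-y)
    return c, A, u
```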

There are a bunch of other algorithms that specialize the simplex method to this particular problem, and also some iterative methods. Unfortunately, those algorithms' running times depend on the actual numbers in the input. I want a running time that depends only on $n$.

There exists an optimal line that passes through two points of $S$. The naive algorithm tries all $O(n^2)$ such lines; for each line, the algorithm computes the error in $O(n)$ time, so the naive algorithm runs in $O(n^3)$ time.

There is a smarter algorithm: the optimal line through a fixed point can be found in $O(n)$ time. Indeed, consider the lines through the point $(x,y)$, and imagine changing the slope while keeping the line through $(x,y)$; the error reaches a minimum at some slope. Reorder the points so that $\frac{y_i - y}{x_i - x} \le \frac{y_{i+1} - y}{x_{i+1} - x}$ (namely, increasing slope), and let $k$ be the smallest integer such that $\sum_{i=1}^{k} |x_i - x| \ge \sum_{i=k+1}^{n} |x_i - x|$. The line determined by $(x,y)$ and $(x_k, y_k)$ is the desired line, and $k$ can be computed in linear time by finding a weighted median. Hence one can show the running time is $O(n^2)$. This is the idea of [2]. That is all I could find through an hour of searching.
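The $O(n^2)$ algorithm can be sketched as follows. All names are my own; for simplicity this sketch sorts by slope ($O(n \log n)$ per point, so $O(n^2 \log n)$ overall) instead of computing a weighted median in linear time, and it assumes all $x$-coordinates are distinct:

```python
def l1_error(pts, a, b):
    """Total L1 error of the line f(x) = a*x + b on the point set."""
    return sum(abs(y - (a * x + b)) for x, y in pts)

def best_line_through(pts, p):
    """Best L1 line through the fixed point p, via the weighted-median condition."""
    px, py = p
    others = [(x, y) for x, y in pts if x != px]
    others.sort(key=lambda q: (q[1] - py) / (q[0] - px))  # increasing slope
    total = sum(abs(x - px) for x, _ in others)
    prefix = 0.0
    for x, y in others:
        prefix += abs(x - px)
        if prefix >= total - prefix:     # smallest k with prefix >= suffix
            a = (y - py) / (x - px)
            return a, py - a * px

def l1_fit(pts):
    """Try the best line through each point; return the overall best (a, b)."""
    return min((best_line_through(pts, p) for p in pts),
               key=lambda ab: l1_error(pts, *ab))
```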

After discussing with Qizheng He, he suggested the following approach. For a point $p \in S$, consider the function $g_p(s)$, defined as the error of the line of slope $s$ through $p$. The function is unimodal in $s$, therefore we can do a ternary search to find the minimum. There are only $n-1$ candidate slopes, hence the ternary search takes $O(\log n)$ queries, where each query asks for the error of the line that goes through $p$ and some other point.
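A sketch of the ternary search over the sorted candidate slopes through a fixed point $p$, assuming $g_p$ is unimodal as discussed (the names are my own; the plateau case is handled explicitly since equal errors can occur):

```python
def l1_error(pts, a, b):
    """Total L1 error of the line f(x) = a*x + b on the point set."""
    return sum(abs(y - (a * x + b)) for x, y in pts)

def best_slope_through(pts, p):
    """Ternary search for the slope minimizing the error of lines through p."""
    px, py = p
    slopes = sorted((y - py) / (x - px) for x, y in pts if x != px)
    def g(i):
        a = slopes[i]
        return l1_error(pts, a, py - a * px)
    lo, hi = 0, len(slopes) - 1
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        g1, g2 = g(m1), g(m2)
        if g1 < g2:
            hi = m2              # minimum cannot lie right of m2
        elif g1 > g2:
            lo = m1              # minimum cannot lie left of m1
        else:
            lo, hi = m1, m2      # minimum lies between the probes
    return slopes[min(range(lo, hi + 1), key=g)]
```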

Given a line $f(x)=ax+b$, can one compute the error quickly? It can be decomposed into a few halfspace range counting queries (allowing weights). In the halfspace range counting problem, we are given $n$ weighted points, which we preprocess into a data structure; each query is a halfspace, and the output is the sum of the weights of the points in the halfspace. In $2$D, there exists a data structure with $\tilde{O}(n^{4/3})$ preprocessing time and $\tilde{O}(n^{1/3})$ query time [3]. Let $S_+$ be the set of points on or above $f$, and $S_-$ the set of points below $f$. The error is precisely the following.

$$\sum_{(x,y)\in S_+} (y - ax - b) \;+\; \sum_{(x,y)\in S_-} (ax + b - y)$$

Let's consider the second sum: $\sum_{(x,y)\in S_-} (ax + b - y) = a\sum_{(x,y)\in S_-} x + |S_-|\,b - \sum_{(x,y)\in S_-} y$. Each of the three terms can be computed with one weighted halfspace counting query over the points lying below $f$ (with weights $x$, $1$, and $y$, respectively). Handling both sums therefore takes $6$ halfspace counting queries.
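The decomposition can be sanity-checked with brute-force linear scans standing in for the halfspace counting structure; each of the six sums below corresponds to one weighted query (names are my own):

```python
def error_direct(pts, a, b):
    """Total L1 error computed directly."""
    return sum(abs(y - (a * x + b)) for x, y in pts)

def error_via_six_queries(pts, a, b):
    """Same error via the 3 + 3 halfspace-sum decomposition."""
    below = [(x, y) for x, y in pts if y < a * x + b]    # S_-
    above = [(x, y) for x, y in pts if y >= a * x + b]   # S_+ (on or above f)
    sx_b = sum(x for x, _ in below)
    sy_b = sum(y for _, y in below)
    sx_a = sum(x for x, _ in above)
    sy_a = sum(y for _, y in above)
    return (sy_a - a * sx_a - b * len(above)) + (a * sx_b + b * len(below) - sy_b)
```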

How can one do the ternary search? We need to be able to pick the point that forms the $i$th largest slope with $p$, i.e., a data structure that returns the $i$th point in the radial ordering of the points of $S$ around $p$. This is equivalent to halfspace range counting up to polylogarithmic factors.

Thus, after building the data structure in $\tilde{O}(n^{4/3})$ time, the remaining work is $n$ ternary searches over $n$ candidate slopes each, where each decision takes $\tilde{O}(n^{1/3})$ time. Therefore the final running time is $\tilde{O}(n^{4/3})$.

Qizheng mentioned the problem to Timothy Chan, who gave us some references. There is an easy $O(n \log^2 n)$ time algorithm using simple parametric search [4]. Consider the following linear program. Let $k$ be a constant. We are given $a_1,\dots,a_k$, $b_1,\dots,b_n$, $k$-dimensional vectors $\beta_1,\dots,\beta_m$, reals $\alpha_1,\dots,\alpha_m$, and sets $J_1,\dots,J_n$ that partition $[m]$.

$$\begin{aligned}
\min_{w_1,\dots,w_k,\,x_1,\dots,x_n}\quad & \sum_{i=1}^{k} a_i w_i + \sum_{i=1}^{n} b_i x_i\\
\text{s.t.}\quad & x_i \ge \Big(\sum_{d=1}^{k} \beta_{j,d}\, w_d\Big) - \alpha_j && \forall\, 1\le i\le n,\ j\in J_i
\end{aligned}$$

Zemel showed such a linear program can be solved in $O(m)$ time for constant $k$ [5]. The idea is similar to Megiddo's linear-time constant-dimension LP algorithm [1]. The linear program we derived for $L_1$ regression with $n$ data points is a special case of the above with $k=2$ and $m=O(n)$. In fact, Zemel used the same linear program to show constant-dimension $L_1$ regression can be solved in linear time.
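Concretely (my own bookkeeping, writing the data points as $(x_i', y_i')$ to avoid clashing with Zemel's variable names), the $L_1$ regression LP is the instance

```latex
% Instantiation of Zemel's LP for L1 regression on data (x_i', y_i'):
k = 2, \qquad m = 2n, \qquad (w_1, w_2) = (a, b), \qquad x_i = t_i, \qquad J_i = \{2i-1,\, 2i\},\\[2pt]
\beta_{2i-1} = (x_i',\, 1),\quad \alpha_{2i-1} = y_i'
\qquad\text{and}\qquad
\beta_{2i} = (-x_i',\, -1),\quad \alpha_{2i} = -y_i',\\[2pt]
a_1 = a_2 = 0, \qquad b_i = 1 \quad \text{(so the objective is } \textstyle\sum_i t_i\text{)}.
```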

# Open problem

One can also define another metric, the lexicographic minimum. Such an idea already appears in fairness-related linear regression [6]. Sort the values of $|y - f(x)|$ for $(x,y) \in S$, obtaining $a_1 \ge a_2 \ge \dots \ge a_n$. We are interested in finding an $f$ that minimizes the sequence $a_1, \dots, a_n$ lexicographically. Can this problem be solved in $O(n)$ time?
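To make the objective concrete, here is a brute-force sketch of the comparison (the function name is my own): two lines are compared by the lexicographic order of their sorted residual vectors.

```python
def lex_key(pts, a, b):
    """Residuals of f(x) = a*x + b, sorted in decreasing order.
    Compare two lines by comparing their keys lexicographically."""
    return sorted((abs(y - (a * x + b)) for x, y in pts), reverse=True)
```

For example, on the points $(0,0),(1,1),(2,2),(3,0)$ the lines $y=x$ and $y=0$ have the same $L_1$ error $3$, but $y=0$ wins lexicographically since $[2,1,0,0] < [3,0,0,0]$.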

# References

[1] **Linear programming in linear time when the dimension is fixed**, J. ACM 31 (1984) 114–127. doi:10.1145/2422.322418.

[2] **Least absolute deviations curve-fitting**, SIAM Journal on Scientific and Statistical Computing 1 (1980) 290–301. doi:10.1137/0901019.

[3] **Range searching with efficient hierarchical cuttings**, Discrete & Computational Geometry 10 (1993) 157–182. doi:10.1007/BF02573972.

[4] **Finding Least-Distances Lines**, SIAM Journal on Algebraic Discrete Methods 4 (1983) 207–211. doi:10.1137/0604021.

[5] **An O(n) algorithm for the linear multiple choice knapsack problem and related problems**, Information Processing Letters 18 (1984) 123–128. doi:10.1016/0020-0190(84)90014-0.