Selection in a sorted matrix
A matrix is sorted if every row and column are non-increasing.
A common interview problem asks for the th smallest number in a sorted matrix, and usually people give an algorithm. Wlog we can assume all the numbers in the matrix are distinct, as we can always break the ties by factoring in the position of the number in the matrix.
There is a time solution. In fact, if it’s a matrix and .
I’m frustrated that there isn’t a good description of such an algorithm. The most common reference people provide is [1]. However there are a few downsides of that paper. It only works when and it is still a bit complicated. So here I will give a short description to a modified version of the algorithm.
One can see the idea closely resemble what happens in fractional cascading, we basically squeeze the unknown values between known values, so we don’t have to look at most of the unknown values.
First, we assume the matrix we care about is a matrix . Both and are power of and .
Let be the matrix we get by removing all even index columns from , and add the last column.
In [1], they used , which is defined as only retain positions with odd coordinates, but I found it nicer to consider things by only stripping one coordinate. In particular, it gave a much nicer proof for the following result.
is the number of elements in matrix smaller or equal to .
Let be a sorted matrix, then .
For any fixed , let be the largest , such that . , . On the other hand .
This means if we want to find an element of rank in , we can first find element of rank and in , and we know the solution would be in between. The remaining operation takes time:
- Use a flood fill starting from the position of th element in and find at most positions where the element of rank could reside. Namely it select elements in that is in between the th element and th in . We can do this because all these elements are connected in a component.
- While doing the flood fill, it can also find the rank of the th element in in the matrix for no extra cost.
- A linear time selection algorithm on all the elements resulted from the flood fill.
This would give us a recursive algorithm if instead of just finding th number, it finds the th and th number at the same time. As long as , we can find them both only with a extra time. and will be the upper and lower bounds respectively. Some basic algebra shows inside each recursive step if we start with .
Let be the time used when the matrix is of size , and . Certainly , and the rest follow the recursive relation for some constant .
If , we can rotate the matrix implicitly to get the same running time.
Solving it gives us the desired running time . This is also a time algorithm because we only need to consider the submatrix in the up-left position.
A Haskell implementation here, it requires a linear time rank
selection algorithm selectRank
.
Note a slightly faster algorithm also exists when and are very different. It was shown selecting the th element can be done in time [2].