# Computing the weighted h-index

A common algorithm problem is that given a sequence of numbers, find a h-index. Where h-index is the largest integer $h$ such there are at least $h$ integers in the sequence is at least as large as $h$.

Formally, we have the following problem.

Given $a_{1},…,a_{n}$, find the largest $h$, such that $∣{i∣a_{i}≥h}∣≥h$.

The h-index problem is featured in leetcode.

If we the numbers are sorted, then a trivial $O(n)$ time algorithm exists. If it is not sorted, then note that we can solve the problem on $min(a_{1},n),…,min(a_{n},n)$. In this case, the input numbers is at most $n$, therefore can be sorted in $O(n)$ time. Hence the total running time is $O(n)$.

Consider a weighted version of the problem where the above algorithm does not work.

Given a sequence of pairs of non-negative positive reals $(w_{1},a_{1}),…,(w_{n},a_{n})$. Find the largest $h∈R$, such that $∑_{i:a_{i}≥h}w_{i}≥h$.

An $O(n)$ time algorithm still exists. For simplicity, we assume all $a_{i}$’s are distinct, so the input is a set. The case where $a_{i}$’s are not distinct is left as an exercise to the reader.

Define $f(t)=∑_{i:a_{i}≥t}w_{i}$. We want to find the largest $t$ such that $f(t)≥t$. First, we can find the median of $a_{1},…,a_{n}$, say $t$. If $f(t)<t$, then we recurse on ${(w_{i}−f(t),a_{i})∣a_{i}<t}$. Assume the optimum in the recursed solution is $t_{′}$, we return $t_{′}+f(t)$ as the solution. If $f(t)≥t$, then we recurse and output the solution with input ${(w_{i},a_{i})∣a_{i}≥t}$. The running time satisfies $T(n)=T(n/2)+O(n)$, which is $O(n)$.