SELECT( k, S)
    if |S| = 1 then return the single element of S
    choose a random element a in S
    let S1, S2, S3 be the sets of elements of S that are <, =, > a, respectively
    if |S1| ≥ k then return SELECT( k, S1)
    else if |S1| + |S2| ≥ k then return a
    else return SELECT( k-|S1|-|S2|, S3)
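The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: it assumes S is a non-empty list of mutually comparable items and that k is 1-based, and the function name `select` and the argument check are additions for the example.

```python
import random

def select(k, s):
    """Return the k-th smallest element of the list s (k is 1-based)."""
    if not 1 <= k <= len(s):
        raise ValueError("k must satisfy 1 <= k <= len(s)")
    if len(s) == 1:
        return s[0]
    a = random.choice(s)                  # random pivot
    s1 = [x for x in s if x < a]          # elements smaller than the pivot
    s2 = [x for x in s if x == a]         # elements equal to the pivot
    s3 = [x for x in s if x > a]          # elements larger than the pivot
    if len(s1) >= k:
        return select(k, s1)
    if len(s1) + len(s2) >= k:
        return a
    return select(k - len(s1) - len(s2), s3)
```

For example, `select(4, [7, 2, 9, 4, 4, 1, 8])` returns 4, the 4th smallest element. Building three new lists per call keeps the sketch close to the pseudocode; an in-place partition would avoid the extra memory.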
Let n = |S| and let a be the ith smallest element of S. Then,
i > k → call SELECT on S' = S1, with |S'| = i-1
i < k → call SELECT on S' = S3, with |S'| = n-i
i = k → return a (no recursive call)
The expected cost of the recursion is
$(1/n) [ \sum_{i=1}^{k-1} T(n-i) + \sum_{i=k+1}^{n} T(i-1) ]$
$= (1/n) [ \sum_{i=n-k+1}^{n-1} T(i) + \sum_{i=k}^{n-1} T(i) ]$
The rest of the procedure requires < cn time.
Therefore, for n ≥ 2,
$T(n) \le cn + \max_k \{ (1/n) [ \sum_{i=n-k+1}^{n-1} T(i) + \sum_{i=k}^{n-1} T(i) ] \}$
It is easy to show (by induction) that if T(1) ≤ c then, for all n ≥ 2, T(n) ≤ 4cn.
Basis: n = 1. T(1) ≤ c ≤ 4cn.
Inductive step: We need to prove that T(n) ≤ 4cn (if n > 1), assuming that, for all m < n, T(m) ≤ 4cm. (We write m here because k already denotes the rank being selected.)
$T(n) \le cn + \max_k \{ (1/n) [ \sum_{i=n-k+1}^{n-1} T(i) + \sum_{i=k}^{n-1} T(i) ] \}$
$\le cn + (4c/n) \max_k \{ \sum_{i=n-k+1}^{n-1} i + \sum_{i=k}^{n-1} i \}$
$\le cn + (4c/n) \max_k \{ n^2/2 - 3n/2 + k(n-k+1) \}$ (which is maximized at k = (n+1)/2)
$\le cn + (4c/n) ( \tfrac{3}{4} n^2 - n + \tfrac{1}{4} )$
$\le cn + 3cn + ( -4c + c/n )$ (the parenthesized expression is < 0 since n ≥ 2)
$\le 4cn$
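As a sanity check on the induction, the recurrence can be evaluated numerically. The sketch below assumes the worst case allowed by the bounds, that is, T(1) = c and equality in the recurrence for n ≥ 2, and uses exact fractions so that rounding cannot hide a violation. The function names are made up for this example.

```python
from fractions import Fraction

def range_sum(prefix, lo, hi):
    """Sum of T[lo..hi] from prefix sums; an empty range sums to 0."""
    return prefix[hi] - prefix[lo - 1] if lo <= hi else Fraction(0)

def worst_case_expected_cost(n_max, c=1):
    """T[1] = c and, for n >= 2,
    T[n] = c*n + max_k (1/n) * (sum_{i=n-k+1}^{n-1} T[i] + sum_{i=k}^{n-1} T[i])."""
    T = [Fraction(0), Fraction(c)]        # T[0] is an unused placeholder
    prefix = [Fraction(0), Fraction(c)]   # prefix[m] = T[1] + ... + T[m]
    for n in range(2, n_max + 1):
        best = max(range_sum(prefix, n - k + 1, n - 1) + range_sum(prefix, k, n - 1)
                   for k in range(1, n + 1))
        T.append(c * n + best / n)
        prefix.append(prefix[-1] + T[-1])
    return T

T = worst_case_expected_cost(200)
print(all(T[n] <= 4 * n for n in range(1, 201)))   # checks T(n) <= 4cn with c = 1
```

The check covers only the range of n that is computed; the induction above is what establishes the bound for all n.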
Lemma: On average, 2 splits are needed before getting a lucky split.
Proof: We always need at least 1 split.
Half the time we're done, and half the time we must continue.
Thus, if E is the expected number of splits, E = 1 + ½E, and therefore E = 2.
So, after 2 splits on average, the subproblem is at most ¾ the original size.
T(n) ≤ T(¾ n) + avg time to reduce problem size to at most ¾ n
T(n) ≤ T(¾ n) + 2cn
T(n) ≤ 8cn satisfies this constraint.
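The claim that 8cn satisfies the constraint can be checked in two ways, sketched below in LaTeX. The unrolled form assumes the recursion bottoms out at constant size; the inductive form assumes the hypothesis T(m) ≤ 8cm for all m < n.

```latex
% Unrolling the recurrence into a geometric series:
T(n) \le 2cn + 2c\left(\tfrac{3}{4}\right)n + 2c\left(\tfrac{3}{4}\right)^{2} n + \cdots
     \le 2cn \sum_{j \ge 0} \left(\tfrac{3}{4}\right)^{j} = 2cn \cdot 4 = 8cn

% Inductive check, using T(m) \le 8cm for m < n:
T(n) \le T\left(\tfrac{3}{4} n\right) + 2cn \le 8c \cdot \tfrac{3}{4} n + 2cn = 6cn + 2cn = 8cn
```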
PICK( a, S)
    Divide S into floor( |S|/5 ) sets of 5 elements each,
        plus 1 "leftover" set having between 0 and 4 elements
    Sort each 5-element set
    Let M be the set of medians of the 5-element sets
    a := SELECT( ceil( |M|/2 ), M), i.e., the median of M
Also, if |S| < 50 then SELECT will sort S and return the kth smallest (in the previous algorithm, this occurred only if |S| = 1)
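A possible Python rendering of PICK combined with the modified SELECT is sketched below. It assumes S is a list, k is 1-based, and the leftover elements are simply ignored when forming M; the names `pick` and `det_select` are chosen for this example.

```python
def pick(s):
    """Pivot choice: the median of the medians of the full 5-element groups."""
    full = len(s) - len(s) % 5            # the 0-4 leftover elements are ignored
    medians = [sorted(s[i:i + 5])[2] for i in range(0, full, 5)]
    return det_select((len(medians) + 1) // 2, medians)   # rank ceil(|M|/2)

def det_select(k, s):
    """Return the k-th smallest element of s (k is 1-based), deterministically."""
    if not 1 <= k <= len(s):
        raise ValueError("k must satisfy 1 <= k <= len(s)")
    if len(s) < 50:                       # small input: sort and index directly
        return sorted(s)[k - 1]
    a = pick(s)
    s1 = [x for x in s if x < a]
    s2 = [x for x in s if x == a]
    s3 = [x for x in s if x > a]
    if len(s1) >= k:
        return det_select(k, s1)
    if len(s1) + len(s2) >= k:
        return a
    return det_select(k - len(s1) - len(s2), s3)
```

Because `pick` is only reached when |S| ≥ 50, the list of medians has at least 10 entries, so the recursive call on it is always well defined.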
The method of picking a ensures that more than ¼ of the "in-play" elements of S are ≤ a and also that more than ¼ of the "in-play" elements of S are ≥ a.
More formally, since |M| = floor( n/5 ), at least floor( n/10 ) of the elements of M are ≥ a and, for each of these, there are 2 additional distinct elements of S that are at least as large. Therefore, |S1| < n - 3 floor(n/10) which, for n ≥ 50, is < 3n/4. The same can be shown for S3.
And so, each of T(|S1|) and T(|S3|) is ≤ T(3n/4).
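The role of the threshold 50 can be illustrated with a small check of the inequality n - 3 floor(n/10) < 3n/4. The sketch below tests it in integer arithmetic over a finite range (1 to 10000, an arbitrary choice for this example); the helper name `bound_holds` is hypothetical.

```python
def bound_holds(n):
    """True if n - 3*floor(n/10) < 3n/4, compared exactly by multiplying through by 4."""
    return 4 * (n - 3 * (n // 10)) < 3 * n

failures = [n for n in range(1, 10001) if not bound_holds(n)]
print(max(failures))   # largest n in the tested range where the bound fails
```

In this range the largest failing value is n = 49 (49 - 12 = 37, which is not below 36.75), so the bound holds for every tested n ≥ 50.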
Therefore, there exists a constant c such that
T(n) ≤ cn, for n < 50 and
T(n) ≤ T(n/5) + T(3n/4) + cn, for n ≥ 50.
It is easy to show by induction that T(n) ≤ 20cn = O(n).
Proof by strong induction on n.
Basis: for n < 50, T(n) ≤ cn ≤ 20cn.
Inductive step: We need to prove that T(n) ≤ 20cn (if n ≥ 50), assuming that, for all k < n, T(k) ≤ 20ck.
T(n) ≤ T(n/5) + T(3n/4) + cn (given)
≤ 20c(n/5) + 20c(3n/4) + cn (inductive hypothesis)
= 4cn + 15cn + cn
= 20cn
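The bound can also be checked numerically. The sketch below assumes the worst case permitted by the recurrence (equality in both cases) and rounds the subproblem sizes n/5 and 3n/4 down to integers, which is an assumption of the example rather than something the recurrence states.

```python
def deterministic_cost(n_max, c=1):
    """T[n] = c*n for n < 50; T[n] = T[n//5] + T[3n//4] + c*n for n >= 50."""
    T = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        if n < 50:
            T[n] = c * n
        else:
            T[n] = T[n // 5] + T[(3 * n) // 4] + c * n
    return T

T = deterministic_cost(100000)
print(all(T[n] <= 20 * n for n in range(1, 100001)))   # checks T(n) <= 20cn with c = 1
```

For instance, T[50] = T[10] + T[37] + 50 = 97, well under the bound of 1000.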