Confidence Intervals for Binomial


We have been deriving confidence intervals for scenarios where we can find a pivotal quantity which involves both the data and the parameter of interest, but we can also derive confidence intervals using hypothesis tests. A confidence interval is simply the set of plausible values for a parameter, based on the observed data. Once you understand the idea of hypothesis testing, this is straight-forward, at least in principle. In the case of a confidence interval for a Binomial success probability p, you just have to find the set of values for p for which you would not reject the corresponding null hypothesis. Let's see how it goes with Darwin's data...

Graphical Construction

We need to find two p's: the smallest one for which the probability of at least 13 successes is no smaller than .025, and the largest one for which the probability of 13 or fewer successes is no smaller than .025. All p's in between those two values are choices for which we would not have rejected the corresponding null hypothesis (using a p-value cutoff value of 0.05 in order to reject the null hypothesis).

First create a data set containing the range of values for p:

            p = seq(.01,.99,.01)
Because R does arithmetic on the whole data set at once (called vectorised operations), we can compute the probabilities of interest for the entire list of values for p at once:
           L = pbinom(12,15,p,lower.tail=F)
           U = pbinom(13,15,p)
L is the list of probabilities P(X >= 13 | p). U is the list of probabilities P(X <= 13 | p). Now we plot:
           plot(p,L,ylab="Probability",type="l")
           lines(p,U)
The specification `type="l"' (that's "ell" not "one") tells the plot function to connect the successive points with line segments. The second command, "lines", adds the second curve to the existing plot.

To find a 95% confidence interval, we need to find the values of p where the two curves are at the height .025. We can do this by adding a horizontal line at that height, and then using the locator() function to pick off the corrdinates of the points of intersection.

          abline(h=.025)
To use the locator function, after entering the following command in R, click on the points you wish to select. The locator function returns both the "x" and "y" coordinates of the selected points; we only need the "x" coordinates. The following usage asks for the "x" coordinates of two points:
          locator(2)$x
You should get something in the neighborhood of .59 and .98. Grabbing the lower right corner of the plot window and expanding the window will increase the scale and thus the precision.

To produce a 90% confidence interval, simply draw the horizontal line at height .05 instead of .025.

Direct Search

We need to find two values for p: the largest plausible value, and the smallest plausible value. Again, "implausible" is defined in terms of events with total probability smaller than .05. To simplify the search, note that for p=1/2, it was sufficient that the probability of observing 13, 14, or 15 successes was less than .025 (why?). Again, we need to find two p's: the smallest one for which the probability of at least 13 successes is no smaller than .025, and the largest one for which the probability of 13 or fewer successes is no smaller than .025.

Finding the lower bound for p

A simple-minded iterative search will do the job: we already know that p=1/2 is too small.
> 1 - pbinom(12,15,1/2)
[1] 0.003692627
Try p=0.6:
> 1 - pbinom(12,15,.6)
[1] 0.027114
It's a bit too big, we want .025.
> 1 - pbinom(12,15,.59)
[1] 0.02270254
Close! The answer is somewhere between .59 and .60:
> 1 - pbinom(12,15,.595)
[1] 0.02482434
I'll leave it as an exercise for the reader to get the value any more precise than this.

Finding the upper bound for p

Now, for the upper end of the confidence interval, we need to find the largest plausible p, in other words, the largest p for which P(X <= 13) is at least .025.

Let's start with the value suggested by the graphical search:

> pbinom(13,15,.98)
[1] 0.03533831
We are close to .025, but will need to search a bit.
> pbinom(13,15,.99)
[1] 0.009629773
Hence, the upper bound is somewhere between p=.98 and p=.99. Again, I'll leave it as an exercise for the reader. When you have found it to the desired accuracy, say 3 significant digits, you will have what is called a 95% confidence interval for the unknown population proportion p.

Again - the 95% CI is just the set of null hypotheses for p that would not be rejected with p-values less than .05.

The binom.test function

The binom.test function not only tests your favorite null hypothesis, it also constructs exactly the confidence interval we have been constructing "by hand".
               binom.test(13,15)
Try it out, and compare the result to the values we found by hand above.

To produce a 90% confidence interval, use

               binom.test(13,15,conf.level=.90)
 

Next: Lab 3 Assignment
Previous: Hypothesis Tests for Binomial

Lab 3 Index