DEPARTMENT OF POLITICAL SCIENCE

AND

INTERNATIONAL AFFAIRS

Posc/Uapp 815

STATISTICAL INFERENCE



  1. AGENDA:
    1. Basic ideas of statistical inference.
      1. Sampling distributions
      2. Critical values and regions
      3. Types of errors in making inferences
    2. Reading:
      1. Agresti and Finlay, Statistical Methods, Chapter 6, pages 154 to 167.


  2. INFERENCE:
    1. This section repeats the notes from the last class.
    2. Goals of statistical inference:
      1. Test hypotheses about population parameters
      2. Estimate the magnitude or value of population parameters
    3. A simple example:
      1. Question (hypothesis): Suppose you are flipping coins with someone. A reasonable question is: is the person honest?
      2. How should one "operationalize" the notion of honesty?
      3. The question really asks: is the probability of getting a "head" on a single flip of a coin 1/2 = .5? The hypothesis suggested by the question, in other words, is that P[heads] = .5. More formally,

         H0: P[heads] = .5

      1. H0 is called a null hypothesis
      2. In a series of, say, 10 flips of a coin, how likely is it that a person will get 9 heads?
      3. In order to answer this question you have to decide two things:
        1. What is "likely"? That is, what is your definition of likely or probable and, conversely, what is your definition of improbable?
        2. Given this definition, you then have to decide what outcomes--what number of heads in 10 tosses--you will consider probable and what number you will believe to be improbable.
      4. Stated another way, you have to form a mental picture like this:

         [Figure: the possible outcomes (number of heads in 10 tosses) divided into a region of acceptance and a region of rejection.]
      1. Next, observe an actual set of ten tosses and count the number of heads.
      2. Finally, make a decision using the guidelines presented above: if the observed number of heads falls in the region of rejection, then, because this result is improbable, reject the hypothesis (P[heads] = .5) and conclude that the coin is biased, the person is crooked, etc.
      3. If, for example, someone tosses a coin 10 times and obtains 9 heads, then you might suspect that the hypothesis of honesty is untenable.
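Under the null hypothesis of a fair coin, the likelihood of a result like 9 heads in 10 flips can be computed directly from the binomial formula developed in the next section. A minimal sketch in Python (standard library only; the variable names are mine):

```python
from math import comb

# Under H0 (an honest coin), P[heads] = .5 on each of 10 independent flips.
N, p = 10, 0.5

# Probability of exactly 9 heads.
p_9 = comb(N, 9) * p**9 * (1 - p)**1

# Probability of a result at least this extreme (9 or 10 heads).
p_tail = sum(comb(N, y) * p**y * (1 - p)**(N - y) for y in (9, 10))

print(round(p_9, 4), round(p_tail, 4))   # .0098 and .0107
```

Either way, 9 heads in 10 flips has about a 1 percent chance under H0, which is why the hypothesis of honesty looks untenable.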


  3. STEPS IN TESTING A HYPOTHESIS:
    1. Problem: A drug manufacturer claims that its anticoagulant drug is 80 percent effective (that is, the probability of preventing clots is .8), based on thousands of trials with primates, mice, and even human volunteers. The FDA has sent you data pertaining to 8 randomly selected individuals in a Veterans hospital, only 2 of whom were helped by the drug. If the "true" proportion of "successes" is .8, how likely is it that you would observe only 2 out of 8 successes? Could this result have occurred by chance, thereby supporting the manufacturer's claim, or is it such an improbable result that you are led to doubt the drug's efficacy?
    2. Research hypothesis: The FDA believes that the probability of success is less than .8 (that is, P < .8).
    3. The null hypothesis is H0: P = .8
      1. A null hypothesis always asserts that a population parameter equals a specific number, frequently zero but in this case .8.
    4. Sampling Distribution:
      1. Loosely speaking, a sampling distribution is a theoretical distribution that shows, for a given sample statistic (e.g., the sample number of successes) based on N cases, the probability associated with each possible value of the statistic.
      2. Thus, if we know a sampling distribution for a statistic, if we have a sample of N cases, if we have an actual observed value of the statistic, and if we have a particular null hypothesis in mind, we can tell how likely the observed result is. Based on this information, we can decide whether or not to accept the null hypothesis.
      3. In this case the statistic in question is simply Y, the number of successful treatments in N trials.
        1. Hence, in this case we have N = 8 and have observed Y = 2.
      4. An appropriate sampling distribution will tell us the likelihood of getting 2 or fewer successes in 8 independent trials given that the null hypothesis is true.
    5. The material that follows is designed to show where this particular sampling distribution comes from.
      1. To find the probability of Y successes in N trials, given that the probability of success is P, use this formula:

         P(Y,N,P) = [N!/(Y!(N - Y)!)] P^Y (1 - P)^(N - Y)

        1. The term P(Y,N,P) means the "probability of getting Y successes in N trials given that the probability of success is P."
        2. As we have already seen, the symbols N! and Y! mean "factorial": N! = N(N - 1)(N - 2)(N - 3)...(1). For example, 8! = (8)(7)(6)(5)(4)(3)(2)(1) = 40,320.
        3. The first part of the equation represents how many ways Y successes can be obtained in N trials. For example, these are the ways we can obtain 7 successes in 8 trials:

           S S S S S S S F
           S S S S S S F S
           S S S S S F S S
           S S S S F S S S
           S S S F S S S S
           S S F S S S S S
           S F S S S S S S
           F S S S S S S S

        1. In the table, S denotes a success and F a failure. In the first line, 7 successes are obtained on the first 7 trials. But this is not the only way to get seven successes. The second line shows, for example, that 7 successes can be obtained by getting 6 in a row, then a failure, then another success.
      1. The number of these patterns is given by the first part of the equation; that is,

         N!/(Y!(N - Y)!) = 8!/(7!1!) = 8

      1. The second part of the equation, P^Y(1 - P)^(N - Y), gives the probability of any one particular pattern of Y successes and N - Y failures when the probability of success is P. This probability must be multiplied by the number of ways of getting Y successes (see above).
    1. Example: suppose the probability of getting a "success" is P = .8. In addition, suppose we want to know the probability of getting Y = 7 successes in N = 8 trials or attempts. According to the formula the probability is

       P(7,8,.8) = [8!/(7!1!)] (.8)^7 (.2)^1 = (8)(.2097)(.2) = .3355

      1. Note that 8! = (8)(7)(6)(5)(4)(3)(2)(1); that 7! = (7)(6)(5)(4)(3)(2)(1); and that 1! = (1). Thus, the first part of the equation reduces to 8, since 8!/7! = 8, which in turn is divided by 1! = 1. See the listing above.
      2. By letting N = 8, P = .8, (1 - P) = .2, and letting Y equal successively 0, 1, 2, 3, ..., we get this sampling distribution:

         Y    P(Y)
         0   .0000
         1   .0001
         2   .0011
         3   .0092
         4   .0459
         5   .1468
         6   .2936
         7   .3355
         8   .1678

    1. You can get this table from MINITAB by opening the session window and typing at the MTB> prompt


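If MINITAB is not at hand, the same sampling distribution can be generated in Python directly from the binomial formula; this is a sketch using only the standard library:

```python
from math import comb

N, P = 8, 0.8

# P(Y,N,P) = [N!/(Y!(N - Y)!)] P^Y (1 - P)^(N - Y), for Y = 0, 1, ..., N
dist = {y: comb(N, y) * P**y * (1 - P)**(N - y) for y in range(N + 1)}

for y, prob in dist.items():
    print(f"Y = {y}: {prob:.4f}")
```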
    1. Parts of the Sampling distribution
      1. Decide which sample outcomes (e.g., number of successes) are so improbable that, should one of them occur, you will not accept the null hypothesis. This set of outcomes is called the critical region.
      2. The cutoff value--the value at or below which you reject the null hypothesis--is called the critical value.
        1. In this case let it be 4; hence, if you observe 4 or fewer successes in 8 trials, you will reject the H0 that the true probability of success is .8.
    2. Now also realize that whenever you make a judgment of this sort you run the risk of making an error because you are, in effect, saying that a result is "improbable" but not "impossible." There is therefore a probability, however small you wish to make it, that you have made an incorrect decision. Why? Because the critical region contains values that could occur but are so unlikely that you believe that they did not arise under the null hypothesis.
      1. This kind of a mistake is called a Type I error and the probability of making it is the probability of falling in the critical region.
      2. This probability, usually denoted by α, is also called the level of significance.
      3. Here is the sampling distribution again. Let us define the values Y = 0, 1, 2, 3, and 4 as the critical region. (See the figure below.)

         [Figure: the binomial sampling distribution for N = 8, P = .8, with the critical region (Y = 0 to 4) marked.]

      1. To summarize:
        1. Critical region: those possible sample results that will cause us to reject H0 if one should actually occur.
        2. Critical value: in this (lower-tail) example, if an observed sample value is at or below it we will reject H0. The critical value is thus the demarcation point between the critical region and the region of acceptability.
        3. Type I error: rejecting the H0 when it is true.
        4. The level of significance is another term for the probability of making a Type I error.
    1. Collect data and compute sample statistic:
      1. Here the job is particularly easy: Y is 2 out of 8.
    2. Decision:
      1. Compare the sample statistic to the critical value: if it is greater than the critical value (in this case), accept the null hypothesis; if it is equal to or less than the critical value (in this case), reject H0.
    3. Interpretation:
      1. Here we reject the null hypothesis because the observed Y of 2 is less than 4, the critical value.
      2. There is a small chance that we have incorrectly rejected H0, because the probability of observing 2 or fewer successes given that P equals .8 is the sum of the probabilities associated with 0, 1, and 2 successes (about .0012). It's not a large chance, but it is a chance.
      3. If we falsely reject the null hypothesis, we are in effect telling the FDA to ban the drug when in fact it might help many people. Given that knowledge, we might want to make the level of significance even lower--that is, we might want to minimize the chance of a Type I error even more.


  4. ANOTHER EXAMPLE OF THE BINOMIAL:
    1. Problem: in a city's population of available jurors, 53 percent do not favor capital punishment. This is known from a detailed public opinion survey. A local judge is suspected of approving only potential jurors who favor the death penalty. The last jury dealing with a murder case consisted of 12 people, only 5 of whom said they opposed capital punishment. Is there any evidence of selective bias?
    2. Hypotheses:
      1. The "research" hypothesis is that the judge is biased, which means that the proportion of "anti-capital punishment" jurors will be less than .53. Thus, the alternative hypothesis is HA: P < .53.
      2. The null hypothesis is P =.53.
        1. The H0 asserts that a population parameter equals a specific value.
      3. Since the alternative hypothesis (HA) is that P is less than .53 we will consider only those sample results at the "low" end of the scale as disconfirming evidence. This is a one-tailed test of significance. (See below.)
      4. Sampling distribution and critical region:
        1. Let us decide ahead of time that we will consider any sample result that occurs with probability less than .02 as evidence that the null hypothesis should not be accepted.
        2. The sampling distribution, critical region, and critical value for this problem are shown in the figure below. Note that N = 12 and P = .53:

           [Figure: the binomial sampling distribution for N = 12, P = .53, with the critical region (Y = 0, 1, 2) and critical value marked.]

      1. Given the nature of the problem, we will reject the null hypothesis only if we get, say, 0 or 1 or 2 "anti-capital punishment" jurors; that is, we will use only one tail of the sampling distribution to test the hypothesis.
      2. Sometimes when there is no clear alternative hypothesis we will consider unlikely events at both ends of the distribution.
      3. Critical value: we have agreed ahead of time to reject the H0 if a sample result occurs with probability of .02 or less. Thus, from the above distribution we see that the critical region includes outcomes 0, 1, and 2. The outcome 3 occurs with probability .0367 and is therefore above the critical value.
      4. Hence the decision rule is: reject H0 if and only if Y, the number of "successes," is 2 or less.
        1. The level of significance is therefore at most .02 (the critical region {0, 1, 2} has total probability of about .011).
      5. Sample result: the data indicate that Y = 5 jurors oppose capital punishment. The sample result is thus 5, which we compare with the critical value.
      6. Decision: since 5 is greater than 2 (the critical value) we do not reject the null hypothesis.
      7. Interpretation: We have concluded that the judge is not biased against jurors who oppose the death penalty. There is a chance that we have made a mistake, but this time the possible error is in failing to reject a null hypothesis that should be rejected.
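The whole jury test can be reproduced numerically. A sketch in Python (standard library only), computing the lower tail of the sampling distribution under H0: P = .53 and applying the decision rule:

```python
from math import comb

def binom_pmf(y, n, p):
    """Probability of y successes in n trials when P(success) = p."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

N, P0 = 12, 0.53

# Lower-tail probabilities under the null hypothesis.
p_le_2 = sum(binom_pmf(y, N, P0) for y in range(3))   # critical region {0, 1, 2}
p_3 = binom_pmf(3, N, P0)                             # just outside it

observed = 5
reject = observed <= 2    # decision rule: reject H0 iff Y <= 2

print(round(p_le_2, 4), round(p_3, 4), reject)   # about .0114, .0367, False
```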


  5. TYPES OF ERRORS IN HYPOTHESIS TESTING:
    1. Here are the two types of errors one can make in statistical inference:
      1. Type I: Rejecting a null hypothesis that is really true.
        1. The probability of making this type of error, under the assumptions and conditions mentioned above, is α.
        2. One can interpret α in various ways:
          1. It is the "size" of the critical region. If the null hypothesis, H0, is really true, then a critical region of size α means there is probability α that a sample result will lead to an incorrect decision. That is, the sample result, since it falls in the critical region, will suggest that we reject H0 when it should not be rejected.
          2. α is called the level of significance. If, for example, we use an α of .05, and the sample result falls in the critical region, we say that the result is "significant at the .05 level." If α is .001, and the sample result still falls in the critical region, we say it is "significant at the .001 level."
          3. Keep in mind, however, that α is the probability of making a Type I error, which is to incorrectly or falsely reject a "true" null hypothesis.
      2. Type II: Failing to reject a hypothesis that is really false.
        1. The probability of incorrectly accepting a false H0 is β. It is not exactly the complement of α, since its size depends on how false the null hypothesis is.
        2. The power of a test: the power of a statistical test is defined as 1 - β. It tells one the chance of detecting a false H0.
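Power can only be computed against a specific alternative value of the parameter. As an illustration (the alternative P = .5 is my assumption, not part of the examples above), here is the power of the drug test, which rejects H0: P = .8 whenever Y <= 4 out of N = 8 trials:

```python
from math import comb

def binom_pmf(y, n, p):
    """Probability of y successes in n trials when P(success) = p."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

N, crit = 8, 4          # drug example: reject H0 if Y <= 4
p_true = 0.5            # a hypothetical true success rate (assumed for illustration)

# Power: probability of landing in the critical region when H0 is false.
power = sum(binom_pmf(y, N, p_true) for y in range(crit + 1))
beta = 1 - power        # probability of a Type II error at this alternative

print(round(power, 4), round(beta, 4))   # about .6367 and .3633
```

The farther the true P lies from .8, the larger the power; against alternatives close to .8, β is large and the test will often fail to detect the difference.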


  6. NEXT TIME:
    1. Tests of means and proportions.


Copyright © 1997 H. T. Reynolds