DEPARTMENT OF POLITICAL SCIENCE

AND

INTERNATIONAL AFFAIRS

Posc/Uapp 815

TIME SERIES AND STATISTICAL INFERENCE



  1. AGENDA:
    1. Intervention analysis: simple interrupted time series model
    2. Basic ideas of statistical inference.
      1. Sampling distributions
    3. Reading:
      1. Agresti and Finlay, Statistical Methods, Chapter 6, pages 154 to 167.


  2. INTERVENTION ANALYSIS:
    1. The basic problem is to develop a model that succinctly shows the effect of an "intervention."
    2. The dependent variable is a quantitative or numeric measurement taken at evenly spaced time periods such as months or years.
    3. The "intervention" is conceptualized as an "event" or "policy" that in some sense interrupts the series.
      1. See Class 21 notes for some hypothetical examples.
    4. The effects of the intervention can be modeled by counter and dummy variables.
      1. Counter: a series of integers (1, 2, 3, ...) that identify time periods.
    5. Dummy variables.
      1. One can create a variable or set of variables to represent categories of a non-quantitative factor.
      2. Suppose a variable has two categories such as male and female or before and after an intervention.
      3. Select one category as a reference category. The choice can be made arbitrarily but usually represents substantive interests.
      4. Example:
        1. X1 = 0 for before observations and X1 = 1 for after observations
      5. Example:
        1. X1 = 0 for females and X1 = 1 otherwise (that is, for males)
      6. Example:
        1. X1 = 0 for South (that is, southern states or counties) and X1 = 1 otherwise (that is, non-southern states or counties)
    6. Dummy and counter variables can be entered into a regression model by themselves or along with other independent variables.
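
To make the coding concrete, here is a minimal sketch in Python of a counter and a before/after dummy variable. The six time periods and the intervention falling after period 3 are hypothetical values chosen only for illustration.

```python
# Hypothetical series: six evenly spaced time periods.
n_periods = 6

# Hypothetical assumption: the intervention occurs after period 3.
last_pre_period = 3

# Counter: integers 1, 2, 3, ... that identify the time periods.
counter = list(range(1, n_periods + 1))

# Dummy: 0 for "before" observations (the reference category), 1 for "after".
dummy = [0 if t <= last_pre_period else 1 for t in counter]

for t, d in zip(counter, dummy):
    print(t, d)
```

Either list can then be used as a column of independent-variable values in a regression.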


  3. A MODEL FOR A CHANGE OF SLOPE OR TREND:
    1. Here are some data pertaining to turnout--the percent of the voting age population that voted--in congressional or "off-year" elections in the United States.
Year Turnout
1934 41.4
1938 44.0
1942 32.5
1946 37.1
1950 41.1
1954 41.7
1958 43.0
1962 45.4
1966 45.4
1970 43.5
1974 35.9
1978 34.9
1982 33.0
1986 33.5
1990 33.1
1994 36.0


    1. Congress passed the "Voting Rights Act" in 1965, right in the middle of this series. Its goal was to protect every citizen's right to vote. Many have interpreted it as landmark legislation that provided effective enfranchisement to African-Americans.
      1. An obvious question, then, is what effect, if any, the legislation had on turnout, measured as the percent of the population old enough to vote who actually did vote in congressional-year elections.
      2. In particular we are interested in changes in the turnout trend before and after passage.
      3. We know that turnout has for a variety of reasons been declining throughout the twentieth century. But one wonders if the law slowed or reversed this downward spiral.
    2. Model:
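      1. Here's a simple model for measuring the law's effects on the slope or trend. Y is turnout, X1 is a counter for the time periods, and X2 equals 0 for every period before the intervention and counts the periods since the intervention afterward:

         $$Y = b_0 + b_1 X_1 + b_2 X_2$$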

      1. It's perhaps easiest to interpret this model or equation by first looking at what it says for observations (years) before the intervention, that is, before passage of the law in 1965. For those years X2 = 0 and the model reduces to:

         $$Y = b_0 + b_1 X_1$$

      1. Thus, the coefficients indicate the slope and intercept of the line before the intervention.
      2. For the post-intervention years, however, the second X does not drop out and b2 measures the post-intervention change in the slope; the slope after the intervention is therefore b1 + b2.
    1. Example:
      1. Here are the OLS estimates for the turnout data.


      1. Turnout increased very slightly before 1965 as indicated by the positive slope of .704. But after 1965 turnout declined precipitously.
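
The fit can be checked with a short Python sketch. The coding of the two independent variables is an assumption, since the notes do not spell it out: X1 is taken to be a counter 1, 2, ..., 16 over the elections, and X2 is taken to be 0 through 1962 and then 1, 2, ..., 8 from 1966 onward. The helper `ols` is a hypothetical name for a small normal-equations solver written for this example. Under this assumed coding the pre-1965 slope agrees with the .704 quoted above.

```python
# Off-year turnout data from the table in these notes.
years = list(range(1934, 1995, 4))
turnout = [41.4, 44.0, 32.5, 37.1, 41.1, 41.7, 43.0, 45.4,
           45.4, 43.5, 35.9, 34.9, 33.0, 33.5, 33.1, 36.0]

# Assumed coding (not stated explicitly in the notes):
#   x1 = counter over all elections,
#   x2 = 0 before 1965, then the number of elections since the Act.
n_pre = sum(1 for y in years if y < 1965)
x1 = list(range(1, len(years) + 1))
x2 = [max(0, t - n_pre) for t in x1]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [u - factor * v for u, v in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Design matrix: constant, counter, post-intervention counter.
rows = [[1.0, float(a1), float(a2)] for a1, a2 in zip(x1, x2)]
b0, b1, b2 = ols(rows, turnout)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
print(f"post-1965 slope (b1 + b2) = {b1 + b2:.3f}")
```

With this coding b1 is about .704 and b2 is about -2.145, so the estimated post-1965 trend is roughly -1.44 percentage points per election. A different coding of X2 would give different estimates.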


  4. EXTENSIONS:
    1. Note that the previous model is overly simplistic and would have to be modified for both substantive and statistical reasons.
      1. We'll take up these extensions next semester.


  5. PROBABILITY:
    1. Here are some problems and questions:
      1. From "Rosencrantz and Guildenstern Are Dead" by Tom Stoppard. Rosencrantz (Ros) and Guildenstern (Guil) are flipping coins:


- Ros: Eighty-five in a row--beaten the record

- Guil: Don't be absurd.

- Ros: Easily

- Guil (angry): Is that it, then? Is that all?

- Ros: What?

- Guil: A new record? Is that as far as you are prepared to go?

- Ros: Well...

- Guil: No questions? Not even a pause?

- Ros: You spun them yourself.

- Guil: Not even a flicker of doubt?

- Ros (aggrieved, aggressive): Well, I won--didn't I?

- Guil:...And if you'd lost? If they had come down against you, eighty-five times, one after another, just like that?

- Ros (dumbly): Eighty-five in a row? Tails?

- Guil: Yes! What would you think?

- Ros:...Well....(Jocularly.) Well, I'd have a good look at your coins for a start!

      1. A lawyer has just called because she has been troubled by recent events at the local courthouse where welfare fraud cases are being heard. Although the proportion of females in the county is .53, she thinks that women are under-represented on the juries dealing with these cases. Last week, for example, only two out of 12 jurors were women. Is there any evidence, she wants to know, of discrimination?
      2. The board of directors of a local charity (e.g., United Way) claims to be unbiased. But a civil rights group points out to you that of 10 members only 1 is black, even though blacks make up 15% of the community. Is there any evidence of discrimination or is this composition a matter of chance? They want your advice before taking the charity to court.
      3. The administration of the University of Delaware is concerned about the dropout rate. They believe that if students saw their advisors more frequently the dropout rate would decrease, and hence they are considering measures that would make advisement easier to obtain. The reforms, however, will be expensive to institute. Consequently, they undertook a study in which every student was asked how often he or she saw an advisor. The average (mean) number of visits was 12. Some months later a random sample of 40 dropouts was interviewed and asked how frequently they had seen an advisor. The mean number of visits for the sample of dropout students was 9.1. Is there any reason to believe that the average number of visits to advisors among the dropouts is significantly lower than the average number of visits for the student population as a whole? Is there, in other words, sufficient evidence for instituting an expensive advisement process?
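
The jury and charity-board questions can be framed as binomial tail probabilities, using the binomial formula developed later in these notes. The sketch below is only a rough first look: it assumes each juror or board member is an independent random draw from the community, which is an idealization, and the helper names `binom_pmf` and `binom_cdf` are introduced here for illustration.

```python
from math import comb


def binom_pmf(y, n, p):
    """Probability of exactly y successes in n independent trials."""
    return comb(n, y) * p**y * (1 - p)**(n - y)


def binom_cdf(y, n, p):
    """Probability of y or fewer successes in n independent trials."""
    return sum(binom_pmf(k, n, p) for k in range(y + 1))


# Jury: 2 or fewer women among 12 jurors when the female proportion is .53.
p_jury = binom_cdf(2, 12, 0.53)

# Board: 1 or fewer black members among 10 when blacks are 15% of the community.
p_board = binom_cdf(1, 10, 0.15)

print(f"P(2 or fewer women on a 12-person jury) = {p_jury:.4f}")
print(f"P(1 or fewer black members on a 10-person board) = {p_board:.4f}")
```

Under these assumptions the jury outcome is rare (a probability of roughly .011), while the board outcome is not (roughly .54). The dropout question concerns a sample mean rather than a count of successes, so it calls for a different sampling distribution.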


    1. Review of terms and concepts
      1. Population versus samples
      2. Parameters versus statistics
      3. See the following table:

      1. Note: the estimators of the intercept (regression constant) and slope (regression coefficient) are often written $\hat{\beta}_0$ and $\hat{\beta}_1$.
    1. Goals of statistical inference:
      1. Test hypotheses about population parameters
      2. Estimate the magnitude or value of population parameters
    2. A simple example:
      1. Question (hypothesis): Suppose you are flipping coins with someone. A reasonable question is: is the person honest?
      2. How should one "operationalize" the notion of honesty?
      3. The question really asks: is the probability of getting a "head" on a single flip of a coin 1/2 = .5? The hypothesis suggested by the question, in other words, is that P[heads] = .5. More formally,

         $$H_0: P[\text{heads}] = .5$$

      1. H0 is called a null hypothesis
      2. In a series of, say, 10 flips of a coin, how likely is it that a person will get 9 heads?
      3. In order to answer this question you have to decide two things:
        1. What is "likely"? That is, what is your definition of likely or probable and conversely what is your definition of improbable?
        2. Given this definition, you then have to decide which outcomes (that is, which numbers of heads in 10 tosses) you will consider probable and which you will consider improbable.
      4. Stated another way you have to form a mental picture like this:

      1. Next, observe an actual set of ten tosses and count the number of heads.
      2. Finally, make a decision using the guidelines presented above: if the observed number of heads falls in the region of rejection because this result is improbable, then reject the hypothesis (P[heads] = .5) and conclude that the coin is biased, the person is crooked, etc.
      3. If, for example, someone tosses a coin 10 times and obtains 9 heads, then you might suspect that the hypothesis of honesty is untenable.
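
To see how improbable 9 heads in 10 tosses is when P[heads] = .5, here is a minimal Python sketch using the binomial formula developed later in these notes. It assumes the tosses are independent. The helper name `binom_pmf` is introduced here for illustration, and the 85-heads calculation refers to the Stoppard excerpt above.

```python
from math import comb


def binom_pmf(y, n, p):
    """Probability of exactly y successes in n independent trials."""
    return comb(n, y) * p**y * (1 - p)**(n - y)


# Exactly 9 heads in 10 tosses of a fair coin.
p_nine = binom_pmf(9, 10, 0.5)

# 9 or more heads, that is, a result at least as extreme as the one observed.
p_nine_or_more = p_nine + binom_pmf(10, 10, 0.5)

# Rosencrantz's streak: 85 heads in 85 tosses of a fair coin.
p_85_heads = 0.5 ** 85

print(f"P(exactly 9 heads in 10)  = {p_nine:.4f}")
print(f"P(9 or more heads in 10)  = {p_nine_or_more:.4f}")
print(f"P(85 heads in 85 tosses)  = {p_85_heads:.3e}")
```

Whether a probability of about .01 counts as "improbable" is exactly the judgment that step 3 above asks you to make before looking at the data.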


  6. STEPS IN TESTING A HYPOTHESIS:
    1. Problem: A drug manufacturer claims that its anticoagulant drug is 80 percent effective (that is, the probability of preventing clots is .8) based on thousands of trials with primates, mice, and even human volunteers. The FDA has sent you data pertaining to 8 randomly selected individuals in a Veterans hospital, only 2 of whom were helped by the drug. If the "true" proportion of "success" is .8, how likely is it that you would observe only 2 out of 8 successes? Could this result have occurred by chance, thereby supporting the manufacturer's claim, or is it such an improbable result that you are led to doubt the drug's efficacy?
    2. Research hypothesis: The FDA believes that the probability of success is less than .8 (that is, P < .8).
    3. The null hypothesis is H0: P = .8
      1. A null hypothesis always asserts that a population parameter equals a specific number, frequently zero but in this case .8.
    4. Sampling Distribution:
      1. Loosely speaking, a sampling distribution is a theoretical distribution that shows the relationship between the possible values of a given sample statistic (e.g., sample number of successes) based on N cases and the probability associated with each possible value.
      2. Thus, if we know a sampling distribution for a statistic, if we have a sample of N cases, if we have an actual observed value of the statistic, and if we have a particular null hypothesis in mind, we can tell how likely the observed result is. Based on this information, we can decide whether or not to accept the null hypothesis.
      3. In this case the statistic in question is simply Y, the number of successful treatments in N trials.
        1. Hence, in this case we have N = 8 and have observed Y = 2.
      4. An appropriate sampling distribution will tell us the likelihood of getting 2 or fewer successes in 8 independent trials given that the null hypothesis is true.
    5. The material that follows is designed to show where this particular sampling distribution comes from.
      1. To find the probability of Y successes in N trials given that the probability of success is P, use this formula:

         $$P(Y, N, P) = \frac{N!}{Y!\,(N - Y)!}\, P^{Y} (1 - P)^{N - Y}$$

        1. The term P(Y,N,P) means the "probability of getting Y successes in N trials given that the probability of success is P."
        2. As we have already seen, the symbols N! and Y! mean "factorial": N! = N(N - 1)(N - 2)(N - 3)...(1). For example, 8! = (8)(7)(6)...(1).
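        3. The first part of the equation represents how many ways Y successes can be obtained in N trials. For example, these are the ways we can obtain 7 successes in 8 trials:

          Trial:  1 2 3 4 5 6 7 8
                  S S S S S S S F
                  S S S S S S F S
                  S S S S S F S S
                  S S S S F S S S
                  S S S F S S S S
                  S S F S S S S S
                  S F S S S S S S
                  F S S S S S S S
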
        1. In the table S denotes a success and F a failure. In the first line, 7 successes are obtained on the first 7 trials. But this is not the only way to get seven successes. The second line shows, for example, that 7 successes can be obtained by getting 6 in a row, then a failure, then another success.
      1. The number of these patterns is given by the first part of the equation; that is,

         $$\frac{N!}{Y!\,(N - Y)!} = \frac{8!}{7!\,1!} = 8$$

      1. The second part of the equation, P^Y (1 - P)^(N - Y), gives the probability of getting any one particular pattern of Y successes and N - Y failures in N trials if the probability of success is P. This probability must be multiplied by the number of ways of getting Y successes (see above).
    1. Example: suppose the probability of getting a "success" is P = .8. In addition, suppose we want to know the probability of getting Y = 7 successes in N = 8 trials or attempts. According to the formula the probability is

         $$P(7, 8, .8) = \frac{8!}{7!\,1!}\,(.8)^{7}(.2)^{1} = (8)(.2097)(.2) = .3355$$


      1. Note that 8! = (8)(7)(6)(5)(4)(3)(2)(1); that 7! = (7)(6)(5)(4)(3)(2)(1); and that 1! = (1). Thus, the first part of the equation reduces to 8, since 8!/7! = 8 which in turn is divided by 1. See the listing above.
      2. By letting N = 8, P = .8, (1 - P) = .2, and letting Y equal successively 0, 1, 2, 3,...we get this sampling distribution.
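
The same distribution can be tabulated with a few lines of Python; this is a sketch for checking the arithmetic, not the MINITAB session the notes refer to. The helper name `binom_pmf` is introduced here for illustration.

```python
from math import comb


def binom_pmf(y, n, p):
    """Binomial formula: probability of exactly y successes in n trials."""
    return comb(n, y) * p**y * (1 - p)**(n - y)


N = 8     # number of trials
P = 0.8   # probability of success under the null hypothesis

# Sampling distribution of Y, the number of successes, with a running total.
probs = [binom_pmf(y, N, P) for y in range(N + 1)]
cumulative = 0.0
print(" Y   P(Y)      P(Y or fewer)")
for y, prob in enumerate(probs):
    cumulative += prob
    print(f"{y:2d}   {prob:.6f}  {cumulative:.6f}")
```

The row for Y = 7 reproduces the .3355 worked out by hand above, and the probabilities over all nine possible outcomes sum to 1.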

    1. You can get this table from MINITAB by opening the session window and typing at the MTB> prompt


    1. Parts of the Sampling distribution
      1. Decide which sample outcomes (e.g., number of successes) are so improbable to you that should one of them occur, you will not accept the null hypothesis. This is called the critical region.
      2. The cutoff value--the value at or below which you reject the null hypothesis--is called the critical value.
        1. In this case let it be 4: hence if you observe 4 or fewer successes in 8 trials you will reject the H0 that the true probability of success is .8.
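
A short Python sketch shows what a critical value of 4 implies. It computes the probability of landing in the critical region when H0 is true, then applies the decision rule to the observed Y = 2 from the drug problem. The variable names are illustrative assumptions, not notation from the notes.

```python
from math import comb


def binom_pmf(y, n, p):
    """Probability of exactly y successes in n independent trials."""
    return comb(n, y) * p**y * (1 - p)**(n - y)


N = 8               # trials
P_NULL = 0.8        # probability of success under H0
CRITICAL_VALUE = 4  # reject H0 if Y is 4 or fewer
observed_y = 2      # successes actually observed

# Probability of falling in the critical region when H0 is true.
p_critical_region = sum(binom_pmf(y, N, P_NULL) for y in range(CRITICAL_VALUE + 1))

# Probability of a result as small as, or smaller than, the one observed.
p_observed_or_fewer = sum(binom_pmf(y, N, P_NULL) for y in range(observed_y + 1))

reject_null = observed_y <= CRITICAL_VALUE

print(f"P(Y <= {CRITICAL_VALUE} | H0 true) = {p_critical_region:.4f}")
print(f"P(Y <= {observed_y} | H0 true) = {p_observed_or_fewer:.5f}")
print("Reject H0" if reject_null else "Do not reject H0")
```

With this cutoff, a true null hypothesis would be rejected about 5.6 percent of the time. The observed 2 successes fall well inside the critical region.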


  7. NEXT TIME:
    1. Dissection of sampling distribution.
    2. Tests of means and proportions.


Copyright © 1997 H. T. Reynolds