DEPARTMENT OF POLITICAL SCIENCE

AND

INTERNATIONAL RELATIONS

Posc/Uapp 815



Time Series and Distributions







  1. CLASS 11 AGENDA:
    1. Time series
      1. Semi-logarithmic plots
      2. Intervention analysis and the interpretation of time series.
    2. Probability distributions
      1. The normal distribution.
    3. Reading:
      1. Agresti and Finlay, Statistical Methods, Chapter 4 pages 80 to 94.


  2. PLOTTING TIME SERIES TO EXAMINE CHANGE:
    1. Charles Murray, a policy analyst and the author of, among many works, The Bell Curve, has argued for years that America's system of public welfare has unintended, indeed perverse, effects. In an earlier book, Losing Ground, he claims, for instance, that the welfare state has caused increases in "unwanted" or "undesirable" behaviors and that, instead of helping the poor, it has actually worsened their situation. The graph in Figure 1, indicative of the kinds of analysis Murray uses, seems to indicate that 1) the rate of out-of-wedlock births among blacks has skyrocketed and 2) most of the change has occurred after 1970, the era when the welfare state began to expand by leaps and bounds.
    2. Comparing changes.
      1. Using an arithmetic scale to compare rates of change, especially when the time series have different numeric values, can sometimes be misleading. Murray's figure suggests that the rate of increase among "Blacks and Others" is greater than among whites. But since the first category starts at a higher rate, it takes a larger numerical increase to match the percentage change in a series with smaller values.
        1. Figure 1 shows change using the raw data. This graph is identical to the ones that appear in Losing Ground. It appears that the increase in illegitimacy is higher among blacks than among whites, especially after 1965.
        2. NOTE IN THESE FIGURES "B" REPRESENTS THE ILLEGITIMACY RATE AMONG BLACKS, "A" THE RATE AMONG WHITES.

    1. It has been suggested, however, that one obtains a better picture of change, especially when comparing two time series that start at different levels, by first transforming the dependent or series variable. Transforming each Y by taking its logarithm sometimes leads to a different interpretation.
      1. Programming: both SPSS and MINITAB make transformations easy. Simply identify a column or variable, indicate how the numbers should be changed, and store the results in a different column with a different name.
      2. In the student version of MINITAB use Calc, then pick Mathematical expressions.
        1. In the dialogue box first type a column in which the new variable will be stored.
        2. In the expression box type loge (c1), assuming that column 1 contains the variable. Don't forget the parentheses around the column number.
          1. "Loge" means the natural logarithm.
        3. Then press OK.
      3. Follow the same general instructions for the full version of MINITAB.
      4. If you are using SPSS, go to the Transform menu and then select Compute.
        1. As with MINITAB enter a name for the new variable, then in the box enter the expression ln(variable name), where the name is the label for the variable being transformed.
          1. Don't forget the parentheses, and note that with SPSS the key word is ln, not loge.
    2. Figure 2 shows the result when applied to Murray's data.
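The MINITAB and SPSS steps above create a new column holding the natural logarithm of an existing one. As a rough Python sketch of the same operation, assuming plain lists stand in for worksheet columns (the series names and values are invented for illustration and are not Murray's data):

```python
import math

# Hypothetical rates (percent), invented for illustration; these are NOT Murray's data.
rates = {"A": [2.0, 3.0, 4.5], "B": [20.0, 25.0, 30.0]}

def log_transform(values):
    """Return the natural log of each value (the same operation as loge(c1) in MINITAB or ln() in SPSS)."""
    if any(v <= 0 for v in values):
        raise ValueError("the logarithm is defined only for positive values")
    return [math.log(v) for v in values]

# Store the transformed series under new names, leaving the original columns intact.
logged = {f"log_{name}": log_transform(series) for name, series in rates.items()}
for name, series in logged.items():
    print(name, [round(x, 3) for x in series])
```

Keeping the original lists and storing the logs under new names mirrors the advice to save the results in a different column with a different name.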




      1. Interpretation: the relative steepness of a slope or curve on a semi-logarithmic graph indicates the rate of change of a variable across time.
        1. A horizontal line indicates no change.
        2. Positive (negative) slope indicates that a variable is increasing (decreasing) over time.
        3. A straight line indicates a constant rate of change.
        4. The steeper the slope the greater the change.
        5. Two parallel lines or curves indicate the same rate of change.
      2. In this instance the rate of increase among whites appears to be greater than among blacks. That difference is obscured by the graph of the raw data because the starting places are so different.
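The interpretation rules above can be checked with a small sketch, assuming two hypothetical series (invented numbers, not Murray's) that start at very different levels. On a semi-logarithmic plot the slope between two adjacent points is the difference of their logarithms, so comparing those differences compares rates of change rather than raw amounts:

```python
import math

# Hypothetical series: "A" starts low, "B" starts high (invented for illustration).
a = [2.0, 4.0, 8.0]     # doubles each period
b = [20.0, 30.0, 45.0]  # grows by half each period

def raw_changes(series):
    """Period-to-period change on the arithmetic scale."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

def log_changes(series):
    """Period-to-period change on the log scale (the slope on a semi-log plot)."""
    return [math.log(later) - math.log(earlier) for earlier, later in zip(series, series[1:])]

print("raw:", raw_changes(a), raw_changes(b))
print("log:", [round(x, 3) for x in log_changes(a)], [round(x, 3) for x in log_changes(b)])
```

Here the raw increases are larger for B, yet the log differences are larger for A. Each series also has constant log differences, so each would plot as a straight line on a semi-logarithmic graph.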
    1. Another example: increase in neoplasms of the respiratory system.
      1. Source: National Center for Health Statistics, Health, United States, 1994.
      2. The raw rates start from such different bases that it is difficult to compare the series.
      3. The first figure compares the incidence of lung cancer among white males and females since 1945.
      4. When looking at the raw rates it appears that the rate for men skyrocketed after 1960 and then began to taper off about 1980. The increase for women seems more modest and also levels off after 1980.
      5. But the log rates (Figure 2) show that the rate for women actually accelerated after 1960, more than the rate for men did, and continued upward after 1980.
    2. Finally, another example.
      1. Intervention analysis: frequently an investigator wants to know if there has been a "change" in a time series that can be attributed to an "intervention" such as the enactment of a law.
        1. In the early 1980s, for instance, Congress and the president began pressing for tougher sentences for people convicted of violations of drug laws. What was the result of this get tough policy?
          1. The Sentencing Reform Act of 1984 established a U.S. Sentencing Commission, which made its first report and recommendations to Congress in 1987.
          2. Figures: number of convictions; log number of convictions; number sentenced to 60 or more months.
        2. We see from the figures that imprisonment and the number of people receiving long sentences increased dramatically in the 1980s, but the increase apparently began before 1984.
    3. Interpreting time series data.
      1. The causes (and even the existence) of trends are not self-evident. Intervention analysis can throw some light on possible causes, but it cannot "prove" causality.
      2. Moreover, it is easy to attribute a change in a series to a particular cause. But keep in mind that many things change over time, so any one factor may or may not be responsible for an increase or decrease.
      3. This issue crops up in discussions of trends in crime rates. Lately the incidence of both violent and non-violent crime seems to be decreasing in many areas of the country. Politicians, of course, like to take credit for the declines by citing laws and regulations they have purportedly put on the books. But here is a possible alternative explanation: as the crime rate has decreased in the 1990s, so too has the number of young males as a proportion of the population. So perhaps that explains the decrease. Or perhaps the decline can be traced to improvements in the economy.
      4. The statistical aspects of these questions are taken up next semester.
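One informal way to look for a change at an intervention is to compare the average yearly change in the logged series before and after the intervention date. The sketch below assumes hypothetical yearly counts and treats 1984 as the intervention year; it is a descriptive comparison only, not a formal statistical test:

```python
import math

# Hypothetical yearly counts, invented for illustration (not the sentencing data).
years = list(range(1980, 1990))
counts = [100, 104, 110, 121, 133, 146, 160, 176, 194, 213]
intervention_year = 1984  # assumed intervention date

def average_log_slope(ys, vs):
    """Average yearly change in log(count) between the first and last points."""
    return (math.log(vs[-1]) - math.log(vs[0])) / (ys[-1] - ys[0])

cut = years.index(intervention_year)
pre_slope = average_log_slope(years[: cut + 1], counts[: cut + 1])   # 1980 to 1984
post_slope = average_log_slope(years[cut:], counts[cut:])            # 1984 to 1989
print(f"before: {pre_slope:.3f} per year, after: {post_slope:.3f} per year")
```

A steeper slope after the intervention is consistent with an effect but does not establish one; in this invented series the count was already rising before 1984, and other factors changing at the same time could account for the difference.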


  1. PROBABILITY DISTRIBUTIONS AGAIN:
    1. Suppose that we somehow know (we've asked everyone in a survey, for instance) that 15 percent of the registered voters in a community support President Clinton's stand on trade policy (that is, they favor expanding and liberalizing trade agreements).
    2. Now suppose we ask someone to poll 7 registered voters and ask them whether they agree with Clinton's position. This sample shows that 2 of the 7 do.
    3. A probability distribution can help us answer the following type of question:
      1. If the "true" or population percent (proportion) is 15% (or .15), how likely is it that a survey of 7 respondents would turn up 2 or fewer individuals who agree with Clinton?
      2. The distribution can help us answer the question because it shows, for each possible sample outcome (i.e., 0 respondents agree, 1 respondent agrees, 2 respondents agree, and so forth), the probability of obtaining that result, given that the true value is .15.
      3. Here is a listing of all possible outcomes in a sample of 7 and the probability of each of those outcomes:

      1. We can see, for example, that under the supposition or hypothesis that the true proportion is .15, it is highly likely that 2 or fewer people would agree with Clinton's position.
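The listing of outcomes can be computed directly. This sketch assumes the listing is the binomial distribution, that is, that the 7 respondents are drawn independently from a population in which the proportion agreeing is .15; `binomial_pmf` is a helper written for this example, not a library function:

```python
from math import comb

n, p = 7, 0.15  # sample size and hypothesized population proportion

def binomial_pmf(k, n, p):
    """Probability of exactly k 'agree' responses among n independent respondents."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Map each possible outcome (0..7 agree) to its probability.
table = {k: binomial_pmf(k, n, p) for k in range(n + 1)}
for k, prob in table.items():
    print(f"{k} agree: {prob:.4f}")

p_two_or_fewer = sum(table[k] for k in range(3))
print(f"P(2 or fewer agree) = {p_two_or_fewer:.4f}")
```

Under these assumptions the probability that 2 or fewer agree comes to roughly .93, which squares with the statement that such a result is highly likely.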
    1. Again, a probability distribution associates the possible outcomes of an "experiment" or sample with the probability of occurrence under a particular hypothesis.
      1. More precisely, a probability distribution is a function: an input value (a score on a variable, for example) is translated into a probability.

      1. Sometimes all of these scores and probabilities can be shown as a tabulated distribution such as in the previous box.
      2. We can also interpret probability distributions graphically as below.


  1. NORMAL DISTRIBUTION:
    1. A normal distribution is just one of many kinds of probability distributions. The normal distribution is specified by an equation or function that contains two constants or parameters, the mean, μ, and the standard deviation, σ.
      1. The normal distribution function looks like this:
          $$ f(Y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(Y-\mu)^2}{2\sigma^2}} $$
        1. The graph of this equation is a bell-shaped curve with midpoint or mean at μ.
      1. Here are some points that may help simplify the formula.
        1. e (approximately 2.718) and π (approximately 3.1416) are just positive constants or numbers.
        2. The working part of this equation is the exponent of e:
          $$ -\frac{(Y-\mu)^2}{2\sigma^2} $$
        1. The more Y differs from the mean, the larger the numerator in this expression.
        2. Since the numerator is squared, Y values equally distant from μ but in opposite directions will lead to the same value for the exponent.
          1. That is, if the mean is 50, then Y scores of 40 and 60 will produce the same value for (Y - μ)², namely 100.
          2. Hence the distribution is symmetric.
        3. Moreover, the exponent has a negative sign, which means that the farther Y lies from the mean, the smaller the value of the function. Roughly speaking, this fact indicates that values far from the mean (in either direction) will have a lower probability of occurrence.
        4. It can also be seen from the formula that if the exponent of e is zero (that is, when Y = μ), the function reduces to
          $$ f(\mu) = \frac{1}{\sigma\sqrt{2\pi}} $$
          1. This is the largest value the function can take, and it implies that the distribution is unimodal; that is, it has one and only one maximum value.
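The points about symmetry and the maximum can be checked numerically by coding the formula directly. A minimal sketch, using the example values from the text (mean 50, standard deviation 10); `normal_pdf` is a helper defined here, not a library function:

```python
import math

def normal_pdf(y, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at the point y."""
    exponent = -((y - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 50, 10
# Symmetry: scores equally distant from the mean give the same height.
print(normal_pdf(40, mu, sigma), normal_pdf(60, mu, sigma))
# Maximum: at Y = mu the exponent is zero, leaving 1 / (sigma * sqrt(2 * pi)).
print(normal_pdf(mu, mu, sigma), 1 / (sigma * math.sqrt(2 * math.pi)))
```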
      1. Note that the exact shape of the normal distribution depends on the values of the parameters μ and σ². So it is a family of distributions.
    1. In general, the graph of the function is bell shaped and symmetric (i.e., one half looks just like the other half): the mean, μ, divides the distribution into two equal-sized portions.
    2. Here is a picture or graph of the equation of the normal distribution:

      1. The curve shown here is a graph of the equation for the normal distribution; the particulars of the equation are not important, so we will not consider them at this time.
    1. Properties:
      1. Notice that the curve depends on two parameters, the mean (μ) and the standard deviation (σ). The normal distribution is in a sense a family of curves, each depending on a particular mean and standard deviation.
      2. The distribution is symmetric and bell shaped
      3. The mean, μ, marks the "center" of the distribution.
      4. The total area under the graph of the normal distribution equation is usually interpreted as equaling 100% or 1.0, depending on whether one wants to talk about a percent of an area under the curve or a proportion of the area under the curve.
      5. Areas under the curve can also be interpreted as probabilities.
    2. The area within plus one and minus one standard deviation of the mean constitutes about 68 percent of the area under the curve:

      1. The area within plus and minus two standard deviations of the mean constitutes about 95 percent of the area under the curve:
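The 68 and 95 percent figures can be reproduced from the normal distribution itself. For a normal variable, the proportion of the area within k standard deviations of the mean equals erf(k/√2), where erf is the error function provided by Python's `math` module; the helper name `area_within` is hypothetical, chosen for this sketch:

```python
import math

def area_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2):
    print(f"within +/-{k} SD: {area_within(k):.4f}")
```

The printed values round to about .6827 and .9545, the "about 68 percent" and "about 95 percent" quoted above.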

      1. Hence, one can interpret the value of the standard deviation by reference to the normal curve. If a variable is normally distributed, then approximately two thirds of the population will lie (i.e., have scores) within plus or minus one standard deviation of the mean, and about 95 percent will lie within plus or minus two standard deviations of the mean. To see what this means, use MINITAB to calculate the mean and standard deviation of a normally distributed variable (use the stem command to check whether the variable approximates a normal distribution). Then add one standard deviation to the mean and subtract one standard deviation from it. About two thirds of the cases should lie between these two numbers.
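The MINITAB exercise can be mimicked with simulated data. This sketch assumes a sample of 1,000 scores drawn from a normal distribution with mean 50 and standard deviation 10 (an arbitrary choice), then counts the share of scores within one and within two standard deviations of the sample mean:

```python
import random
import statistics

random.seed(815)  # fixed seed so the sketch is reproducible; the value is arbitrary
scores = [random.gauss(50, 10) for _ in range(1000)]  # simulated normally distributed scores

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation

def share_within(k):
    """Fraction of scores between mean - k*sd and mean + k*sd."""
    low, high = mean - k * sd, mean + k * sd
    return sum(low <= y <= high for y in scores) / len(scores)

share1, share2 = share_within(1), share_within(2)
print(f"mean={mean:.2f}, sd={sd:.2f}, within 1 SD: {share1:.3f}, within 2 SD: {share2:.3f}")
```

With a different seed or sample size the shares will vary a little, but they should stay close to two thirds and 95 percent.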


  1. USING AREAS UNDER THE CURVE TO ANSWER QUESTIONS:
    1. The last point raises interesting and important questions.
      1. How much of the area lies more than three standard deviations above the mean?
      2. How much lies between 0 and +1.5 standard deviations?
      3. How many standard deviations above the mean is the point such that 49 percent of the area lies between the mean and that point?
    2. Here is a different question:
      1. Suppose one believes that a variable Y has a normal distribution with a mean of 0 and a standard deviation of 10. An observation is drawn at random from a population of Y values, and its score turns out to be 50. Does this observation (Y = 50) cast any doubt on the supposition? In other words, does getting a value of 50 from a population that supposedly has a mean value of 0 surprise you?
    3. We can't answer these questions without more information.
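The additional information these questions call for is the standard normal distribution. As a preview, Python's `statistics.NormalDist` (Python 3.8 or later) can stand in for a printed table of areas. The sketch assumes that the third question asks for the point with 49 percent of the area between it and the mean:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

above_3 = 1 - z.cdf(3)                        # area more than 3 SDs above the mean
between_0_and_1_5 = z.cdf(1.5) - z.cdf(0)     # area between the mean and +1.5 SDs
z_49 = z.inv_cdf(0.5 + 0.49)                  # point with 49% of the area between it and the mean

# The Y = 50 question: Y assumed normal with mean 0 and standard deviation 10.
y_dist = NormalDist(mu=0, sigma=10)
z_score = (50 - y_dist.mean) / y_dist.stdev   # how many SDs above the mean 50 lies
p_at_least_50 = 1 - y_dist.cdf(50)            # chance of a score of 50 or more

print(f"{above_3:.5f} {between_0_and_1_5:.4f} {z_49:.3f} {z_score:.1f} {p_at_least_50:.2e}")
```

A score of 50 lies 5 standard deviations above the supposed mean of 0, and the chance of a score at least that large is well under one in a million, so such an observation would indeed be surprising under the supposition.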


  2. NEXT TIME:
    1. The standard normal distribution


Copyright © 1997 H. T. Reynolds