Public Management Statistics Class 12 Notes

DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS

Posc/Uapp 815

The Normal Distribution

AGENDA:

The normal distribution.

Interpretation of the normal distribution function.

Area under the normal curve

The standard normal distribution

THE NORMAL PROBABILITY DISTRIBUTION:

See the notes from the previous class.

THE NORMAL DISTRIBUTION:

One can think of the area under a normal "curve" as equaling 100% or 1.0, depending on whether one wants to talk about a percent of an area under the curve or a proportion of the area under the curve.

Areas under the curve can also be interpreted as probabilities.

The area within plus one and minus one standard deviation of the mean constitutes about 68 percent of the area under the curve:

The area within plus and minus two standard deviations of the mean constitutes about 95 percent of the area under the curve (see Figure 2):

Hence, one can interpret the value of the standard deviation by reference to the normal curve. If a variable is distributed normally, then approximately two thirds of the population will lie (i.e., have scores) within plus or minus one standard deviation of the mean; about 95 percent will be within plus or minus 2 standard deviations of the mean. To see what this means, use MINITAB to calculate the mean and standard deviation of a normally distributed variable (use the stem command to see if the variable approximates a normal distribution). Then add 1 standard deviation to the mean and subtract 1 standard deviation from it. About two thirds of the cases should lie between these two numbers.
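The notes use MINITAB for this check. As a rough alternative, the sketch below uses Python's standard-library statistics.NormalDist (an assumption, not part of the course materials) to compute the area within one and two standard deviations of the mean. The function name area_within is hypothetical.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution lying within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

within_one = area_within(1)   # roughly two thirds
within_two = area_within(2)   # roughly 95 percent
print(round(within_one, 4), round(within_two, 4))  # 0.6827 0.9545
```

The result does not depend on the particular mean and standard deviation: area_within(1, 500, 25) gives the same proportion, which is why the standard deviation can be interpreted this way for any normal variable.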

USING AREAS UNDER THE CURVE TO ANSWER QUESTIONS:

The last point raises interesting and important questions.

How much of the area lies above three standard deviations from the mean?

How much lies between 0 and +1.5 standard deviations?

How many standard deviations above the mean marks the point which encompasses 49 percent of the area?

Here is a different question:

Suppose one believes that a variable Y has a normal distribution with a mean of 0 and a standard deviation of 10. An observation is drawn at random from a population of Y values and its score turns out to be 50. Does this observation (Y = 50) cast any doubt on the supposition? In other words, does getting a value of 50 from a population that supposedly has a mean value of 0 surprise you?

We can't answer these questions without more information.

THE STANDARD NORMAL DISTRIBUTION:

Since the normal curve is really a family of curves, each depending on a specific mean and standard deviation, statisticians use a single distribution, the standard normal distribution, to answer these types of questions.

Areas under the standard normal distribution have been tabulated and presented in the form of a so-called standard normal or z-table.

A standard normal distribution has a mean of 0 and standard deviation of 1. (That is, μ = 0 and σ = 1.) Areas under this curve--that is, the curve of the normal distribution with mean 0 and standard deviation of 1--have been extensively tabulated and appear in every statistics text.

The tables are called z tables. One is attached.

The attached z-table shows the proportion of the area under the curve--that is, the proportion of the distribution--that lies beyond any specified number of standard deviations from the mean.

Example:

In a standard normal distribution how much area lies to the right of one standard deviation above the mean? Look in the column marked "z" and find 1.0. In the column labeled "second decimal place" find .00. The entry in the corresponding row and column is .1587. This means that 15.87 percent of the area under the curve--15.87 percent of the distribution--lies above 1.0 standard deviation.

Since areas under a normal curve can be interpreted as probabilities or frequencies, we can think of .1587 as saying: "if a variable has a standard normal distribution, the proportion of cases that will have values more than one standard deviation above the mean (0) is .1587." Or, "the probability that a case lies 1 or more standard deviations above the mean is .1587."

Since the distribution is symmetric, the value .1587 also means that the proportion of cases lying below -1.0 is .1587.
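The table lookup can be cross-checked numerically. The sketch below is an illustration only: it assumes Python's standard-library statistics.NormalDist in place of the printed table, and the helper name upper_tail is hypothetical. It computes the area above z = 1.0 and, using symmetry, the area below z = -1.0.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # the standard normal distribution

def upper_tail(z):
    """Area under the standard normal curve lying above z."""
    return 1.0 - std_normal.cdf(z)

above_one = upper_tail(1.0)              # area above +1.0
below_minus_one = std_normal.cdf(-1.0)   # area below -1.0
print(round(above_one, 4), round(below_minus_one, 4))  # 0.1587 0.1587
```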

Figure 3 provides a picture:

Look now in the z-value columns to find 1.96. (Use the 1.9 row and .06 column.) The corresponding area is .0250, which indicates that 2.5 percent of the distribution lies more than 1.96 standard deviations above the mean. (Also, 2.5 percent lies below -1.96 standard deviations.)

The total area lying beyond +1.96 and -1.96 is:

.0250 + .0250 = .0500

Thus, 5 percent of the area of the standard normal distribution lies beyond plus and minus 1.96 standard deviations, that is, where |z| exceeds 1.96.
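As a hedged check on the two-tailed figure, the sketch below adds the two tail areas using Python's statistics.NormalDist (an assumption; the notes rely on the printed table). The function name two_tail_area is hypothetical.

```python
from statistics import NormalDist

def two_tail_area(z):
    """Total area beyond +|z| and -|z| under the standard normal curve."""
    std_normal = NormalDist()                 # mean 0, standard deviation 1
    one_tail = 1.0 - std_normal.cdf(abs(z))   # area above +|z|
    return 2.0 * one_tail                     # symmetry: the lower tail is the same size

beyond_196 = two_tail_area(1.96)
print(round(beyond_196, 4))  # 0.05
```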

Look in the body of the table and find .0052. What number of standard deviations corresponds to this proportion of the area? The answer, found by looking at the corresponding z columns, is 2.56. The usual interpretation applies: by going 2.56 standard deviations above (or below) the mean we mark off about .5 percent (.52 percent) of the area of the normal curve.

The total area beyond plus or minus 2.56 standard deviations is thus:

.0052 + .0052 = .0104

Figure 5 illustrates the point.
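Reading the table "backwards" (from an area to a z value) corresponds to an inverse lookup. The sketch below assumes Python's statistics.NormalDist and its inv_cdf method as a stand-in for the printed table; the helper name z_for_upper_tail is hypothetical.

```python
from statistics import NormalDist

def z_for_upper_tail(area):
    """z value that leaves the given area in the upper tail of the standard normal curve."""
    # inv_cdf takes the area BELOW z, so convert the upper-tail area first.
    return NormalDist().inv_cdf(1.0 - area)

z = z_for_upper_tail(0.0052)
print(round(z, 2))  # 2.56
```

Because the printed table is rounded to two decimal places of z and four of area, the computed value agrees with the table only approximately.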

Using pictures and logic like this we can find the area corresponding to any two points along the z scale.

Example:

What is the area below -1.0 and above +2.56 standard deviations of a standard normal curve?

First draw a picture (see Figure 6):

We know from the table that .1587 of the total area lies beyond -1.0. Furthermore, we know that .5 of the area lies below 0. Thus by subtraction the area between 0 and -1.0 must be:

.5000 - .1587 = .3413

which we have seen before.

Similar reasoning shows that the area between 0 and +2.56 is:

.5000 - .0052 = .4948

Add these two areas together to get the total area between -1.0 and +2.56:

.3413 + .4948 = .8361

Looking at the picture we can see that the area lying above +2.56 and below -1.0 is:

1.0000 - .8361 = .1639
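The picture-and-subtraction logic can also be written as a small function. This is a sketch under the assumption that Python's statistics.NormalDist stands in for the table; area_between is a hypothetical name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z values."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

between = area_between(-1.0, 2.56)   # area from -1.0 to +2.56
outside = 1.0 - between              # area below -1.0 plus area above +2.56
print(round(between, 4), round(outside, 4))  # 0.8361 0.1639
```

The same function can be used to check a hand calculation for any other pair of points on the z scale.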

Problem to work on: what proportion of the area under a standard normal curve lies between +.5 and +1.96 standard deviations? Use the methods outlined above to find the answer: draw a sketch of the normal curve. Label the points .5 and 1.96 along the z axis. Using the attached table find the areas beyond .5 and beyond 1.96. Then subtract these areas to find the amount between .5 and 1.96.

STANDARDIZED OR Z SCORES:

To use the standard normal distribution we have to change raw data to "standardized" data.

Remember the standard normal distribution has a mean of 0 and a standard deviation of 1.0, but we usually deal with different measurement scales where, for example, the mean might be 500 and the standard deviation 25. Thus, we wouldn't know from the table how much of the area is in the interval from, say, 463 to 589.

But we can find out by changing raw data to "standardized" or z scores.

To convert the data follow these steps:

Find the mean, μ, and standard deviation, σ, of Y, the variable of interest.

To convert a particular raw score to a standard or z-score use the formula:

z = (Y - μ) / σ

z is called a standard value, a standardized value or a z score. They are all the same. z is obtained from the original data by subtracting the mean of the original data and dividing by the standard deviation.
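To illustrate why standardizing is useful, the sketch below takes the scale mentioned earlier (mean 500, standard deviation 25), converts the endpoints 463 and 589 to z scores, and then finds the area between them. It assumes Python's statistics.NormalDist in place of the z-table; the function name z_score is hypothetical.

```python
from statistics import NormalDist

def z_score(y, mean, sd):
    """Standardize a raw score: subtract the mean, then divide by the standard deviation."""
    return (y - mean) / sd

mean, sd = 500.0, 25.0             # example scale from the notes
z_low = z_score(463.0, mean, sd)   # about -1.48
z_high = z_score(589.0, mean, sd)  # about 3.56

std_normal = NormalDist()          # mean 0, standard deviation 1
area = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(z_low, z_high, round(area, 4))
```

Once the endpoints are on the z scale, the area comes from the standard normal distribution alone; no separate table is needed for the original measurement scale.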

Examples:

Suppose we have a batch of data with a mean of 90 and a standard deviation of 10. What is the z score corresponding to a raw score of 100?

z = (100 - 90) / 10 = 1.0

Suppose now the data are such that the mean is 1053.72 and σ is 105.69. What is the z score or standardized value corresponding to Y = 1950.18?

z = (1950.18 - 1053.72) / 105.69 ≈ 8.48

z scores may be negative as this example illustrates: the mean equals 55.7 and the standard deviation, σ, is 6.8. What is the z score for a raw value of 46.5?

z = (46.5 - 55.7) / 6.8 ≈ -1.35
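The three examples can be checked in one pass. The sketch below is only an illustration in Python; the function name z_score and the list layout are hypothetical choices, not part of the notes.

```python
def z_score(y, mean, sd):
    """Standardized value of a raw score y, given the mean and standard deviation."""
    return (y - mean) / sd

# (raw score, mean, standard deviation) for the three examples above
examples = [
    (100.0, 90.0, 10.0),
    (1950.18, 1053.72, 105.69),
    (46.5, 55.7, 6.8),
]
z_values = [round(z_score(y, m, s), 2) for (y, m, s) in examples]
print(z_values)  # [1.0, 8.48, -1.35]
```

A raw score above the mean gives a positive z and one below the mean gives a negative z; the size of z says how many standard deviations separate the score from the mean.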

NEXT TIME:

The standard normal distribution underlies hypothesis testing so we will explore it in more detail.

Copyright © 1997 H. T. Reynolds