Overheads for Unit 3--Chapter 5 (Reliability)

OH 1
Relation Between Validity and Reliability

Question:

What is the difference between validity and reliability?

• Validity is the extent to which test scores mean what you say they mean. That is, are you interpreting the scores appropriately?

• Reliability is the extent to which test results are consistent over time, different versions of the test, or people scoring it. That is, how dependable are the results?

OH 2
Why should we be concerned about reliability?

• Your test can’t be valid unless it is reliable (i.e., its scores are dependable).
• In fact, a test’s criterion validity can be no higher than the square root of its reliability.
• It is important to know how much measurement error there is in individuals’ scores (e.g., on a standardized test).
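
The square-root ceiling on criterion validity can be checked with a line of arithmetic. A minimal sketch; the reliability values below are invented for illustration:

```python
import math

# Criterion validity can be no higher than the square root of reliability.
# Hypothetical reliability coefficients:
for reliability in (0.95, 0.81, 0.64, 0.49):
    ceiling = math.sqrt(reliability)
    print(f"reliability = {reliability:.2f}  ->  max validity = {ceiling:.2f}")
```

Note how the ceiling falls more slowly than reliability itself: a reliability of .81 still permits a validity as high as .90.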

OH 3
Reliability: Some important points

1. there are different kinds of consistency, so there are different kinds of reliability
2. reliability requires statistical, not logical analysis (validity requires both)
3. calculating reliability requires test scores
4. reliability can be reported in three ways, which serve different purposes
   a. correlations
   b. standard error of measurement
   c. percentage agreement

OH 4
Reliability Coefficient (Rxx)

Rxx = the following ratio:

similarity in ranks on Forms 1 & 2 (their covariance)
                 (SD1)(SD2)

SD = standard deviation

Important point:

• Like all correlations, reliability coefficients are sensitive to variation in the sample (SD): smaller variation means lower reliabilities, all else equal.

• Why? Because tests can’t distinguish well among people who don’t differ much in knowledge or ability (SD is small). With retesting, small changes in their scores can easily change their ranks on the test—which depresses the numerator above (relative to the SDs).

OH 5
Assessing Reliability of Norm-Referenced Tests: Correlational Methods

Methods:

• test-retest—same test, different times
• equivalent forms—different forms of the test, "same" time
• test-retest with equivalent forms—different forms, different time
• internal consistency—different parts of same test
1. split half
2. Kuder-Richardson and Coefficient Alpha
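
As a sketch of the internal-consistency idea, coefficient alpha can be computed from item-level scores; with 0/1 (right/wrong) items it equals KR-20. The item responses below are invented for illustration:

```python
import statistics

# Hypothetical right/wrong item scores (rows = students, columns = items).
scores = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]

k = len(scores[0])                     # number of items
items = list(zip(*scores))             # column-wise item scores
totals = [sum(row) for row in scores]  # each student's total score

item_vars = sum(statistics.pvariance(col) for col in items)
total_var = statistics.pvariance(totals)

# Coefficient alpha: k/(k-1) * (1 - sum of item variances / total variance)
alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(f"coefficient alpha = {alpha:.2f}")
```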

OH 6
Assessing Reliability of Norm-Referenced Tests: Correlational Methods

Important points:

Comparing methods

• some methods include more types of consistency than others
• some are better suited to some purposes than others
• test-retest with equivalent forms is the most useful for most purposes

Influences on reliability

• number of items: crucial, because it is something you can control!

OH 7
What kinds of consistency do each of the methods capture?

Exercise

Put an X in the appropriate spots of Table 5.4

OH 8
Assessing Reliability of Norm-Referenced Tests: Standard Error of Measurement

Definition: The amount of error (movement) in a person’s test score that we can expect from one administration to another of the same or a comparable test.

• If short time interval: How sure can we be that the person’s true score really is close to their observed score? (fringe of error)
• If long time interval: How likely is their score to remain roughly the same over some period of time? (stability of test scores)

OH 9
Standard Error of Measurement (SEM)

Important points:

• SEM is derived directly from reliability coefficient

SEM = SD times the square root of (1-reliability)

• SEMs always depend on the spread of scores (SD) and other characteristics of a group (e.g., age)
• SEMs always refer to a specific set of test-takers, therefore
• you need to judge whether the estimates derived from another group really apply to your students (e.g., their age level, heterogeneity)
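
A minimal sketch of the computation (the SD and reliability values are hypothetical):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical values: SD = 15, reliability = .91.
error = sem(15, 0.91)
print(f"SEM = {error:.1f}")            # -> SEM = 4.5

# A rough 68% "fringe of error" around an observed score of 100:
observed = 100
print(f"68% band: {observed - error:.1f} to {observed + error:.1f}")
```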

OH 10
Examples of how "fringe of error" around scores increases as reliability falls (p. 122)

Note: the following numbers are taken from page 122. They include 3 rows from that table (for SD 10, 20, and 30).

                Reliability coefficient
SD     .95    .90    .85    .80    .75    .70

10     2.2    3.2    3.9    4.5    5.0    5.5
20     4.5    6.3    7.7    8.9   10.0   11.0
30     6.7    9.5   11.6   13.4   15.0   16.4
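
The table entries can be reproduced directly from the SEM formula:

```python
import math

# Recompute the three rows above from SEM = SD * sqrt(1 - Rxx).
reliabilities = (0.95, 0.90, 0.85, 0.80, 0.75, 0.70)
for sd in (10, 20, 30):
    row = [sd * math.sqrt(1 - r) for r in reliabilities]
    print(f"SD {sd:2d}: " + "  ".join(f"{v:4.1f}" for v in row))
```

Because SEM scales with SD and with the square root of (1 − Rxx), halving reliability’s distance from 1 shrinks the fringe of error far less than halving the SD does.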

OH 11
Differences in error of measurement

Do they really matter? How?

Would you expect all kinds of tests to be equally reliable? Why or why not?

OH 12
Assessing Reliability of Criterion-Referenced Tests: Percentage Agreement

Question:

Why might we not want to use correlational methods with criterion-referenced tests?

The aims of norm- and criterion-referenced tests are usually different. The former often sample a broader range of material and seek to differentiate among students. In contrast, criterion-referenced tests usually cover a smaller, more specific domain of tasks and are meant to assess absolute, not relative, levels of success in mastering the material.
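
A minimal sketch of percentage agreement, using invented master/nonmaster classifications from two administrations:

```python
# Percentage agreement for a criterion-referenced test: how often two
# administrations classify the same student identically (master/nonmaster).
# Hypothetical mastery decisions (True = mastered) on two occasions:
first  = [True, True, False, True, False, True, True,  False]
second = [True, True, True,  True, False, True, False, False]

agreements = sum(a == b for a, b in zip(first, second))
pct = 100 * agreements / len(first)
print(f"percentage agreement = {pct:.0f}%")   # -> percentage agreement = 75%
```

Note that agreement can be high even when scores barely vary, which is exactly the situation where a correlation coefficient would be misleadingly low.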

OH 13
Which decisions demand high test reliability?

• Important
• Final
• Irreversible
• Unconfirmable
• Concern individuals
• Have lasting consequences

OH 14
Usability of Assessments