Overheads for Unit 3--Chapter 4 (Validity)

OH 1
Exercise: A Principal’s Question

You say that your math course teaches students to reason well mathematically. What evidence can you provide that it actually does so?


OH 2
Three Key Concepts in Judging the Quality of an Assessment


OH 3
Why should you be bothered with these concepts, anyway?


OH 4

Definition of validity: The appropriateness of how scores are interpreted [and used]*

*Appropriate "use" of tests is a controversial, recent addition to the definition of "validity," which is probably why your textbook is inconsistent in how it defines the term.


OH 5

Very important points. Validity is:

  1. a matter of degree ("how valid")

  2. always specific to a particular purpose ("validity for…")

  3. a unitary concept (four kinds of evidence feed into one judgment: "how valid?")

  4. inferred from evidence; it cannot be measured directly


OH 6

Four interrelated kinds of evidence:

  1. content
  2. construct
  3. criterion
  4. consequences


OH 7
Questions Guiding Validation

  1. What are my learning objectives?

    • Did my test really address those particular objectives?

  2. Do the students' test scores really mean what I intended?

    • What may have influenced their scores?
      • growth
      • instruction
      • intelligence
      • cheating
      • etc.

  3. Did testing have the intended effects?

    • What were the consequences of the testing process and scores obtained?


OH 8
What is an achievement domain?

A carefully specified set or range of learning outcomes (content and mental skills).

In short, your set of instructional targets.


OH 9
Content-Related Evidence

Definition: The extent to which an assessment’s tasks provide a relevant and representative sample of the domain of outcomes you are intending to measure.

The evidence:

  • most useful type of validity evidence for classroom tests
  • domain is defined by learning objectives
  • items chosen using a table of specifications
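
A table of specifications is typically a two-way grid that crosses content areas with types of learning outcomes, and the number of items in each cell should reflect how much instructional emphasis that cell received. The sketch below is one way to turn emphasis into item counts; the content areas, outcome types, and weights are hypothetical, and largest-remainder rounding (so the counts add up exactly to the test length) is an assumption, not a rule from the textbook.

```python
from math import floor

def allocate_items(weights, total_items):
    """Split total_items across cells in proportion to their weights.

    Uses largest-remainder rounding so the counts sum exactly to total_items.
    """
    weight_sum = sum(weights.values())
    exact = {cell: total_items * w / weight_sum for cell, w in weights.items()}
    counts = {cell: floor(x) for cell, x in exact.items()}
    leftover = total_items - sum(counts.values())
    # Hand any remaining items to the cells with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for cell in by_remainder[:leftover]:
        counts[cell] += 1
    return counts

# Hypothetical blueprint: (content area, type of outcome) -> share of instructional time.
blueprint = {
    ("fractions", "knowledge"): 10,
    ("fractions", "application"): 20,
    ("decimals", "knowledge"): 10,
    ("decimals", "application"): 15,
    ("word problems", "reasoning"): 45,
}

for (area, outcome), n in allocate_items(blueprint, 40).items():
    print(f"{area:<14} {outcome:<12} {n:>2} items")
```

If the weights mirror what was actually taught, the resulting item counts are part of the content-related evidence; if they do not, the test can fail in the fourth way listed on the next overhead.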


OH 10
Content-Related Evidence

Important points:

  • is an attempt to build validity into the test rather than assess it after the fact
  • the sample of tasks can be faulty in many ways:
    1. inappropriate vocabulary
    2. unclear directions
    3. omission of higher-order skills
    4. failure to reflect the content or relative emphasis of what was actually taught
  • neither "face validity" (superficial appearance) nor a test's label provides evidence of validity
  • assumes that test administration and scoring were proper


OH 11
What is a construct?

A hypothetical quality or characteristic (e.g., extraversion, intelligence, mathematical reasoning ability) that we use to explain some pattern of behavior (e.g., good at making new friends, learns quickly, good in all math courses).


OH 12
Construct-Related Evidence

Definition: The extent to which an assessment measures the construct (e.g., reading ability, intelligence, anxiety) it purports to measure.


OH 13
Construct-Related Evidence

Some kinds of evidence:

  • check whether items behave similarly (if the test is meant to measure a single construct)
  • analyze the mental processes the tasks require
  • compare scores of known groups
  • compare scores before and after treatment (do they change in the way your theory says they will and will not?)
  • correlate scores with other constructs (do they correlate well—and poorly—in the pattern expected?)
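
One common way to check whether items "behave the same" is the corrected item-total correlation: correlate each item with the total of the remaining items. An item with a near-zero or negative value may be tapping something other than the construct the rest of the test measures. This is a technique chosen here for illustration, not necessarily the one your textbook uses, and the six students' right/wrong answers below are made up; a real analysis would need far more students.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corrected_item_total(responses):
    """Correlate each item with the total score on all *other* items."""
    n_items = len(responses[0])
    results = []
    for i in range(n_items):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]  # total minus this item
        results.append(pearson_r(item, rest))
    return results

# Hypothetical answers (1 = right, 0 = wrong): 6 students x 4 items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]

r_values = corrected_item_total(responses)
for i, r in enumerate(r_values, start=1):
    note = "  <- does not behave like the other items" if r < 0 else ""
    print(f"item {i}: r = {r:+.2f}{note}")
```

Here item 4 is answered correctly mainly by students who do poorly on the other items, so its negative correlation flags it for review; a low statistic by itself does not say why the item misbehaves.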


OH 14
Construct-Related Evidence

Important points:

  • usually assessed after the fact
  • usually requires test scores
  • is a complex, extended logical process; cannot be summarized by a single number


OH 15
What is a criterion?

A valued performance or outcome (e.g., scores high on a standardized achievement test in math, later does well in an algebra class) that we believe might—or should—be related to what we are measuring (e.g., knowledge of basic mathematical concepts).


OH 16
Criterion-Related Evidence

Definition: The extent to which a test’s scores correlate with some valued performance outside the test (the criterion)

The evidence:

  • concurrent correlations (relate scores to another performance measured at about the same time)
  • predictive correlations (predict a future performance)

Clarification: The word "criterion" has a second meaning in testing, so don't confuse the two. In this context, it means some valued outcome that we want to relate scores to or predict. In the other sense, it is a performance standard against which we compare students' scores; that sense distinguishes "criterion-referenced" interpretations of test scores from "norm-referenced" ones. "Susan reads at the proficient level" is a criterion-referenced interpretation; "She reads better than 65% of other students" is a norm-referenced interpretation.
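
The two interpretations in the clarification can be contrasted on a single score. The sketch below compares Susan's score with fixed cut scores (criterion-referenced) and with a group of other students (norm-referenced). The cut scores, level names, norm group, and Susan's score of 74 are all hypothetical, and the percentile rank here counts only students scoring strictly lower; other definitions also count half of any ties.

```python
# Hypothetical performance standards: (minimum score, level label), lowest first.
CUT_SCORES = [(0, "below basic"), (50, "basic"), (70, "proficient"), (90, "advanced")]

def criterion_referenced(score):
    """Compare the score with fixed performance standards (cut scores)."""
    level = CUT_SCORES[0][1]
    for cut, label in CUT_SCORES:
        if score >= cut:
            level = label
    return level

def norm_referenced(score, norm_group):
    """Compare the score with other students: percent of the group scoring lower."""
    below = sum(1 for other in norm_group if other < score)
    return 100 * below / len(norm_group)

# Hypothetical reading scores for a norm group of 20 other students.
norm_group = [40, 45, 48, 52, 55, 58, 60, 62, 63, 65,
              66, 68, 71, 76, 78, 80, 83, 86, 90, 95]
susan = 74

print(f"Criterion-referenced: Susan reads at the {criterion_referenced(susan)} level.")
print(f"Norm-referenced: Susan reads better than {norm_referenced(susan, norm_group):.0f}% of the group.")
```

With these made-up numbers, the same score of 74 yields both example statements: "proficient" against the standards and better than 65% of the group. Changing the norm group changes only the second statement.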


OH 17
What is a correlation?

A statistic that indicates the direction and strength of the relationship between two sets of scores obtained from the same group of individuals (e.g., the correlation between height and weight). It ranges from -1.00 to +1.00.

  • called a validity coefficient when used as criterion-related evidence of validity
  • called a reliability coefficient when used to estimate the reliability of test scores
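
The usual correlation for test scores is the Pearson product-moment coefficient. The sketch below computes it from scratch so each step is visible, then uses it as a predictive validity coefficient. The readiness-test and algebra scores are hypothetical. The same function would yield a reliability coefficient if the two lists were, for example, the same students' scores on two administrations of one test.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of each pair's deviations from the two means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (each proportional to a variance).
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a math readiness test given in the fall (the test) and the
# same eight students' final algebra course scores in the spring (the criterion).
readiness = [12, 15, 18, 20, 22, 25, 28, 30]
algebra = [60, 65, 62, 70, 75, 72, 85, 88]

validity_coefficient = pearson_r(readiness, algebra)
print(f"predictive validity coefficient: r = {validity_coefficient:.2f}")
```

A positive r means students who scored higher on the test also tended to score higher on the criterion; a value near zero means the test tells us little about the criterion.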


OH 18
Criterion-Related Evidence

Important points:

  • always requires test scores
  • is quantified (i.e., a number)
  • must be interpreted cautiously because
    • irrelevant factors can raise or lower validity coefficients (e.g., unreliability of the test or criterion, a restricted spread of scores)
    • a good "criterion" is often hard to find
  • can be used to create "expectancy tables"
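
An expectancy table shows, for students in each band of test scores, the percentage who later reached each level of the criterion. The sketch below builds one from hypothetical past records of readiness-test scores and later algebra grades; the score bands and grade categories are also assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical (readiness-test score, later algebra grade) pairs for past students.
records = [
    (12, "D"), (14, "C"), (15, "D"), (17, "C"), (19, "C"),
    (21, "B"), (22, "C"), (24, "B"),
    (26, "A"), (27, "B"), (29, "A"), (30, "A"),
]

# Hypothetical score bands, each with inclusive lower and upper bounds.
BANDS = [(10, 19), (20, 25), (26, 30)]
GRADES = ["A", "B", "C", "D"]

def expectancy_table(records, bands, grades):
    """For each score band, the percent of students in it who earned each grade."""
    table = {}
    for low, high in bands:
        band_grades = [grade for score, grade in records if low <= score <= high]
        counts = Counter(band_grades)
        n = len(band_grades)
        table[(low, high)] = {g: round(100 * counts[g] / n) if n else 0 for g in grades}
    return table

table = expectancy_table(records, BANDS, GRADES)
for (low, high), row in table.items():
    n = sum(1 for score, _ in records if low <= score <= high)
    cells = "  ".join(f"{g}: {row[g]:>3}%" for g in GRADES)
    print(f"scores {low}-{high} (n = {n}):  {cells}")
```

Read a row as, for example, "of past students who scored 26-30, 75% earned an A." With bands this small, one student shifts a cell by 20 to 33 percentage points, which is one more reason to interpret criterion-related evidence cautiously.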


OH 19
What is a consequence?

Any effect that your assessment has—or fails to have—that is important to you or the other people involved.


OH 20
Consequences-Related Evidence

Definition: The extent to which the assessment serves its intended purpose (e.g., improves performance) and avoids negative side effects (e.g., distorting the curriculum)

Possible types of evidence:

  • did it improve performance? motivation? independent learning?
  • did it distort the focus of instruction?
  • did it encourage or discourage creativity? exploration? higher level thinking?
  • etc.


OH 21
Consequences-Related Evidence

Important points:

  • usually gathered after the assessment is given
  • scores may be interpreted correctly and the test may still have negative side effects
  • must also weigh the consequences of not using the assessment (even if it has negative side effects): is the alternative any better, or perhaps worse?
  • judging consequences is a matter of values, not psychometrics


OH 22
Sources of Threats to Validity

  1. tests themselves
  2. teaching
  3. administration and scoring
  4. students
  5. nature of group or criterion

Can you give examples of each?