Overheads for Unit 3--Chapter 4 (Validity)

OH 1
Exercise: A Principal’s Question

You say that your math course teaches students to reason well mathematically. What evidence can you provide that it actually does so?

OH 2
Three Key Concepts in Judging the Quality of an Assessment

Validity
Reliability
Usability

OH 3
Why should you be bothered with these concepts, anyway?

Appreciate why all assessments contain error
Know the various sources of error
Understand that different kinds of assessments are prone to different kinds of error
Build assessments with less error
Know how to measure error, if need be
Know what is safe—and not safe—to conclude from assessment results
Decide when certain assessments should not be used

OH 4
Validity

Definition: Appropriateness of how scores are interpreted [and used]*

That is, to what extent does your assessment measure what you say it does [and is as useful as you claim]?
Stated another way: To what extent are the interpretations and uses of a test justified by evidence about its meaning and consequences?

*Appropriate "use" of tests is a controversial recent addition to the definition of "validity." That is probably why your textbook is inconsistent in how it defines the term.

OH 5
Validity

Very important points. Validity is:

a matter of degree ("how valid")
always specific to a particular purpose ("validity for…")
a unitary concept (four kinds of evidence to make one judgment—"how valid?")
inferred from evidence; it cannot be directly measured

OH 6
Validity

Four interrelated kinds of evidence:

content
construct
criterion
consequences

OH 7
Questions Guiding Validation

What are my learning objectives?
- Did my test really address those particular objectives?

Do the students' test scores really mean what I intended?
- What may have influenced their scores? (growth, instruction, intelligence, cheating, etc.)

Did testing have the intended effects?
- What were the consequences of the testing process and scores obtained?

OH 8
What is an achievement domain?

A carefully specified set or range of learning outcomes (content and mental skills).

In short, your set of instructional targets.

OH 9
Content-Related Evidence

Definition: The extent to which an assessment’s tasks provide a relevant and representative sample of the domain of outcomes you are intending to measure.

The evidence:

most useful type of validity evidence for classroom tests

domain is defined by learning objectives

items chosen with table of specifications
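
One way to picture a table of specifications is as a grid of content areas by skill levels, with test items allotted to each cell in proportion to instructional emphasis. The sketch below is only an illustration: the content areas, skill levels, weights, and the function name `allocate_items` are all hypothetical, and the rounding rule (largest remainder) is one reasonable choice among several.

```python
def allocate_items(weights, total_items):
    """Split total_items across cells in proportion to their weights.

    weights: dict mapping a cell (e.g., (content area, skill level)) to its
    relative instructional emphasis. Uses the largest-remainder method so the
    counts always add up to total_items.
    """
    total_weight = sum(weights.values())
    exact = {cell: total_items * w / total_weight for cell, w in weights.items()}
    counts = {cell: int(share) for cell, share in exact.items()}
    leftover = total_items - sum(counts.values())
    # Give the remaining items to the cells with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for cell in by_remainder[:leftover]:
        counts[cell] += 1
    return counts


# Hypothetical emphasis weights (percent of instructional time) for a math unit.
weights = {
    ("fractions", "recall"): 10,
    ("fractions", "application"): 20,
    ("decimals", "recall"): 10,
    ("decimals", "application"): 30,
    ("percent", "application"): 30,
}

blueprint = allocate_items(weights, 20)
for cell, n_items in blueprint.items():
    print(cell, n_items)
```

A cell that receives many items here but little class time (or the reverse) is exactly the kind of sampling fault that weakens content-related evidence.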

OH 10
Content-Related Evidence

Important points:

is an attempt to build validity into the test rather than assess it after the fact

sample can be faulty in many ways

- inappropriate vocabulary
- unclear directions
- omits higher order skills
- fails to reflect content or weight of what was actually taught

"face validity" (superficial appearance) or label does not provide evidence of validity

assumes that test administration and scoring were proper

OH 11
What is a construct?

A hypothetical quality or trait (e.g., extraversion, intelligence, mathematical reasoning ability) that we use to explain some pattern of behavior (e.g., good at making new friends, learns quickly, good in all math courses).

OH 12
Construct-Related Evidence

Definition: The extent to which an assessment measures the construct (e.g., reading ability, intelligence, anxiety) it purports to measure.

OH 13
Construct-Related Evidence

Some kinds of evidence:

see if items behave the same (if test meant to measure a single construct)

analyze mental processes required

compare scores of known groups

compare scores before and after treatment (do they change in the way your theory says they will and will not?)

correlate scores with other constructs (do they correlate well—and poorly—in the pattern expected?)
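
The "known groups" comparison can be sketched numerically. If a test really measures mathematical reasoning, a group expected to be strong on that construct should outscore a group expected to be weaker. The example below summarizes the gap as a standardized mean difference (Cohen's d with a pooled standard deviation). The scores and group labels are made up for illustration, and a large difference is only one piece of evidence, not a validity "score."

```python
from math import sqrt
from statistics import mean, stdev


def standardized_mean_difference(group_a, group_b):
    """Cohen's d: difference in group means divided by the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical scores on a mathematical reasoning test.
advanced_math_students = [82, 88, 75, 91, 85]
other_students = [70, 65, 78, 72, 60]

d = standardized_mean_difference(advanced_math_students, other_students)
print(f"standardized mean difference: {d:.2f}")
```

A positive value here means the first group scored higher on average, which is the direction the theory predicts in this invented example.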

OH 14
Construct-Related Evidence

Important points:

usually assessed after the fact

usually requires test scores

is a complex, extended logical process; cannot be quantified

OH 15
What is a criterion?

A valued performance or outcome (e.g., scores high on a standardized achievement test in math, later does well in an algebra class) that we believe might—or should—be related to what we are measuring (e.g., knowledge of basic mathematical concepts).

OH 16
Criterion-Related Evidence

Definition: The extent to which a test’s scores correlate with some valued performance outside the test (the criterion)

The evidence:

concurrent correlations (relate to a different current performance)

predictive correlations (predict a future performance)

Clarification: The word "criterion" is used in a second sense in testing, so don't confuse the two. In this context it means some outcome that we want to predict. In the other sense, it is a performance standard against which we compare students' scores. In that second sense, it is used to distinguish "criterion-referenced" interpretations of test scores from "norm-referenced" interpretations. "Susan reads at the proficient level" would be a criterion-referenced interpretation. "She reads better than 65% of other students" would be a norm-referenced interpretation.

OH 17
What is a correlation?

A statistic that indicates the degree of relationship between any two sets of scores obtained from the same group of individuals (e.g., correlation between height and weight).

Called:

validity coefficient when used in calculating criterion-related evidence of validity

reliability coefficient when used in calculating reliability of test scores
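
As a rough illustration, the most common correlation statistic (the Pearson coefficient) can be computed directly from two lists of scores for the same students. The function name `pearson_r` and the scores below are invented for this sketch. Used with a test and a criterion, the result would be read as a validity coefficient.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one set of scores has no spread")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: a classroom math test and a later algebra grade (the criterion).
math_test = [55, 62, 71, 78, 85, 93]
algebra_grade = [60, 58, 75, 70, 88, 90]

print(f"validity coefficient: {pearson_r(math_test, algebra_grade):.2f}")
```

The value always falls between -1 and +1. Note that the function refuses to compute a correlation when one set of scores has no spread, which is the extreme case of the "spread of scores" problem that affects validity coefficients.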

OH 18
Criterion-Related Evidence

Important points:

always requires test scores

is quantified (i.e., a number)

must be interpreted cautiously because

- irrelevant factors can raise or lower validity coefficients (unreliability, spread of scores, etc.)
- often hard to find a good "criterion"

can be used to create "expectancy tables"
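
An expectancy table restates a test-criterion relationship in plain terms: for students in each score band, what proportion went on to succeed on the criterion? The sketch below assumes a simple pass/fail criterion. The score bands, data, and the function name `expectancy_table` are hypothetical.

```python
def expectancy_table(scores, succeeded, bands):
    """Proportion of students succeeding on the criterion within each score band.

    scores: test scores; succeeded: matching True/False criterion outcomes;
    bands: list of (label, low, high) with inclusive bounds.
    Returns {label: (number of students, proportion succeeding or None)}.
    """
    table = {}
    for label, low, high in bands:
        outcomes = [ok for score, ok in zip(scores, succeeded) if low <= score <= high]
        n = len(outcomes)
        table[label] = (n, sum(outcomes) / n if n else None)
    return table


# Hypothetical data: placement test scores and whether each student
# later passed algebra (the criterion).
scores = [55, 62, 71, 78, 85, 93]
passed_algebra = [False, False, True, False, True, True]
bands = [("low (0-69)", 0, 69), ("middle (70-84)", 70, 84), ("high (85-100)", 85, 100)]

for label, (n, proportion) in expectancy_table(scores, passed_algebra, bands).items():
    print(label, n, proportion)
```

With so few students per band the proportions are unstable, so a real table would need a much larger group.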

OH 19
What is a consequence?

Any effect that your assessment has—or fails to have—that is important to you or the other people involved.

OH 20
Consequences-Related Evidence

Definition: The extent to which the assessment serves its intended purpose (e.g., improves performance) and avoids negative side-effects (e.g., distorts the curriculum)

Possible types of evidence:

did it improve performance? motivation? independent learning?

did it distort the focus of instruction?

did it encourage or discourage creativity? exploration? higher level thinking?

etc.

OH 21
Consequences-Related Evidence

Important points:

usually gathered after assessment is given

scores may be interpreted correctly, but the test may still have negative side-effects

have to weigh the consequences of not using the assessment (even if it has negative side-effects). Is the alternative any better—or maybe worse?

judging consequences is a matter of values, not psychometrics

OH 22
Sources of Threats to Validity

tests themselves

teaching

administration and scoring

students

nature of group or criterion

Can you give examples of each?