Overheads
for Unit 10—Chapter 19 (Interpreting Standardized Test Scores)
 
 
OH 1
The Challenge
 
 
Technical Challenge
 
Educational and psychological measures are not like pounds or inches
 - No zero point
 - Units of measurement are not equal
 
 
Methods have been developed to cope with this limitation
 - By providing meaningful frames of reference for interpreting scores
 - By providing ways that give equal units of measurement
 - By providing ways to compare and add very different kinds of scores
 
 
 
Professional Standard
 
“Should be able to interpret commonly reported scores: [such as] percentile ranks, percentile band scores, standard scores, and grade equivalents.” (Standard 3 for Teacher Competence in Educational Assessment)
 
 
OH 2
Methods of Interpreting Test Scores
 
Raw score
 
 - Number of points when scored following the scoring directions
 - Has no inherent meaning (neither does % correct)
 
 
Criterion-referenced and standards-based interpretations
 
 - Definition
  - student’s score relates to a clear description of specific tasks the student can perform
  - those tasks, in turn, are related to specified standards of mastery
  - no need to consider other students’ scores
 
 - Most useful when the test is designed for this purpose
  - set of clearly stated learning objectives
  - enough items to infer degree of mastery or non-mastery of that domain
  - items selected to actually measure that domain
 
 
 - Guidelines for when you can (cautiously) interpret norm-referenced tests in criterion-referenced terms
  - Are the achievement domains (e.g., objectives) homogeneous, delimited, and clearly specified?
    - if not, avoid specific descriptive statements
  - Are there enough items (say, 10) for each type of interpretation?
    - if not, combine items into larger clusters or make only tentative judgments
  - Were easy items omitted to increase discrimination?
    - if so, scores won’t describe what low achievers can do
  - Were only selection-type items used?
    - if so, scores are influenced by guessing
  - Do the test items provide a directly relevant measure of the objectives?
    - if not, base interpretations on what they do measure
 
Norm-Referenced Interpretation
 
 - Definition
  - student’s score relative to other students (in a norm group)
  - norm group is carefully defined
  - no need to look at level of mastery
 
 - Derived scores
  - definition: raw scores converted into numbers that have meaning within a particular comparison group
  - derived scores needed because simple rankings have limited value
  - most common types: grade equivalents, percentiles, standard scores
  - simple to calculate, and conversion tables often provided
  - many types are standard scores (e.g., T-scores, NCE, standard age scores), based on the same logic using the normal curve
  - other types of developmental scales besides GE (e.g., age-equivalents)
 
 
 
Expectancy Tables (chapter 4)
 
 - Definition: a two-way chart that shows how often students at each score level (say, SAT math) perform at each level on another valued performance (say, freshman grades in college); a sketch follows below
 - Don’t need any norms
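 
A minimal sketch, in Python, of how such a table could be tallied from paired records. The score bands, grade categories, and counts are hypothetical, not taken from the chapter.

from collections import Counter

# Hypothetical paired records: (SAT math score band, freshman grade level).
records = [
    ("600-800", "B or higher"), ("600-800", "B or higher"), ("600-800", "C or lower"),
    ("400-590", "B or higher"), ("400-590", "C or lower"), ("400-590", "C or lower"),
    ("200-390", "B or higher"), ("200-390", "C or lower"), ("200-390", "C or lower"),
]

counts = Counter(records)
bands = ["600-800", "400-590", "200-390"]
levels = ["B or higher", "C or lower"]

# Each cell: percentage of students in a score band who reached that grade level.
for band in bands:
    row_total = sum(counts[(band, level)] for level in levels)
    cells = [f"{level}: {100 * counts[(band, level)] / row_total:.0f}%" for level in levels]
    print(f"{band}: " + " | ".join(cells))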
 
 
 
OH 3
Grade Equivalent Scores
 
Description
 - Definition: the grade level at which the typical student obtains that raw score
 - Sample interpretation: “the student had the same raw score that was average for students in grade 5.6 in the average school”
 - A typical score is determined for each month in a grade: 5.0-5.9
 - Tables are provided, so just look up what grade level corresponds to a student’s raw score (see the sketch below)
 - Widely used, especially in elementary school
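 
A minimal sketch, in Python, of that lookup; the raw-score-to-GE pairs below are invented for illustration and do not come from any real norms table.

# Hypothetical publisher conversion table: raw score -> grade equivalent.
ge_table = {28: 4.8, 30: 5.0, 33: 5.3, 36: 5.6, 39: 5.9, 42: 6.2}

def grade_equivalent(raw_score):
    """Return the GE for the highest tabled raw score not exceeding raw_score."""
    tabled = [rs for rs in sorted(ge_table) if rs <= raw_score]
    return ge_table[tabled[-1]] if tabled else None

print(grade_equivalent(36))  # 5.6 -> same raw score as the typical student in grade 5, month 6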
 
Widely Misinterpreted!
 
 - Don’t confuse GE norms with standards that all students should attain
 - Don’t interpret a GE as an estimate of the grade a student should be placed in
 - Don’t expect all students to gain 1.0 GE each year (the average). Not a realistic goal
 - Don’t assume that the units are equal at different parts of the scale (the same difference can mean “just above” or “vastly above” average)
 - Don’t assume that scores on different tests are comparable
  - Some publishers test fuller ranges of students than others
  - Patterns of growth (variance in scores) may differ across subjects
 - Don’t interpret extreme scores as dependable estimates of a student’s performance (usually extrapolated)
 
 
Usefulness
 
 - Most useful in reporting growth in basic skills in elementary school
 - Least useful for comparing performance on different tests
 - Inequality in grade units will muddle interpretation if you don’t keep it clearly in mind
 
 
 
OH 4
Percentile Rank
 
Description
 
 - Definition: the percentage of students in the norm group scoring below a particular raw score (relative position in the group); a sketch follows below
 - Widely used and easily understood
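 
A minimal sketch, in Python, of that definition, computing the rank directly from a hypothetical norm group rather than from a published conversion table.

# Hypothetical raw scores from a norm group.
norm_group = [12, 15, 18, 20, 21, 23, 25, 27, 30, 34]

def percentile_rank(raw_score, norm_scores):
    """Percentage of the norm group scoring below the given raw score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100 * below / len(norm_scores)

print(percentile_rank(25, norm_group))  # 60.0 -> scored above 60% of the norm group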
 
Requirements for use
 
 - A conversion table (from raw scores to percentiles) based on a norm group
 - A norm group (conversion table) that is appropriate for the students taking the test: grade or age, time of year
 - A norm group (conversion table) that is also specific to the exact test being given: test, subtest, form or (difficulty) level of the test
 - Many tests or student groups mean many conversion tables
 - Different purposes (comparisons of the same child with different groups) require different norms
 
 
Limitations
 
 - Must always refer to a student’s percentile rank as relative to a particular norm group
 - Usually require multiple sets of norms, especially in high school and beyond
 - Units not equal, especially at the extremes
  - The pattern of inequality is predictable, however
  - The same percentile difference (say, 5 points) reflects a much bigger difference in performance at the extremes than near the average (recall the shape of the normal curve)
 
 
OH 5
Standard Scores
 
 
Definition
 
 - Standard score—how far above or below average a student scored
 - Distance is calculated in standard deviation (SD) units (a standard deviation is a measure of spread or variability)
 - The mean and standard deviation are for a particular norm group
 
 
Advantages
 
Based on the “normal curve,” which means that:
 
 - Scores are distributed symmetrically around the mean (average)
 - Each SD represents a fixed (but different) percentage of cases
 - Almost everyone is included between –3.0 and +3.0 SDs of the mean
 - The SD allows conversion of very different kinds of raw scores to a common scale that (a) has equal units and (b) can be readily interpreted in terms of the normal curve
 - When we can assume that scores follow a normal curve (classroom tests usually don’t, but standardized tests do), we can translate standard scores into percentiles—very useful! (A sketch of this conversion follows.)
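 
A minimal sketch, in Python, of the standard-score-to-percentile translation, using the standard library’s normal distribution in place of a printed normal-curve table; the score is hypothetical.

from statistics import NormalDist

# z = how many SDs above or below the norm-group mean a student scored.
z = 1.0
percentile = NormalDist(mu=0, sigma=1).cdf(z) * 100
print(round(percentile))  # ~84 -> about 84% of a normal distribution falls below z = +1.0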
 
 
 
OH 6
Types of Standard Scores
 
 
All Standard Scores
 
 - Share a common logic
 - Can be translated into each other (see Figure 19.2, p. 494)
 
 
z-Score
 
 - Simplest
 - The one on which all others are based
 - Formula: z = (X-M)/SD, where X is the person’s score, M is the group’s average, and SD is the group’s spread (standard deviation in scores)
 - z is negative for scores that are below average, so z’s are usually converted into some other system that has all positive numbers
 
 
T-Score
 
 - Normally distributed standard scores
 - M=50, SD=10
 - Can be obtained from z scores: T = 50 + 10(z) (see the sketch below)
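 
A minimal sketch, in Python, applying the two formulas above; the raw score and the norm-group mean and SD are hypothetical.

def z_score(x, mean, sd):
    # z = (X - M) / SD
    return (x - mean) / sd

def t_score(z):
    # T = 50 + 10(z): same standing, rescaled so all values are positive
    return 50 + 10 * z

z = z_score(x=62, mean=50, sd=8)   # hypothetical raw score and norm-group statistics
print(z, t_score(z))               # 1.5 65.0 -> one and a half SDs above average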
 
 
Normalized Standard Scores
 
 - Start with scores that you want to make conform to the normal curve
 - Get percentile ranks for each score
 - Transform the percentiles into z scores using a conversion table (I handed one out in class; see the sketch after this list)
 - Then transform into any other standard score you want (e.g., T-scores, IQ equivalents)
 - Hope that your assumption was right, namely, that the scores really do naturally follow a normal curve. If they don’t, your interpretations (say, of equal units) may be somewhat mistaken
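 
A minimal sketch, in Python, of those steps; the inverse-normal function stands in for the printed percentile-to-z conversion table, and the percentile ranks are hypothetical.

from statistics import NormalDist

# Steps 1-2 (assumed done): each score already has a percentile rank in the group.
percentile_ranks = [10, 25, 50, 75, 90]   # hypothetical

# Step 3: percentile rank -> normalized z (in place of the conversion table).
normalized_z = [NormalDist().inv_cdf(pr / 100) for pr in percentile_ranks]

# Step 4: normalized z -> any other standard score you want, e.g., T = 50 + 10(z).
t_scores = [round(50 + 10 * z) for z in normalized_z]
print(t_scores)  # roughly [37, 43, 50, 57, 63]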
 
 
Stanines
 
 - Very simple type of normalized standard score
 - Ranges from 1-9 (the “standard nines”)
 - Each stanine from 2-8 covers ½ SD
 - Stanine 5 = percentile ranks 40-59 (the middle 20 percent; see the sketch after this list)
 - A difference of 2 stanines usually signals a real difference
 
 - Strengths
1. easily explained to students and parents
2. normalized, so can compare different tests
3. can add stanines to get a composite score
4. easily recorded (only one column)
 
 - Limitations
1. like all standard scores, cannot record growth
2. crude, but prevents overinterpretation
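 
A minimal sketch, in Python, mapping percentile ranks onto stanines using the usual normal-curve bands (4-7-12-17-20-17-12-7-4 percent), consistent with stanine 5 covering percentile ranks 40-59; the sample ranks are hypothetical.

import bisect

# Upper percentile-rank bound (exclusive) of stanines 1 through 8; stanine 9 takes the rest.
cutoffs = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile_rank):
    """Map a percentile rank (0-100) onto the 1-9 'standard nine' scale."""
    return bisect.bisect_right(cutoffs, percentile_rank) + 1

for pr in (3, 45, 59, 60, 97):    # hypothetical percentile ranks
    print(pr, "->", stanine(pr))  # 1, 5, 5, 6, 9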
 
Normal-Curve Equivalents (NCE)
 
 - Normally distributed standard scores
 - M=50
 - SD=21.06
 - Results in scores that go from 1-99
 - Like percentiles, except that they have equal units (this means they make fewer distinctions in the middle of the curve and more at the extremes than percentiles do)
 
 
Standard Age Scores (SAS)
 
 - Normally distributed standard scores
 - Put into an IQ metric, where
 - M=100
 - SD=15 (Wechsler IQ Test) or SD=16 (Stanford-Binet IQ Test)
 
 
 
OH 7
Converting among Standard Scores
 
Easy Convertibility
 
 - All are different ways of saying the same thing
 - All represent equal units at different ranges of scores
 - All can be averaged (among themselves)
 - Can easily convert one into the other (see the sketch below)
 - Figure 19.2 on p. 494 shows how they line up with each other
 - But interpretable only when scores are actually normally distributed (standardized tests usually are)
 - Downside—not as easily understood by students and parents as are percentiles
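 
A minimal sketch, in Python, of the “same standing, different rulers” idea: one hypothetical z of +1.0 expressed as a T-score (M=50, SD=10), an NCE (M=50, SD=21.06), an SAS on the Wechsler metric (M=100, SD=15), and a percentile rank.

from statistics import NormalDist

z = 1.0  # hypothetical student: one SD above the norm-group mean

conversions = {
    "z": z,
    "T": 50 + 10 * z,
    "NCE": 50 + 21.06 * z,
    "SAS (Wechsler)": 100 + 15 * z,
    "percentile rank": round(NormalDist().cdf(z) * 100),
}
for name, value in conversions.items():
    print(f"{name}: {value}")
# All describe the same relative standing: about one SD above average (~84th percentile).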
 
 
 
OH 8
Using Standard Scores to Examine Profiles
 
Uses
 
 - You can compare a student’s scores on different tests and subtests when you convert all the scores to the same type of standard score
 - But all the tests must use the same norm group
 - Plotting profiles can show the student’s relative strengths and weaknesses
 - Scores should be plotted as confidence bands to illustrate their fringe of error (see the sketch below)
 - Interpret scores as different only when their bands do not overlap
 - Sometimes plotted separately for males and females (say, on vocational interest tests), but this is a controversial practice
 - Tests sometimes come with tabular or narrative reports of profiles (see p. 496)
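 
A minimal sketch, in Python, of the band-overlap rule above, taking score ± 1 SEM as the band; the subtests, scores, and SEMs are hypothetical.

# Hypothetical profile: subtest -> (standard score, standard error of measurement).
profile = {"Reading": (62, 3), "Math": (55, 3), "Language": (58, 3)}

bands = {name: (score - sem, score + sem) for name, (score, sem) in profile.items()}

def really_different(a, b):
    """Treat two subtest scores as different only if their bands do not overlap."""
    (lo_a, hi_a), (lo_b, hi_b) = bands[a], bands[b]
    return hi_a < lo_b or hi_b < lo_a

print(bands)
print(really_different("Reading", "Math"))   # True: 59-65 and 52-58 do not overlap
print(really_different("Math", "Language"))  # False: 52-58 and 55-61 overlap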
 
 
 
OH 9
Using Standard Scores to Examine Mastery of Skill Types
 
 
 - Some standardized tests try to provide criterion-referenced information by reporting scores on specific sets of skills (see Figure 19.4 on p. 498)
 - Be very cautious with these—use them as clues only, because each skill area typically has very few items
 
 
 
OH 10
Judging the Adequacy of Norms for Standard Scores
 
 
Remember Your Aim!
 
 - To interpret performance relative to a well-defined reference group
 
 
 
Criteria for Judging Norms
 
 - Relevant
  - Is this particular norm group appropriate for (a) the decision you want to make and (b) the set of students involved?
 
 - Representative
  - Was the norm group created with a random sample or stratified random sample? Does it match census figures (by race, sex, age, location, etc.) for the general population being considered?
 
 - Up-to-date
  - Don’t rely on the copyright date of the test manual. Read the manual to see how old the norms are
  - Beware of the Lake Wobegon effect!
 
 
 - Comparable
  - If you want to compare scores on tests with different norm groups, check the test manuals to see how comparable the groups are
 
 - Adequately described—look for:
  - Method of sampling
  - Number and distribution of cases in the norming sample
  - Age, race, sex, geography, etc. of the norm sample
  - Extent to which standardized conditions were maintained in testing
  - Prefer the tests that are described in more detail
 
 
OH 11
Cautions in Interpreting Standardized Test Scores
 
 
Scores should be interpreted:
 
 - With clear knowledge about what the test measures. Don’t rely on titles; examine the content (breadth, etc.)
 - In light of other factors (aptitudes, educational experiences, cultural background, health, motivation, etc.) that may have affected test performance
 - According to the type of decision being made (high or low for what?)
 - As a band of scores rather than a specific value. Always subtract and add 1 SEM from the score to get a range, to avoid overinterpretation
 - In light of all your evidence. Look for corroborating or conflicting evidence
 - Never rely on a single score to make a big decision