GLUTTING'S GUIDE FOR NORM-REFERENCED TEST SCORE INTERPRETATION, USING A SAMPLE PSYCHOLOGICAL REPORT
Importance of Norm-Referenced Test Scores
The purpose of this handout is to provide instruction on the interpretation of results from norm-referenced tests. When people think of a teacher's job, they seldom think of it requiring the interpretation of results from standardized tests. However, interpreting such results is actually a very important part of a teacher’s yearly (versus daily) activities.
I know this to be true for two reasons. First, after working in the public schools for six years as a school psychologist, I saw how teachers reacted with puzzlement, confusion, and wonder when I presented results from norm-referenced psychological evaluations. Second, I have been teaching long enough at the University of Delaware to have had undergraduates return and take graduate-level measurement classes with me. After a few years of working in the public schools, these teachers see the impact norm-referenced tests have on children - - and, they emphasize that someone should have taught them more about norm-referenced test-score interpretation when they were undergraduates!
There is yet another way to demonstrate the importance of norm-referenced test interpretation to classroom teachers. Approximately 15% of all children in the public schools receive special education. To be eligible for special education, federal law (i.e., the Individuals with Disabilities Education Act [IDEA]) specifies that children must receive comprehensive, norm-referenced assessments from Multi-Disciplinary Teams (MDTs). Furthermore, another 5% of the children in the public schools are evaluated by MDTs, but do not qualify for special education. Therefore, around 20% of all children in the public schools are evaluated, at one time or another, by MDTs.
Given the large number of children evaluated by MDTs, the odds are approximately 1 in 5 (i.e., 20%) each year that you, a "regular" education teacher, will refer a child for evaluation. Once you refer a child, you will receive one or more reports about him or her (e.g., a psychologist’s report, an educational diagnostician’s report, etc.). Almost all of the scores in these reports are norm-referenced, and it is the results from these tests that determine whether children: (1) are eligible for special education and (2) are diagnosed as having a handicapping condition such as mental retardation (MR), a learning disability (LD), attention-deficit/hyperactivity disorder (ADHD), conduct disorder (CD), etc. Therefore, as you can see, the norm-referenced assessments conducted by MDTs are "high stakes" and have a significant impact on the lives of children and the regular-education teachers who instruct them.
Perhaps the best way to learn about norm-referenced test interpretation is to begin with a psychological evaluation. You will see one such psychological report just below. The report is fictitious. The child, the names of his parents, teacher, school, etc. are made up. Otherwise, the report is exactly what you would receive as a classroom teacher.
Read the report carefully. There are four major areas covered in a psychological report: IQ-test results (see the "WISC-III" section), adaptive-behavior inventory results (see the "ABAS" section), achievement-test results (see the "WIAT" section), and social-emotional adjustment results (see the "ASCA" section).
Try to determine whether the child is performing above average, average, or below average in each of the four areas. You probably will be able to make the determination based on what the psychologist says in the report (i.e., the report’s text presentation). However, look at the section of the report titled "Synopsis of Formal Test Scores". It is this section of the report that provides the actual, norm-referenced scores obtained by the child. Look at the test scores themselves and see if you can determine whether the child is performing above average, average, or below average based on the scores alone. You probably will not be able to make the determination without learning more about norm-referenced tests.
Also, as surprising as it may sound to you, the actual test scores and what is said about the test scores in a report (i.e., the report’s text presentation) sometimes do not agree with one another! For this reason, as a classroom teacher, you need to know something about norm-referenced test scores. Otherwise, you will be unable to determine whether the test results accurately portray how a child in your classroom is performing academically.
Once you finish reading the psychological report, the other sections of this document will teach you how to interpret norm-referenced test scores. At times, the document will refer back to the child (Billy) discussed in the psychological report and his test scores.
NOTE: Information in this report is fictitious. Any resemblance to real individuals is coincidental.

CONFIDENTIAL: THIS REPORT IS TO BE SHOWN ONLY TO PROFESSIONAL PERSONNEL WORKING WITH THE STUDENT

PSYCHOLOGICAL EVALUATION

NAME: William (Billy) Smith                PARENTS: William and Susan Smith
GENDER: Male                               ADDRESS: 411 Hanson Drive
DATE OF BIRTH: 12/12/95                             Omaha, NE 17111
CHRONOLOGICAL AGE: 6-11                    TELEPHONE: 807-555-1212
RACE: Anglo                                PRIMARY TEACHER: Mrs. Hopkins
SCHOOL: Happy Valley Elementary            GRADE: 1
EVALUATION DATES: 11/10/02, 11/12/02, 11/13/02

Evaluation Techniques: Wechsler Intelligence Scale for Children-Third Edition (WISC-III), Wechsler Individual Achievement Test-Second Edition (WIAT-II), Adaptive Behavior Assessment System (ABAS), Adjustment Scales for Children and Adolescents (ASCA), Structured Developmental History Interview with Parent, Structured Teacher Interview, Review of School Records, Structured (Time Sampling) Classroom Observation, Unstructured Clinical Interview with Student

SYNOPSIS OF FORMAL TEST SCORES

WISC-III IQs and Subtest Standard Scores
Full Scale IQ: 65
Verbal Scale IQ: 67
Performance Scale IQ: 68

SUBTEST            SCORE    SUBTEST                SCORE
Information          5      Picture Completion       4
Similarities         3      Coding                   6
Arithmetic           5      Picture Arrangement      4
Vocabulary           5      Block Design             5
Comprehension        3      Object Assembly          4
Digit Span           6      Symbol Search            6

WISC-III Factor Indexes
INDEX                              STANDARD SCORE
Verbal Comprehension                     68
Perceptual Organization                  67
Freedom from Distractibility             75
Processing Speed                         80

WIAT-II Composites and Subtest Standard Scores
COMPOSITE                          STANDARD SCORE
Reading                                  **
Mathematics                              59
Written Language                         **
Oral Language                            62
**Not calculated prior to age 8

SUBTEST                            STANDARD SCORE
Word Reading                             74
Pseudoword Decoding                      60
Numerical Operations                     63
Math Reasoning                           **
Spelling                                 72
Written Expression                       **
Listening Comprehension                  69
Listening Comprehension                  67
Oral Expression                          67
**Not calculated prior to age 8

ABAS Composite and Subtest Standard Scores
                                   STANDARD SCORE
Composite                                70

SUBTEST                            STANDARD SCORE
Communication                            60
Community Use                            70
Functional Academics                     66
Home Living                              80
Health and Safety                        70
Leisure                                  **
Self-Care                                85
Self-Direction                           85
Social                                   60
Work                                     **
**Not Administered

Adjustment Scales for Children and Adolescents (ASCA)
                                          STANDARD SCORE
COMPOSITES
Over-reactivity                                60
Under-reactivity                               52
SUBTESTS/SCALES
Attention-Deficit/Hyperactivity (ADHD)         60
Solitary Aggressive-Impulsive (SA-I)           50
Solitary Aggressive-Provocative (SA-P)         52
Oppositional Defiance (OD)                     53
Diffidence (DIF)                               50
Avoidance (Avoid)                              51

Reason for Referral:
William (Billy) was referred by his classroom teacher, Mrs. Hopkins. Billy tries hard
in school, but he is struggling in all academic areas. Billy is approaching his
seventh birthday (age = 6 years, 11 months). He lives with both of his
biological parents, William (age 35) and Susan (age 33) Smith. William is an
accountant and Susan works as a purchasing agent. Both Mr. and Mrs. Smith are
college graduates. Neither reports having experienced learning difficulties
in school. Mr. and Mrs. Smith have lived in the same community (Omaha)
throughout their lives.

History:
A developmental history was
conducted with Mrs. Smith on 11/10/02. Two children besides Billy live in the
home: Mary, age 10 and Ann, age 8. Mary and Ann are Billy’s biological
siblings. Parent information and a review of school records reveal that both
Mary and Ann are doing well in school. Billy speaks only English, which
he has been exposed to since birth and has been speaking since he first began
talking. Mrs. Smith’s pregnancy with Billy, and her delivery, were
unremarkable. Billy was born through a Cesarean section, as were his two
siblings. However, Billy weighed less than 5 1/2 pounds at birth. His
one-minute Apgar score was moderately depressed (score = 7), but the
five-minute Apgar was in the healthy range (score = 8). Billy has never been
hospitalized, and with the exception of measles, he experienced no childhood
illnesses. He currently is taking no
prescription medications. A visual screening was
conducted by the school nurse on 10-10-02. Results revealed Billy has normal
visual acuity. Also, a hearing test was conducted in school by the speech
therapist on 10-20-02 and showed normal auditory acuity. According to his mother,
Billy reached his motor milestones (sitting alone, crawling, standing alone,
and walking) within the expected age ranges. Mrs. Smith is concerned because he
reached his language milestones later than expected (speaking first words and
speaking in short sentences). Mrs. Smith describes Billy as a happy,
cooperative child who gets along with his two older sisters. There are many
children in Billy’s neighborhood. Mrs. Smith also indicated that Billy prefers
playing with children younger than himself rather than with either his sisters or
children his own age. Billy’s favorite activity is playing with trucks. His
favorite food is ice cream. Billy attended a preschool
program at age 4 and a half-day kindergarten program last year. In addition,
Billy’s first grade is in the same school (Happy Valley Elementary) as his
kindergarten class. A review of school records shows he is maintaining
good
attendance this year and he had an excellent attendance record in kindergarten.
A teacher interview was completed with Mrs. Hopkins, Billy’s current teacher.
Mrs. Hopkins reports that Billy is very well-behaved. Likewise, Billy has an
exemplary conduct record. Regarding academic performance, Mrs. Hopkins indicates
that Billy tries very hard in class. At the same time, he is struggling and
experiencing many academic difficulties. He is having problems with
introductory reading and math skills, and in both areas, he is in Mrs. Hopkins'
lowest teaching groups. School records show the same pattern of academic
performance was present in kindergarten. In October, standardized
group-achievement tests were administered to all first graders at Happy Valley
Elementary School. Results disclose Billy scored far below average in Reading,
Math, and Language. Several pre-referral
interventions were attempted with Billy. For instance, Mrs. Hopkins provides
one-to-one instruction whenever possible. Billy receives one-to-one tutoring
from a community volunteer twice a week for one-half hour. Likewise, the school
has a peer tutoring program. Once a week, Billy works with a fourth-grade
student who helps him with sight-word identification.

Current Observations:
Billy was evaluated on two occasions in his school.
Physically, he presented as appropriate in height and weight for his age.
Billy’s dress was clean, and on each occasion, he was well groomed. It is
obvious that Billy is well cared for at home. His articulation was clear, and
his vision, hearing and gross-motor coordination appeared appropriate. He was
somewhat nervous about leaving the classroom to work with the examiner.
Nevertheless, Billy grew increasingly relaxed as the first test session
progressed; he was cooperative; he regularly helped the examiner put away test
materials; and he listened attentively to most test directions and questions.
Similarly, Billy was equally relaxed and cooperative during the second test
session.

Wechsler Intelligence Scale for Children – Third Edition (WISC-III)
One
test administered to Billy was the WISC-III. This instrument evaluates a
variety of abilities associated with school success and it is considered to be
one of the best predictors of future achievement. The WISC-III does not, however, assess
all abilities; it does not cover some specific mechanical aptitudes that may be important
to certain occupations and trades. Likewise, it does not measure creativity or
how well children get along with others. The
WISC-III provides a progression of scores that can be thought of as forming a
triangle. At the top is the Full Scale IQ (FSIQ). This is the best single
predictor of school achievement on the WISC-III. Underlying the FSIQ are two
scores that permit further distinctions. The first is the Verbal Scale IQ
(VIQ). It assesses the ability to think in words and apply language skills and
verbal information to solve problems. The second is the Performance IQ (PIQ)
which requires fewer verbal skills. It evaluates the ability to think in terms
of visual images and manipulate them fluently with relative speed. Another way
to think of the PIQ is that it evaluates the ability to organize
visually-presented material against a time limit. When there is a difference
between the VIQ and PIQ, the VIQ is usually the better predictor of school
achievement. Results
from the WISC-III indicate Billy may have difficulty keeping up with peers on
most tasks requiring age-appropriate thinking and reasoning. His general
cognitive ability is within the lower extreme range of intellectual functioning
(WISC-III FSIQ = 65). Billy's
ability to think with words is comparable to his ability to reason without the
use of words (VIQ = 67, PIQ = 68). Both Billy’s verbal and nonverbal
reasoning
abilities are in the lower extreme range and align with his overall ability
level. A
personal strength for Billy is his ability to process simple information
quickly and efficiently (Processing Speed Index [PSI] = 80). Billy’s PSI was
his highest result on the WISC-III. The PSI converts to performance at the
ninth percentile. In other words, Billy is able to process
simple information more
quickly than 9 out of every 100 children his age.

Adaptive Behavior Assessment System (ABAS)
Billy’s adaptive functioning skills were
assessed to determine his level of social and daily living skills. His mother
completed the ABAS during the interview. The ABAS assesses an individual’s
personal and community independence, as well as aspects of personal development
outside the school setting across 10 areas. These areas include communication
skills, self-direction, social interaction skills, health and safety awareness,
etc. The 10 skills form a composite and are collectively referred to as
“adaptive behavior”. Results of the ABAS suggest that Billy has
limitations in several adaptive skills. Results of this assessment indicate
that his skills for personal care including eating, dressing, and bathing are a
personal strength (Self-Care standard score = 85). Another personal strength is
Billy's skills for independence and responsibility, such as starting and
completing tasks, following time limits and directions, and making choices
(Self-Direction standard score = 85). However, when compared to same age peers
without disability, Billy's speech, language, and listening skills
(Communication standard score = 60) and his skills needed for social
interaction (Social = 60) appear to be somewhat limited. Billy's mother noted
that his vocabulary is restricted compared to other children his age and that
he has fewer friends. Likewise, Billy's overall level of adaptive behavior was
in the lower extreme range (Composite standard score = 70). The latter three
scores, and most of Billy's adaptive behavior results, are commensurate with
his overall cognitive functioning, as measured by the WISC-III FSIQ.

Wechsler Individual Achievement Test – Second Edition (WIAT-II)
Billy completed the WIAT-II, which is an
individually administered achievement test. The WIAT provides information about
children's reading, mathematics, and language performance. Additionally,
Billy's teacher provided detailed information on his current academic
performance. Billy's highest level of achievement
functioning took place in pre-reading skills. His Word Reading results were at
the 4th percentile. He demonstrated evenly developed pre-reading skills and
identified some beginning and ending sounds for a few common words (e.g., hat),
but had trouble with others (e.g., fish, dish,
star). He did not read any common words and had difficulty using phonetic
knowledge to sound out nonsense or unfamiliar words (Pseudoword Decoding =
below 1st
percentile). Similar to his
reading results, Billy's language skills were in the lower extreme range (Oral
Language Composite = 1st percentile). In formal testing, he correctly
identified pictures of many common objects when presented singly. But, when
asked to describe scenes which contained many objects, Billy had difficulty
naming more than one or two. His teacher reported that Billy can identify all
primary colors, but has difficulty identifying common geometric shapes. The
teacher also indicated that Billy occasionally has trouble following
orally-presented directions, but that he has improved significantly since the
beginning of the year. Billy's lowest level of achievement functioning appears to be in the
mathematics area, where his performance was measured in the lower extreme range
(Mathematics Composite = less than 1st percentile). Billy wrote all of the
single digit numbers presented; he counted objects up to 10; and he compared
shapes according to size. However, he was unable to add 1+2. Billy did not
correctly tell time, measure with a ruler, or do subtraction. His teacher
reported that Billy identifies numbers and counts up to 10 consistently, but
his performance becomes less consistent with higher numbers. She also noted
that Billy performs simple addition using his fingers with adult help, but
cannot do so for subtraction. Billy's skills in reading, language, and
math were measured as commensurate with his estimated cognitive ability.

Social and Emotional Functioning
Billy’s behavior,
as rated by his teacher, reflects adequate functioning in most areas. His
behavior at school, as measured by the Adjustment Scales for Children and
Adolescents (ASCA), was estimated to be predominantly in the adjusted range.
However, his teacher rated Billy in the Borderline range for difficulty
sustaining attention (ADHD scale standard score = 60). The teacher qualified
her observation by noting that this difficulty was present when Billy was in
situations where the teacher was addressing the whole class. The teacher
reported that Billy is consistently cooperative and has made great strides in
social interaction. She noted that Billy is very capable of working
independently on academic tasks. During direct observation in the classroom,
Billy was measured to be on-task approximately 90% of the time; he occasionally
stared off into space and was distracted when another peer whispered to a girl
seated next to him. Similarly, his
mother characterizes Billy as very well-behaved and affectionate. He gets along
well with his sisters and cousins, and his mother noted that Billy speaks at
great length about all the fun he has at school.

Summary:
Billy is approaching his 7th birthday. He
is attending first grade. Billy was referred for an educational evaluation by
his current teacher due to his minimal progress in attaining basic skills,
including oral language and pre-reading skills. The
present evaluation suggests that Billy functions in the lower extreme range of
general intellectual ability. Adaptively, delays commensurate with his measured
cognitive ability were noted in several areas, including communication and
social interaction. Academically, the results of formal testing indicate that
Billy is performing at levels that would be expected, given his measured
cognitive ability. Behavior-assessment results suggest generally appropriate
levels of classroom adjustment.

Evaluated by,

Joseph J. Glutting, Ph.D.
School Psychologist
Happy Valley Public Schools
OVERVIEW OF NORM-REFERENCED TEST SCORE INTERPRETATION
The direct numerical report of a child’s test performance is the child's raw score (e.g., the number of right answers). Most often, we cannot interpret raw test scores as we do physical measures such as height, because raw scores in a psychological report have no inherent meaning. Likewise, raw scores are NOT measured in equal units along a line. Therefore, the only way to talk meaningfully about test scores is to bring in a referent. There are two major referents for tests: norm-referencing and criterion-referencing. We already discussed both types of referents earlier in the course. Now, building on the psychological report for Billy, we will pay particular attention to instruments that facilitate norm-referenced comparisons.
NORM VS. CRITERION-REFERENCED MEASUREMENT
The basic difference between norm- and criterion-referenced tests is their interpretation; that is, how we derive meaning from a score. Norm-referenced tests are constructed to provide information about the relative status of children. Thus, they facilitate comparisons of a child's score to the score distribution (i.e., the mean and standard deviation) of some norm group. As a result, the meaningfulness of these scores depends on:
(1) the extent to which the test user (e.g., psychologist, teacher, parents) is interested in comparing a child to the mean and standard deviation of a norm group.
(2) the adequacy of the norm group.
ADEQUACY OF THE NORM GROUP IN NORM-REFERENCED TESTING
Before we learn how to interpret Billy’s test scores, we need to learn why the norm group in norm-referenced test interpretations is so important.
The American Psychological Association (APA), the American Educational Research Association (AERA), and the National Council on Measurement in Education (NCME) (1985) clearly state that it is the test publisher's responsibility to develop suitable norms for the groups on whom the test is to be used. There are four major types of norms. Which of the four types is used by a psychologist (or by a school district when conducting group testing) can have a radical impact on the interpretation of a child’s test results.
1. National norms. This is the most common norm applied to test scores. Therefore, it is the most important test norm. These norms are almost always reported separately by age or grade level. Most group instruments reporting national norms employ reasonably satisfactory norm groups. On the other hand, most individually-administered, clinical instruments used by psychologists, educational diagnosticians, etc., have inadequate national norms. Many have samples that are too small, are drawn from regional samples not representative of the country, are insufficiently stratified by age, or are disproportionately composed of Anglo and middle-class children.
The WISC-III, WIAT, ABAS, and ASCA instruments used to evaluate Billy all have very good national norms. The diagnostic decisions made by MDTs should ALWAYS be based on tests that use national norms, and NOT on any of the other norms discussed below.
2. State (also called Regional) Norms. Here, the referent changes from children across the United States to those within a particular state. State norms can be confusing. They can sometimes be helpful, however. For instance, if we wanted to compare a child's achievement level to the achievement level of other children within the state of Delaware, we would use a state norm.
Generally, state norms pose problems for interpretation. Let’s talk about an instance where state norms would not be appropriate. In the psychological report for Billy, his overall IQ on the WISC-III was 65. The overall IQ on the WISC-III is referred to as the Full Scale IQ (FSIQ). Billy’s FSIQ of 65 was determined by comparing his performance to that of other children across the nation (i.e., national norms were used). We would not want to compare him only to children in the state of Delaware (i.e., we would not want to use state norms) because children in Delaware could have higher, or lower, IQs than children in other states. In other words, when we think about children’s intelligence levels, achievement levels, etc., we typically think about how they compare to other children across the nation - - and not how they compare to children within just one state. If MDTs made decisions on the basis of state norms, a child could be identified as mentally retarded based on his or her "Delaware" IQ, move across state lines, and perhaps not be identified as retarded in another state. Consequently, as noted above, national norms are to be preferred in the norm-referenced, diagnostic assessments completed by MDTs.
3. Special-Group Norms. For SOME decision-making purposes, special-group norms make sense. For example, when hiring from a pool of applicants who are all engineers, a better decision can be made with norms based on engineers alone, because we get to see how each applicant compares to the "typical" engineer. Norms based on the general population would probably fail to make the fine-grained distinctions among the engineering applicants that are necessary for the hiring decision, because engineers as a group are brighter and more educated than the average person.
You may not know this, but the SAT uses special-group norms. Ask yourself: "Who takes the SAT?" It is only those people who were reasonably successful in high school. This is a "special" group because, for the most part, only those people who do well in high school consider going to college - - and it is only this latter group that takes the SAT. Norms for the SAT are based on this group. In other words, the norm group for the SAT is a "special" norm group because it represents roughly the top one-half of all students in the United States. Consequently, if you did not score very highly on the SAT, it is not an embarrassment. The reason is that you scored below average in comparison to a special norm group that was above average to begin with!
Another way of saying all of the above is that you could take the SAT (which has above-average, special-group norms) and score below average. You could then take the adult form of the WISC-III (i.e., the WAIS-III) and still score above average, because the WAIS-III has national norms!
On the other hand, special-group norms are inappropriate for the educational or diagnostic decisions made by MDTs. For example, you probably would agree that it would be incorrect to interpret Billy’s adaptive-behavior results on the ABAS using norms based only on children with mental retardation. The reason is that even if Billy were to score in the average range for this special group, he would still share more in common with children with retardation than with "regular" education students, simply because he is "average" only in comparison to children who are retarded.
4. Local Norms. Many educators prefer some intradistrict norm where they can compare children to one another within their school district. These norms are referred to as "local" norms. The idea behind local norms is that test users can compare specific children to the average in that particular locale. While the use of local norms has some intuitive appeal, the procedure can be misleading when the local test mean deviates sharply from the test's national mean. For example, if the performance in a specific school district is below the national mean, the relative performance of children will be inflated by using local norms.
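To make this concrete, here is a minimal Python sketch (the district mean of 90 is a made-up number, not data from any real district). It shows that a child who looks "average" under a below-average district's local norms sits well below the 50th percentile under national norms, assuming normally distributed scores.

    from statistics import NormalDist

    # Hypothetical test with a national mean of 100 and an SD of 15.
    national = NormalDist(mu=100, sigma=15)
    # Made-up district whose local mean (90) falls below the national mean.
    local = NormalDist(mu=90, sigma=15)

    score = 90  # a child scoring exactly at the local mean

    print(round(local.cdf(score) * 100))     # 50  (locally "average")
    print(round(national.cdf(score) * 100))  # 25  (nationally, about the 25th percentile)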
STANDARD SCORES
As we already know from our earlier lesson on statistics, the basic standard score is the z-score. We also know that once we obtain a z-score, it is a simple process to convert a z-score to a t-score, IQ score, and such.
The basic standard score, the z-score, is defined as follows:
Z = (X - M)/SD
where:
X = a child's raw score on the test,
M = the mean of the raw scores for the norm group, and
SD = the standard deviation of the raw scores for the norm group.
The mean for a full set of z-scores is set at zero and the standard deviation is set at 1.0. Stated simply, z-scores are raw-scores expressed in standard deviation units from the mean. Further, we know that a major advantage of standard scores is that they are measured in equal units.
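As a quick illustration, the z-score formula can be computed directly. The short Python sketch below uses made-up numbers rather than scores from any real test; it simply restates z = (X - M)/SD.

    def z_score(raw, mean, sd):
        """Convert a raw score to a z-score: z = (X - M) / SD."""
        return (raw - mean) / sd

    # Made-up example: a test with a raw-score mean of 40 and an SD of 8.
    print(z_score(48, 40, 8))  # 1.0  (one standard deviation above the mean)
    print(z_score(36, 40, 8))  # -0.5 (half a standard deviation below the mean)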
Problem 1
Before we go on to t-scores and other types of standard scores, let’s try a couple of problems where we convert raw scores on a test into z-scores. Assume that a test has a raw-score mean of 62 and a standard deviation of 9. If a child obtains a raw score of 71 on the test, what would her z-score be? Calculate this problem yourself.
Problem 2
Let’s return to the test used in problem 1 just above. The test has a raw-score mean of 62 and a standard deviation of 9. A second child takes the test and gets a raw-score of 53. What is this child’s z-score? Calculate this problem yourself.
I am going to give you a lot of help with problem 2 just above. The correct answer is z = -1.0. The answer shows that z-scores below the mean have negative values. In order to get enough precision when using z-scores, we must use at least one decimal place. This makes z-scores such as -1.0 awkward. Another drawback is that approximately half of all z-scores are negative.
Let’s consider again the case of Billy. He obtained a WISC-III FSIQ of 65. This number may not mean much to you yet, but it is a pretty low IQ. It is possible to convert his IQ to a z-score. When you do so, a WISC-III FSIQ of 65 converts to a z-score of approximately -2.33. How would you like to tell Billy’s parents that his IQ was negative! I know I wouldn’t - - and it’s for this reason that tests use other metrics.
Another way of saying all of the above is that we can avoid negative scores and decimals by simply using a standard score with a mean sufficiently greater than 0 to avoid minus score values, and a standard deviation sufficiently greater than 1 to make decimals unnecessary.
We already learned the general formula for converting z-scores to other standard scores.
Desired Metric unit = z (SD) + M
For example, Wechsler's intelligence tests (WISC-III, WAIS-III) use this form:
IQ = z (15) + 100
The SAT and GRE use this form:
SAT = z (100) + 500
Many behavior rating scales use t-scores. You can convert z-scores to this form as follows:
t-score = z (10) + 50
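All three conversions are the same operation with different constants. The short Python sketch below (illustrative only) applies the general formula using the Wechsler, SAT/GRE, and T-score constants just listed.

    def to_metric(z, mean, sd):
        """Convert a z-score to another standard-score metric: score = z * SD + M."""
        return z * sd + mean

    z = 1.0  # one standard deviation above the mean
    print(to_metric(z, 100, 15))   # Wechsler IQ: 115.0
    print(to_metric(z, 500, 100))  # SAT or GRE:  600.0
    print(to_metric(z, 50, 10))    # T-score:     60.0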
Problem 3
Johnny obtains a z-score of -2.0. What numerical value would his score be if we converted it to Wechsler IQ units, SAT units, and T-score units?
IQ = -2.0 (15) + 100
IQ = -30 + 100
IQ = 70
SAT = -2.0 (100) + 500
SAT = -200 + 500
SAT = 300
T score = -2.0 (10) + 50
T = -20 + 50
T = 30
As can be seen from this, IQs, SATs, and T-scores have all the properties of z-scores without the awkwardness resulting from negative scores and decimal points.
Problem 4
Mary obtains an IQ of 115 on the WAIS-III, and her SAT score is 650. On which test did she do better?
To find this out, we need to convert both scores to a common unit, the z-score. All we have to do is use the formula for a z-score.
To find the z-score, given a WAIS-III IQ of 115, we:
Z = (115 - 100)/15
Z = 15/15
Z = 1.0
To find the z-score, given a SAT score of 650, we:
Z = (650 - 500)/100
Z = 150/100
Z = 1.5
So, then, on which test did Mary do better?
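If you want to check your answer, the comparison can be scripted in a few lines (a sketch that simply reuses the z-score formula with the means and standard deviations given above).

    def z_score(raw, mean, sd):
        return (raw - mean) / sd

    wais_z = z_score(115, 100, 15)  # 1.0
    sat_z = z_score(650, 500, 100)  # 1.5

    better = "SAT" if sat_z > wais_z else "WAIS-III"
    print(f"WAIS-III z = {wais_z}, SAT z = {sat_z}; Mary did relatively better on the {better}.")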
We already discussed other common, standard-score metrics during our lesson on statistics. However, they are so important that I will present them again:
__________________________________________________________________
T score (e.g., the ASCA, CDCL, and BRP-2)
Wechsler IQ units (e.g., the WISC-III, WIAT, WRAT-III, KM-R, and ABAS)
Wechsler subtest units (e.g., the Information, Similarities, and other subscales of the WISC-III)
Stanford-Binet IQ units
GRE
SAT
NCEs
Stanines
__________________________________________________________________
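To use any of these metrics you also need each one's mean and standard deviation. The sketch below collects commonly cited textbook values (these are my additions, not values reproduced from the table in this handout; for example, the Stanford-Binet Fourth Edition reported IQs with an SD of 16, and NCEs use an SD of about 21.06 - - always confirm against the test manual) and converts a single z-score into every metric.

    def to_metric(z, mean, sd):
        return z * sd + mean

    # Commonly cited means and SDs (textbook conventions, not quoted from this
    # handout -- confirm against each test's manual before interpreting scores).
    METRICS = {
        "T score": (50, 10),
        "Wechsler IQ": (100, 15),
        "Wechsler subtest": (10, 3),
        "Stanford-Binet IQ (4th ed.)": (100, 16),
        "GRE": (500, 100),
        "SAT": (500, 100),
        "NCE": (50, 21.06),
        "Stanine": (5, 2),
    }

    z = -1.0  # one standard deviation below the mean
    for name, (mean, sd) in METRICS.items():
        print(f"{name}: {to_metric(z, mean, sd):.1f}")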
RELATIVE-STATUS SCORES
We need to discuss some other types of derived scores (i.e., converted from raw scores). There are several types of derived scores that give a child’s relative status. Like standard scores, these other relative-status scores are derived from raw scores. However, these other relative-status scores are not standard scores.
Remember, standard scores present everything in equal units. This means we can add, subtract, multiply, and divide standard scores. We cannot add, subtract, multiply, and divide the other types of relative-status scores.
Besides standard scores, three other types of relative-status scores are commonly used by MDTs: (a) percentiles, (b) grade equivalents, and (c) age equivalents. We will now discuss each type of relative-status score.
Percentiles
Percentiles. A percentile is the point in a score distribution BELOW which a certain percentage of the people fall. Thus, if a person obtains a percentile score of 50, it means that 50 percent of the population falls below this person. Likewise, if a person gets a percentile score of 75, it means that 75 percent of the population falls below this person.
Percentiles are not standard scores. The reason is that percentiles are expressed in ordinal units (ranks). All that the term "ordinal" means is that the distance between units (i.e., percentile numbers) is not equal. In other words, the distance between the 49th and 50th percentiles is much smaller than the distance between the 1st and 2nd percentiles. The reason is that the 49th and 50th percentiles are near the middle of the bell-shaped curve, whereas the 1st and 2nd percentiles are at one "tail" of the bell-shaped curve. As strange as it may seem (and I will show you this in class), the distance between the 2nd and 16th percentiles is exactly the same as the distance between the 16th and 50th percentiles - - each spans one standard deviation!
Although widely used, percentiles suffer from two serious limitations. One limitation is that the size of percentile units is not constant in terms of standard-score units. We just covered this limitation above, but I will repeat it again to be thorough. For example, if the distribution of test scores is a normal, bell-shaped curve, the distance between the 90th and 99th percentiles is much greater than the distance between the 50th and 59th percentiles. One standard-score unit change near the mean of a test may alter a percentile score by many units while a single standard-score unit change at the tail of the distribution may not change the percentile score at all!
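You can verify this unequal spacing yourself by converting percentiles back into z-scores. The sketch below assumes a normal (bell-shaped) score distribution and measures the gaps in standard-deviation units.

    from statistics import NormalDist

    def z_at_percentile(p):
        """z-score below which p percent of a normal distribution falls."""
        return NormalDist().inv_cdf(p / 100)

    # Gaps between percentiles, expressed in standard-deviation units:
    print(z_at_percentile(50) - z_at_percentile(49))  # ~0.03 SD near the middle
    print(z_at_percentile(2) - z_at_percentile(1))    # ~0.27 SD in the tail
    print(z_at_percentile(99) - z_at_percentile(90))  # ~1.04 SD
    print(z_at_percentile(59) - z_at_percentile(50))  # ~0.23 SD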
A second limitation of percentiles is that gains and losses cannot be compared meaningfully because percentiles are not measured in equal units. Thus, because the units are not equal, you cannot add, subtract, multiply, or divide percentiles.
Percentile scores can be very deceiving!!! Let’s consider the psychological report for a second student, Kelly. Her standard score in mathematics on the WIAT was 86. This score converts to a percentile score of 17. A standard score of 86 is in the Average range of achievement. However, most teachers would say that a child whose mathematics score is at the 17th percentile is having big trouble academically. This simply is not the case! Yes, like her classroom teacher, the psychologist would prefer to see Kelly have a much higher achievement level. However, a score at the 17th percentile is not all that low. Psychologists know this fact. It is not until a score falls at about the 5th percentile, or lower, that it suggests a need for special education. Because percentiles are misinterpreted so often, I tell graduate students that, in general, it is best not to present them in their psychological reports. (Note: the psychological report for Billy did present percentiles because I wanted to show you the problems they can pose.)
Standard scores are clearly easier to interpret correctly than percentiles. Furthermore, once you know a child’s standard score on a test, it is relatively easy to translate that standard score into a percentile. You can use standard score-to-percentile conversion tables to do this without having to make any of the calculations described earlier. In class, I will show you how to use such tables.
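If a conversion table is not handy, the translation can also be approximated in code, assuming the test's standard scores are normally distributed (a sketch, not a substitute for the publisher's own tables).

    from statistics import NormalDist

    def standard_score_to_percentile(score, mean=100, sd=15):
        """Approximate percentile rank for a normally distributed standard score."""
        return NormalDist(mu=mean, sigma=sd).cdf(score) * 100

    print(round(standard_score_to_percentile(86)))          # about 18 (published tables give 17 or 18)
    print(round(standard_score_to_percentile(75)))          # about 5
    print(round(standard_score_to_percentile(60, 50, 10)))  # T-score of 60 -> about 84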
Age- and Grade-Equivalents
Like percentiles, age- and grade-equivalents are two other types of derived scores. However, percentiles, age-equivalents, and grade-equivalents are not standard scores.
We already know that percentile scores can cause problems for interpretation. The truth of the matter is that age- and grade-equivalents are far worse to interpret than percentiles!
Age equivalents are intended to convey the meaning of test performance in terms of the typical child at a given age. Likewise, grade equivalents attempt to provide information in terms of the typical child at a given grade level.
Grade equivalents are the most common method for reporting results on standardized achievement tests prior to high school (Echternach, 1977). Although grade equivalents are very popular, they also are very problematic. Approximately 20 years ago, the APA, AERA, and NCME proposed that they be banned. Unfortunately, this never occurred.
Age- and grade-equivalents are essentially the same thing, except that age-equivalents compare children to other children who are at the same age level, whereas grade-equivalents compare children to others at their grade level. Because grade-equivalents are more popular than age-equivalents, the rest of this document will discuss grade equivalents.
Grade-equivalents can be explained best by an example. If a student obtains a raw score on a test that is equal to the median score for all the beginning sixth-graders (September testing) in the norm group, then that student is given a grade-equivalent of 6.0. A student who obtains a score equal to the median score of all beginning fifth-graders is given a grade equivalent of 5.0. If a student should score between these two points, "interpolation" would be used to determine the grade equivalent. Because most schools run for 10 months, successive months are expressed as decimals. Thus, 5.1 would refer to the average performance of fifth graders in October, 5.2 in November, and so on to 5.9 in June.
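To see where an interpolated grade equivalent comes from, here is a deliberately simplified sketch. The median raw scores are invented for illustration; real publishers interpolate from their own norm tables, and scores beyond the lowest and highest medians must be extrapolated (see limitation 1 below).

    # Invented median raw scores from a September norming (illustration only):
    # beginning fifth graders earn a median of 30, beginning sixth graders 38.
    MEDIANS = {5.0: 30, 6.0: 38}

    def grade_equivalent(raw):
        """Linearly interpolate a grade equivalent between two adjacent grade medians."""
        (g_low, m_low), (g_high, m_high) = sorted(MEDIANS.items())
        fraction = (raw - m_low) / (m_high - m_low)
        return g_low + fraction * (g_high - g_low)

    print(round(grade_equivalent(30), 1))  # 5.0 (the fifth-grade median)
    print(round(grade_equivalent(34), 1))  # 5.5 (halfway between the two medians)
    print(round(grade_equivalent(38), 1))  # 6.0 (the sixth-grade median)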
Limitations of Grade Equivalents
Grade-equivalents have a great deal of intuitive appeal because parents, teachers, as well as many psychologists, think the numbers actually mean something. However, this is not the case. By way of example, most parents, teachers, (and many psychologists) would assume that a fifth-grade child who obtains a grade equivalent of 3.2 knows the same amount of reading as a third-grade child who obtains a grade equivalent of 3.2. This simply is not true! The fifth-grade child actually knows more reading! Thus, this short example shows some of the problems associated with grade equivalents.
If you do not believe what I just said about grade equivalents (or even if you do), it would be worthwhile to read the question-and-answer section of "Hills' Handy Hints" devoted to grade equivalents.
We are now going to discuss the limitations of grade equivalents, but the problems cited for grade equivalents also apply to age equivalents.
Grade equivalents suffer from at least seven major limitations. I will now present each of these limitations.
1. Grade equivalents for very low scores in the low grades and very high scores in the high grades cannot be established directly; they generally are extrapolated from existing observations. This procedure, at best, represents little more than an educated guess.
2. Even in grades where norms exist, it is expected that 50% of the children in a classroom will score below their grade level. That is, grade equivalents give us little information about the percentile standing of a person within a class. For example, during a September testing, it is normal for 50% of the children to obtain grade equivalents below their grade placement. This is especially true in the upper grades.
3. Related to problem number 2, grade equivalents tend to exaggerate the significance of small differences, and in this way, tend to encourage the improper use of test scores. Because of the large within-grade variability, it is possible, for example, for a child who is only moderately below the median for his grade to appear, in grade-equivalent terms, as much as a year or two below expectancies. (This phenomenon is most likely to occur in the upper grades - - say, the sixth grade and above.) A comparison of the child's grade equivalent with his percentile rank will make this fact clear. The problem is most evident when, say, a 6th grader obtains a grade equivalent of 1.3. The 1.3 grade equivalent does not mean that the child is functioning at the same level as a child in the third month of first grade. The older child most probably knows more.
4. Grade equivalents are not comparable across subject matter. A 6th-grade student, for example, because of the differences in grade equivalents for various subject matter, can have a grade equivalent of 6.6 in reading and 6.2 in mathematics and yet have a higher standard score (or percentile score) in mathematics! In other words, grade equivalents are an artifact of the particular way the subject-matter area in question is measured on the test AND the way the subject-matter is introduced in the curriculum of a particular school district.
5. Grade equivalents assume that growth across years is uniform. The assumption of uniform growth across years is untenable. Developmental psychologists teach us that rate of growth is greater for younger children and that it diminishes as children advance in age. Grade equivalents, however, act as though 1 month of growth in the first grade is the same as 1 month of growth in the 10th grade.
6. Grade equivalents are based on a 9- or 10-month school-year metric. This means grade equivalents assume that either no growth takes place during the summer or that growth during the summer is equal to one month of growth during the school year. There is certainly reason to doubt that these assumptions are true.
7. Finally, grade equivalents have no interpretive value beyond the eighth or ninth grade. They are appropriate only for those subjects that are common to a particular grade level.
Grade equivalents remain popular in spite of their inadequacies. Educators are under the impression that such scores are easily and correctly interpreted - an unfortunate assumption. At a minimum, it is appropriate to suggest that grade equivalents never be used alone without some other type of score, such as standard scores or percentile ranks - - and, it may not be too dogmatic to suggest that we stop using these scores altogether.
You probably noticed that the psychological report for Billy presented neither age- nor grade-equivalents. The reason, quite simply, is that their metrics are so problematic.