(Adapted from a handout by Joseph J. Glutting, Ph.D.)
November 5, 2002


Importance of Norm-Referenced Test Scores


The purpose of this handout is to provide instruction on the interpretation of results from norm-referenced tests. When people think of a teacher's job, they seldom think of it as requiring the interpretation of results from standardized tests. However, interpreting such results is actually a very important part of a teacher’s yearly (versus daily) activities.

I know this to be true for two reasons. First, after working in the public schools for six years as a school psychologist, I saw how teachers reacted with puzzlement, confusion, and wonder when I presented results from norm-referenced psychological evaluations. Second, I have been teaching long enough at the University of Delaware to have had undergraduates return and take graduate-level measurement classes with me. After a few years of working in the public schools, these teachers see the impact norm-referenced tests have on children - - and, they emphasize that someone should have taught them more about norm-referenced test-score interpretation when they were undergraduates!

There is yet another way to demonstrate the importance of norm-referenced test interpretation to classroom teachers. Approximately 15% of all children in the public schools receive special education. To be eligible for special education, federal law (i.e., the Individuals with Disabilities Education Act [IDEA]) specifies that children must receive comprehensive, norm-referenced assessments from Multi-Disciplinary Teams (MDTs). Furthermore, another 5% of the children in the public schools are evaluated by MDTs, but do not qualify for special education. Therefore, around 20% of all children in the public schools are evaluated, at one time or another, by MDTs.

Given the large number of children evaluated by MDTs, the odds are approximately 1 in 5 (i.e., 20%) each year that you, a "regular" education teacher, will refer a child for evaluation. Once you refer a child, you will receive one or more reports about him or her (e.g., a psychologist’s report, an educational diagnostician’s report, etc.). Almost all of the scores in these reports are norm-referenced, and it is the results from these tests that determine whether children: (1) are eligible for special education and (2) are diagnosed as having a handicapping condition such as mental retardation (MR), a learning disability (LD), attention-deficit/hyperactivity disorder (ADHD), conduct disorder (CD), etc. Therefore, as you can see, the norm-referenced assessments conducted by MDTs are "high stakes" and have a significant impact on the lives of children and the regular-education teachers who instruct them.

Perhaps the best way to learn about norm-referenced test interpretation is to begin with a psychological evaluation. You will see one such psychological report just below. The report is fictitious. The child, the names of his parents, teacher, school, etc. are made up. Otherwise, the report is exactly what you would receive as a classroom teacher.

Read the report carefully. There are four major areas covered in a psychological report: IQ-test results (see the "WISC-III" section), adaptive-behavior inventory results (see the "ABAS" section), achievement-test results (see the "WIAT" section), and social-emotional adjustment results (see the "ASCA" section).

Try to determine whether the child is performing above average, average, or below average in each of the four areas. You probably will be able to make the determination based on what the psychologist says in the report (i.e., the report’s text presentation). However, look at the section of the report titled "Synopsis of Formal Test Scores". It is this section of the report that provides the actual, norm-referenced scores obtained by the child. Look at the test scores themselves and see if you can determine whether the child is performing above average, average, or below average based on the scores alone. You probably will not be able to make the determination without learning more about norm-referenced tests.

Also, as surprising as it may sound to you, the actual test scores and what is said about the test scores in a report (i.e., the report’s text presentation) sometimes do not agree with one another! For this reason, as a classroom teacher, you need to know something about norm-referenced test scores. Otherwise, you will be unable to determine whether the test results accurately portray how a child in your classroom is performing academically.

Once you finish reading the psychological report, the other sections of this document will teach you how to interpret norm-referenced test scores. At times, the document will refer back to the child (Billy) discussed in the psychological report and his test scores.




NOTE: Information in this report is fictitious. Any resemblance to real individuals is coincidental.






NAME:  William (Billy) Smith                            PARENTS:    William and Susan Smith
GENDER:  Male                                                    ADDRESS:     411 Hanson Drive

DATE OF BIRTH:  12/12/95                                                          Omaha, NE 17111

CHRONOLOGICAL AGE:  6-11                             TELEPHONE:  807-555-1212

RACE:  Anglo                                                              PRIMARY TEACHER: Mrs. Hopkins

EVALUATION DATES: 11/10/02, 11/12/02,            SCHOOL: Happy Valley Elementary

      11/13/02                                    GRADE: 1


Evaluation Techniques:


Wechsler Intelligence Scale for Children-Third Edition (WISC-III), Wechsler Individual Achievement Test- Second Edition (WIAT-II), Adaptive Behavior Assessment System (ABAS), Adjustment Scales for Children and Adolescents (ASCA), Structured Developmental History Interview with Parent, Structured Teacher Interview, Review of School Records, Structured (Time Sampling) Classroom Observation, Unstructured Clinical Interview with Student



Note to Educ451 students: To aid interpretation, please know that the WISC-III (IQs, factor indexes, and composites), the WIAT-II (all scores), and the ABAS (all scores) are in the IQ metric, which means average = 100 and SD = 15. The ASCA is in T-scores, which means average = 50 and SD = 10 (on this test, higher scores are worse). The WISC-III subtest scores have an average of 10 and an SD of 3.




WISC-III IQs and Subtest Standard Scores


Full Scale IQ:  65                           Verbal Scale IQ:  67                Performance Scale IQ:  68


Information                   5                                              Picture completion                    4

Similarities                    3                                              Coding                                     6

Arithmetic                     5                                              Picture Arrangement                 4

Vocabulary                   5                                              Block Design                          5

Comprehension            3                                              Object Assembly                     4

Digit Span                    6                                              Symbol Search              6




WISC-III Factor Indexes


INDEX                                           SCORE

Verbal Comprehension                        68       

Perceptual Organization                 67       

Freedom from Distractibility                 75       

Processing Speed                          80




WIAT-II Composites and Subtest Standard Scores




COMPOSITES                             SCORE

Reading                                                **

Mathematics                                         59

Written Language                                  **

Oral Language                                      62


**Not calculated prior to age 8




SUBTESTS                                    SCORE

Word Reading                                      74

Pseudoword Decoding                      60       

Numerical Operations                               63

Math Reasoning                                 **       

Spelling                                                72

Written Expression                                **

Listening Comprehension                        69

Oral Expression                                67       

**Not calculated prior to age 8








ABAS Composite and Subtest Standard Scores




Composite                                            70



SUBTESTS                                    SCORE


Communication                         60       

Community Use                              70

Functional Academics                               66

Home Living                                         80

Health and Safety                           70

Leisure                                                 **

Self-Care                                             85

Self-Direction                                       85                               

Social                                                   60

Work                                                   **


**Not Administered



Adjustment Scales for Children and Adolescents (ASCA)







SCALES                                             SCORE

Over-reactivity                                      60

Under-reactivity                                     52


Hyperactivity (ADHD)                                 60

Solitary Aggressive-Impulsive (SA-I)                 50

Solitary Aggressive-Provocative (SA-P)               52

Oppositional Defiance (OD)                           53

Diffidence (DIF)                                     50

Avoidance (Avoid)                                    51





Reason for Referral:


William (Billy) was referred by his classroom teacher, Mrs. Hopkins. Billy tries hard in school, but he is struggling in all academic areas.




Billy is approaching his seventh birthday (age = 6 years, 11 months). He lives with both of his biological parents, William (age 35) and Susan (age 33) Smith. William is an accountant and Susan works as a purchasing agent. Both Mr. and Mrs. Smith are college graduates. Neither reports having experienced learning difficulties in school. Mr. and Mrs. Smith have lived in the same community (Omaha) throughout their lives.

A developmental history was conducted with Mrs. Smith on 11/10/02. Two children besides Billy live in the home: Mary, age 10 and Ann, age 8. Mary and Ann are Billy’s biological siblings. Parent information and a review of school records reveal that both Mary and Ann are doing well in school.

Billy speaks only English, which he has been exposed to since birth and has been speaking since he first began talking. Mrs. Smith’s pregnancy with Billy, and her delivery, were unremarkable. Billy was born by Cesarean section, as were his two siblings. However, Billy weighed less than 5 1/2 pounds at birth. His one-minute Apgar score was moderately depressed (score = 7), but the five-minute Apgar was in the healthy range (score = 8). Billy has never been hospitalized, and with the exception of measles, he experienced no childhood illnesses. He currently is taking no prescription medications.

A visual screening was conducted by the school nurse on 10-10-02. Results revealed Billy has normal visual acuity. Also, a hearing test was conducted in school by the speech therapist on 10-20-02 and showed normal auditory acuity. 


According to his mother, Billy reached his motor milestones (sitting alone, crawling, standing alone, and walking) within the expected age ranges. Mrs. Smith is concerned because he reached his language milestones later than expected (speaking first words and speaking in short sentences). Mrs. Smith describes Billy as a happy, cooperative child who gets along with his two older sisters. There are many children in Billy’s neighborhood. Mrs. Smith also indicated that Billy prefers playing with children younger than himself rather than with either his sisters or children his own age. Billy’s favorite activity is playing with trucks. His favorite food is ice cream.


Billy attended a preschool program at age 4 and a half-day kindergarten program last year. In addition, Billy’s first grade is in the same school (Happy Valley Elementary) as his kindergarten class. A review of school records shows he is maintaining good attendance this year and he had an excellent attendance record in kindergarten. A teacher interview was completed with Mrs. Hopkins, Billy’s current teacher. Mrs. Hopkins reports that Billy is very well-behaved. Likewise, Billy has an exemplary conduct record. Regarding academic performance, Mrs. Hopkins indicates that Billy tries very hard in class. At the same time, he is struggling and experiencing many academic difficulties. He is having problems with introductory reading and math skills, and in both areas, he is in Mrs. Hopkins’s lowest teaching groups. School records show the same pattern of academic performance was present in kindergarten. In October, standardized group-achievement tests were administered to all first graders at Happy Valley Elementary School. Results show that Billy scored far below average in Reading, Math, and Language.


Several pre-referral interventions were attempted with Billy. For instance, Mrs. Hopkins provides one-to-one instruction whenever possible. Billy receives one-to-one tutoring from a community volunteer twice a week for one-half hour. Likewise, the school has a peer tutoring program. Once a week, Billy works with a fourth-grade student who helps him with sight-word identification.

Current Observations:


Billy was evaluated on two occasions in his school. Physically, he presented as appropriate in height and weight for his age. Billy’s dress was clean, and on each occasion, he was well groomed. It is obvious that Billy is well cared for at home. His articulation was clear, and his vision, hearing and gross-motor coordination appeared appropriate. He was somewhat nervous about leaving the classroom to work with the examiner. Nevertheless, Billy grew increasingly relaxed as the first test session progressed; he was cooperative; he regularly helped the examiner put away test materials; and he listened attentively to most test directions and questions. Similarly, Billy was equally relaxed and cooperative during the second test session.


Wechsler Intelligence Scale for Children – Third Edition (WISC-III)


One test administered to Billy was the WISC-III. This instrument evaluates a variety of abilities associated with school success, and it is considered to be one of the best predictors of future achievement. The WISC-III does not assess all abilities; for example, it does not measure specific mechanical aptitudes that may be important to certain occupations and trades. Likewise, it does not measure creativity or how well children get along with others.


The WISC-III provides a progression of scores that can be thought of as forming a triangle. At the top is the Full Scale IQ (FSIQ). This is the best single predictor of school achievement on the WISC-III. Underlying the FSIQ are two scores that permit further distinctions. The first is the Verbal Scale IQ (VIQ). It assesses the ability to think in words and apply language skills and verbal information to solve problems. The second is the Performance IQ (PIQ) which requires fewer verbal skills. It evaluates the ability to think in terms of visual images and manipulate them fluently with relative speed. Another way to think of the PIQ is that it evaluates the ability to organize visually-presented material against a time limit. When there is a difference between the VIQ and PIQ, the VIQ is usually the better predictor of school achievement.


Results from the WISC-III indicate Billy may have difficulty keeping up with peers on most tasks requiring age-appropriate thinking and reasoning. His general cognitive ability is within the lower extreme range of intellectual functioning (WISC-III FSIQ = 65).

Billy's ability to think with words is comparable to his ability to reason without the use of words (VIQ = 67, PIQ = 68). Both Billy’s verbal and nonverbal reasoning abilities are in the lower extreme range and align with his overall ability level. 


A personal strength for Billy is his ability to process simple information quickly and efficiently (Processing Speed Index [PSI] = 80). Billy’s PSI was his highest result on the WISC-III. The PSI converts to performance at the ninth percentile. In other words, Billy is able to process simple information more quickly than 9 out of every 100 children his age.


Adaptive Behavior Assessment System (ABAS)


Billy’s adaptive functioning skills were assessed to determine his level of social and daily living skills. His mother completed the ABAS during the interview. The ABAS assesses an individual’s personal and community independence, as well as aspects of personal development outside the school setting across 10 areas. These areas include communication skills, self-direction, social interaction skills, health and safety awareness, etc. The 10 skills form a composite and are collectively referred to as “adaptive behavior”.


Results of the ABAS suggest that Billy has limitations in several adaptive skills. Results of this assessment indicate that his skills for personal care including eating, dressing, and bathing are a personal strength (Self-Care standard score = 85). Another personal strength is Billy's skills for independence and responsibility, such as starting and completing tasks, following time limits and directions, and making choices (Self-Direction standard score = 85). However, when compared to same-age peers without disabilities, Billy's speech, language, and listening skills (Communication standard score = 60) and his skills needed for social interaction (Social = 60) appear to be somewhat limited. Billy's mother noted that his vocabulary is restricted compared to other children his age and that he has fewer friends. Likewise, Billy's overall level of adaptive behavior was in the lower extreme range (Composite standard score = 70). The latter three scores, and most of Billy's adaptive behavior results, are commensurate with his overall cognitive functioning, as measured by the WISC-III FSIQ.


Wechsler Individual Achievement Test – Second Edition (WIAT-II)


Billy completed the WIAT-II, which is an individually administered achievement test. The WIAT-II provides information about children's reading, mathematics, and language performance. Additionally, Billy's teacher provided detailed information on his current academic performance.


Billy's highest level of achievement functioning took place in pre-reading skills. His Word Reading results were at the 4th percentile. He demonstrated evenly developed pre-reading skills and identified some beginning and ending sounds for a few common words (e.g., hat), but had trouble with others (e.g., fish, dish, star). He did not read any common words and had difficulty using phonetic knowledge to sound out nonsense or unfamiliar words (Pseudoword Decoding = below 1st percentile).


Similar to his reading results, Billy's language skills were in the lower extreme range (Oral Language Composite = 1st percentile). In formal testing, he correctly identified pictures of many common objects when presented singly. But, when asked to describe scenes which contained many objects, Billy had difficulty naming more than one or two. His teacher reported that Billy can identify all primary colors, but has difficulty identifying common geometric shapes. The teacher also indicated that Billy occasionally has trouble following orally-presented directions, but that he has improved significantly since the beginning of the year.


Billy's lowest level of achievement functioning appears to be in the mathematics area, where his performance was measured in the lower extreme range (Mathematics Composite = less than 1st percentile). Billy wrote all of the single digit numbers presented; he counted objects up to 10; and he compared shapes according to size. However, he was unable to add 1+2. Billy did not correctly tell time, measure with a ruler, or do subtraction. His teacher reported that Billy identifies numbers and counts up to 10 consistently, but his performance becomes less consistent with higher numbers. She also noted that Billy performs simple addition using his fingers with adult help, but cannot do so for subtraction.


Billy's skills in reading, language, and math were measured to be commensurate with his estimated cognitive ability.


Social and Emotional Functioning


Billy’s behavior, as rated by his teacher, reflects adequate functioning in most areas. His behavior at school, as measured by the Adjustment Scales for Children and Adolescents (ASCA), was estimated to be predominantly in the adjusted range. However, his teacher rated Billy in the Borderline range for difficulty sustaining attention (ADHD scale standard score = 60). The teacher qualified her observation by noting that this difficulty was present when Billy was in situations where the teacher was addressing the whole class. The teacher reported that Billy is consistently cooperative and has made great strides in social interaction. She noted that Billy is very capable of working independently on academic tasks. During direct observation in the classroom, Billy was measured to be on-task approximately 90% of the time; he occasionally stared off into space and was distracted when another peer whispered to a girl seated next to him.


Similarly, his mother characterizes Billy as very well-behaved and affectionate. He gets along well with his sisters and cousins, and his mother noted that Billy speaks at great length about all the fun he has at school.



Billy is approaching his 7th birthday. He is attending first grade. Billy was referred for an educational evaluation by his current teacher due to his minimal progress in attaining basic skills, including oral language and pre-reading skills.


The present evaluation suggests that Billy functions in the lower extreme range of general intellectual ability. Adaptively, delays commensurate with his measured cognitive ability were noted in several areas, including communication and social interaction. Academically, the results of formal testing indicate that Billy is performing at levels that would be expected, given his measured cognitive ability. Behavior-assessment results suggest generally appropriate levels of classroom adjustment.



Evaluated by,







Joseph J. Glutting, Ph.D.

School Psychologist

Happy Valley Public Schools






The direct numerical report of a child’s test performance is the child's raw score (e.g., the number of correct answers). Most often, we cannot interpret raw test scores as we do physical measures such as height, because raw scores have no inherent meaning. Likewise, raw scores are NOT measured in equal units along a line. Therefore, the only way one can meaningfully talk about test scores is to bring in a referent. There are two major referents for tests: norm-referencing and criterion-referencing. We already discussed both types of referents earlier in the course. Now, as a result of the psychological report for Billy, we will pay particular attention to instruments that facilitate norm-referenced comparisons.



The basic difference between norm- and criterion-referenced tests is their interpretation; that is, how we derive meaning from a score. Norm-referenced tests are constructed to provide information about the relative status of children. Thus, they facilitate comparisons of a child's score to the score distribution (i.e., the mean and standard deviation) of some norm group. As a result, the meaningfulness of these scores depends on:

(1) the extent to which the test user (e.g., psychologist, teacher, parents) is interested in comparing a child to the mean and standard deviation of a norm group.

(2) the adequacy of the norm group.



Before we learn how to interpret Billy’s test scores, we need to learn why the norm group in norm-referenced test interpretations is so important.

The American Psychological Association (APA), the American Educational Research Association (AERA), and the National Council on Measurement in Education (NCME) (1985) clearly state that it is the test publisher's responsibility to develop suitable norms for the groups on whom the test is to be used. There are four major types of norms. Which of the four types of norms is used by a psychologist (or a school district when conducting group testing) can have a radical impact on the interpretation of a child’s test results.

1. National norms. This is the most common norm applied to test scores. Therefore, it is the most important test norm. These norms are almost always reported separately by the different age or grade levels. Most group instruments reporting national norms employ reasonably satisfactory norm groups. On the other hand, most individually-administered, clinical instruments used by psychologists, educational diagnosticians, etc., have inadequate national norms. Many have samples that are too small, are drawn from regions not representative of the country, are insufficiently stratified by age, and are disproportionately composed of Anglo and middle-class children.

The WISC-III, WIAT-II, ABAS, and ASCA instruments used to evaluate Billy all have very good national norms. The diagnostic decisions made by MDTs should ALWAYS be based on tests that use national norms, and NOT on any of the other norms discussed below.

2. State (also called Regional) Norms. Here, the referent changes from children across the United States to those within a particular state. State norms can sometimes be helpful. For instance, if we wanted to compare a child's achievement level to the achievement level of other children within the state of Delaware, we would use a state norm.

Generally, state norms pose problems for interpretation. Let’s talk about an instance where state norms would not be appropriate. In the psychological report for Billy, his overall IQ on the WISC-III was 65. The overall IQ on the WISC-III is referred to as the Full Scale IQ (FSIQ). Billy’s FSIQ of 65 was determined by comparing his performance to other children across the nation (i.e., national norms were used). We would not want to compare him only to children in the state of Delaware (i.e., we would not want to use state norms) because children in Delaware could have higher, or lower, IQs than children in other states. In other words, when we think about children’s intelligence levels, achievement levels, etc., we typically think about how they compare to other children across the nation - - and not how they compare to children within just one state. If MDTs made decisions on the basis of state norms, a child could be identified as mentally retarded based on his or her "Delaware" IQ, move across state lines, and perhaps not be identified as retarded in another state. Consequently, as noted above, national norms are to be preferred in the norm-referenced, diagnostic assessments completed by MDTs.
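The effect of the norm group can be made concrete with a short sketch in Python (the norm-group means and standard deviations below are hypothetical numbers invented for illustration, not values from any real test):

```python
def standard_score(raw, norm_mean, norm_sd, metric_mean=100, metric_sd=15):
    """Score a raw total against a norm group, expressed in the IQ metric."""
    z = (raw - norm_mean) / norm_sd
    return metric_mean + metric_sd * z

raw = 40  # the same child's raw score, scored two different ways

# Hypothetical national norms (mean 50, SD 10) vs. hypothetical state norms (mean 45, SD 10)
print(standard_score(raw, norm_mean=50, norm_sd=10))  # 85.0 against national norms
print(standard_score(raw, norm_mean=45, norm_sd=10))  # 92.5 against state norms
```

The identical raw performance yields two different standard scores depending on the norm group it is compared to - - which is exactly why high-stakes decisions should rest on national norms.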

3. Special-Group Norms. For SOME decision-making purposes, special-group norms make sense. For example, when hiring from a homogeneous pool of applicants who are all engineers, a better decision can be made using norms based on engineers alone, because we get to see how each applicant compares to the "typical" engineer. Norms based on the general population would probably fail to make the fine-grain distinctions among the engineering applicants necessary for the hiring decision, because engineers are brighter and more educated than the average person.

You may not know this, but the SAT uses special-group norms. Ask yourself: "Who takes the SAT?" It is only those people who are pretty successful in high school. This is a "special" group because you actually have to be pretty smart to graduate from high school - - and, only those people who do well in high school consider going to college. It is only this latter group of people who take the SAT - - and norms for the SAT are based on this group. In other words, the norm group for the SAT is a "special" norm group because it represents the top one-half of all students in the United States. Consequently, if you did not score very highly on the SAT, it is not an embarrassment. The reason is that you scored below average in comparison to a special norm group that was above average to begin with!

Another way of saying all of the above is that you could take the SAT (which has above-average, special-group norms) and score below average. You could then take the adult form of the WISC-III (i.e., the WAIS-III) and score above average, because the WAIS-III has national norms!

On the other hand, special-group norms are inappropriate for the educational or diagnostic decisions made by MDTs. For example, you probably would agree that it would be incorrect to interpret Billy’s adaptive-behavior results on the ABAS using norms based only on children with retardation. The reason is that if Billy were to score in the average range for this special group, he would still share more in common with retarded children than with "regular" education students, simply because he is "average" only in comparison to children who are retarded.

4. Local Norms. Many educators prefer some intradistrict norm where they can compare children to one another within their school district. These norms are referred to as "local" norms. The idea behind local norms is that test users can compare specific children to the average in that particular locale. While the use of local norms has some intuitive appeal, the procedure can be misleading when the local test mean deviates sharply from the test's national mean. For example, if the performance in a specific school district is below the national mean, the relative performance of children will be inflated by using local norms.



As we already know from our earlier lesson on statistics, the basic standard score is the z-score. We also know that once we obtain a z-score, it is a simple process to convert a z-score to a t-score, IQ score, and such.

The basic standard score, the z-score, is defined as follows:

Z = (X - M)/SD


X = a child's raw-score on the test,
M = the raw-score mean for a particular norm group,
SD = the raw-score standard deviation for a particular norm group.


The mean for a full set of z-scores is set at zero and the standard deviation is set at 1.0. Stated simply, z-scores are raw-scores expressed in standard deviation units from the mean. Further, we know that a major advantage of standard scores is that they are measured in equal units.
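The formula above can be sketched directly in Python (a minimal illustration; the function and variable names are mine, not part of any test manual):

```python
def z_score(raw, mean, sd):
    """Express a raw score in standard-deviation units from the norm-group mean."""
    return (raw - mean) / sd

# A raw score of 60 on a test with a raw-score mean of 50 and SD of 10
print(z_score(60, 50, 10))  # 1.0, i.e., one standard deviation above the mean
```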

Problem 1

Before we go on to t-scores and other types of standard scores, let’s try a couple of problems where we convert raw scores on a test into z-scores. Assume that a test has a raw-score mean of 62 and a standard deviation of 9. If a child obtains a raw score of 71 on the test, what would her z-score be? Calculate this problem yourself.

Problem 2

Let’s return to the test used in problem 1 just above. The test has a raw-score mean of 62 and a standard deviation of 9. A second child takes the test and gets a raw-score of 53. What is this child’s z-score? Calculate this problem yourself.

I am going to give you a lot of help with problem 2 just above. The correct answer is z = -1.0. The answer shows that z-scores below the mean have negative values. In order to get enough precision when using z-scores, we must use at least one decimal place. This makes z-scores such as -1.0 awkward. Another drawback is that approximately half of all z-scores are negative.
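If you want to check your work on Problems 1 and 2, the same arithmetic carried out in Python (a sketch for self-checking only):

```python
# Problem 1: raw score 71 on a test with raw-score mean 62 and SD 9
print((71 - 62) / 9)  # 1.0

# Problem 2: raw score 53 on the same test
print((53 - 62) / 9)  # -1.0
```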

Let’s consider again the case of Billy. He obtained a WISC-III FSIQ of 67. This number may not mean much to you yet, but it is a pretty low IQ. It is possible to convert his IQ to a z-score. When you do so, a WISC-III FSIQ of 67 converts to a z-score of -2.20. How would you like to tell Billy’s parents that his IQ was negative! I know I wouldn’t - - and it’s for this reason that tests use other standard-score metrics.

Another way of saying all of the above is that we can avoid negative scores and decimals by simply using a standard score with a mean sufficiently greater than 0 to avoid minus score values, and a standard deviation sufficiently greater than 1 to make decimals unnecessary.

We already learned the general formula for converting z-scores to other standard scores.

Desired Metric unit = z (SD) + M

For example, Wechsler's intelligence tests (WISC-III, WAIS-III) use this form:

IQ = z (15) + 100

The SAT and GRE use this form:

SAT = z (100) + 500

Many behavior rating scales use t-scores. You can convert z-scores to this form as follows:

t-score = z (10) + 50

Problem 1

Johnny obtains a z-score of -2.0. What numerical value would his score be if we converted it to Wechsler IQ units, SAT units, and T-score units?

IQ = -2.0 (15) + 100
IQ = -30 + 100
IQ = 70

SAT = -2.0 (100) + 500
SAT = -200 + 500
SAT = 300

T score = -2.0 (10) + 50
T = -20 + 50
T = 30

As can be seen from this, IQs, SATs, and T-scores have all the properties of z-scores without the awkwardness resulting from negative scores and decimal points.
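The general conversion formula can also be written as one small Python function (this is just a sketch of the formula above, applied to the z-score of -2.0 from the problem just worked):

```python
def from_z(z, mean, sd):
    """Convert a z-score into another standard-score metric: z * SD + M."""
    return z * sd + mean

z = -2.0
print(from_z(z, 100, 15))   # 70.0  (Wechsler IQ units)
print(from_z(z, 500, 100))  # 300.0 (SAT/GRE units)
print(from_z(z, 50, 10))    # 30.0  (t-score units)
```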

Problem 2

Mary obtains an IQ of 115 on the WAIS-III, and her SAT score is 650. On which test did she do better?

To find this out, we need to convert both scores to a common unit, the z-score. All we have to do is use the formula for a z-score.

Z = (X - M)/SD

To find the z-score, given a WAIS-III IQ of 115, we:

Z = (115 - 100)/15
Z = 15/15
Z = 1.0

To find the z-score, given a SAT score of 650, we:

Z = (650 - 500)/100
Z = 150/100
Z = 1.5

So, then, on which test did Mary do better?
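Here is the same comparison carried out in Python (the program simply computes both z-scores and reports the test with the larger one):

```python
def z_score(x, mean, sd):
    """Convert a score in any standard-score metric back to a z-score."""
    return (x - mean) / sd

wais_z = z_score(115, 100, 15)   # 1.0
sat_z = z_score(650, 500, 100)   # 1.5

# The higher z-score marks the better relative performance:
print("SAT" if sat_z > wais_z else "WAIS-III")  # SAT
```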

We already discussed other common, standard-score metrics during our lesson on statistics. However, they are so important that I will present them again:


T-score (M = 50, SD = 10; e.g., the ASCA, CDCL, and BRP-2)

Wechsler IQ units (M = 100, SD = 15; e.g., the WISC-III, WIAT, WRAT-III, KM-R, and ABAS)

Wechsler subtest units (M = 10, SD = 3; e.g., the Information, Similarities, and other subscales of the WISC-III)

Stanford-Binet IQ units (M = 100, SD = 16)







We need to discuss some other types of derived scores (i.e., converted from raw scores). There are several types of derived scores that give a child’s relative status. Like standard scores, these other relative-status scores are derived from raw scores. However, these other relative-status scores are not standard scores.

Remember, standard scores present everything in equal units. This means we can add, subtract, multiply, and divide standard scores. We cannot do the same with the other types of relative-status scores.

Besides standard scores, three other types of relative-status scores are commonly used by MDTs: (a) percentiles, (b) grade equivalents, and (c) age equivalents. We will now discuss each type of relative-status score.


Percentiles. A percentile is the point in a score distribution BELOW which a certain percentage of the people fall. Thus, if a person obtains a percentile score of 50, it means that 50 percent of the population falls below this person. Likewise, if a person gets a percentile score of 75, it means that 75 percent of the population falls below this person.

Percentiles are not standard scores. The reason is that percentiles are expressed in ordinal units (ranks). All that the term "ordinal" means is that the distance between units (i.e., percentile numbers) is not equal. In other words, the distance between the 49th and 50th percentiles is much smaller than the distance between the 1st and 2nd percentiles. This is because the 49th and 50th percentiles are near the middle of the bell-shaped curve, whereas the 1st and 2nd percentiles are at one "tail" of the bell-shaped curve. As strange as it may seem (and I will show you this in class), the distance between the 2nd and 16th percentiles is roughly the same as the distance between the 16th and 50th percentiles - - about one standard deviation each!

Although widely used, percentiles suffer from two serious limitations. One limitation is that the size of percentile units is not constant in terms of standard-score units. We just covered this limitation above, but I will repeat it again to be thorough. For example, if the distribution of test scores is a normal, bell-shaped curve, the distance between the 90th and 99th percentiles is much greater than the distance between the 50th and 59th percentiles. One standard-score unit change near the mean of a test may alter a percentile score by many units while a single standard-score unit change at the tail of the distribution may not change the percentile score at all!
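You can verify these unequal distances yourself with Python's standard library (statistics.NormalDist models the normal, bell-shaped curve; inv_cdf turns a proportion back into a z-score):

```python
from statistics import NormalDist

nd = NormalDist()  # the unit normal curve: mean 0, SD 1

# Distances, in standard-score (z) units, between the percentile pairs above:
low = nd.inv_cdf(0.59) - nd.inv_cdf(0.50)   # 50th to 59th percentile
high = nd.inv_cdf(0.99) - nd.inv_cdf(0.90)  # 90th to 99th percentile

print(round(low, 2))   # 0.23 standard-score units
print(round(high, 2))  # 1.04 standard-score units
```

The same nine-percentile spread covers more than four times as much of the score scale at the tail as it does near the middle.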

A second limitation of percentiles is that gains and losses cannot be compared meaningfully because percentiles are not measured in equal units. Thus, because the units are not equal, you cannot add, subtract, multiply, or divide percentiles.

Percentile scores can be very deceiving!!! Let’s consider the psychological report for a second student, Kelly. Her standard score in mathematics on the WIAT was 86. This score converts to a percentile score of 17. A standard score of 86 is in the Average range of achievement. However, most teachers would say that a child whose mathematics score is at the 17th percentile is in big trouble academically. This simply is not the case! Yes, like her classroom teacher, the psychologist would prefer to see Kelly have a much higher achievement level. However, a score at the 17th percentile is not all that low. Psychologists know this fact. It is not until a score falls at about the 5th percentile, or lower, that it suggests a need for special education. Because percentiles are misinterpreted so often, I tell graduate students that, in general, it is best not to present them in their psychological reports. (Note: the psychological report did present percentiles for the case of Billy because I wanted to show you the problems they can pose.)

Standard scores are clearly better to interpret than percentiles. Furthermore, once you know a child’s standard score on a test, it is relatively easy to translate the standard score to a percentile. You can use standard score-to-percentile conversion tables to do this without having to make any calculations of the sort described earlier. In class, I will show you how to use such tables.
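A normal-curve function can serve the same purpose as a conversion table. The Python sketch below assumes a test with a mean of 100 and a standard deviation of 15 (the Wechsler metric used by the WIAT), which lets it reproduce Kelly's percentile from the example above:

```python
from statistics import NormalDist

def percentile(standard_score, mean, sd):
    """Percentage of the norm group falling below a given standard score."""
    return 100 * NormalDist(mean, sd).cdf(standard_score)

# Kelly's WIAT mathematics standard score of 86 (M = 100, SD = 15):
print(round(percentile(86, 100, 15), 1))  # 17.5 - a table reports this as the 17th percentile
```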


Age- and Grade-Equivalents

Like percentiles, age- and grade-equivalents are two other types of derived scores. However, percentiles, age-equivalents, and grade-equivalents are not standard scores.

We already know that percentile scores can cause problems for interpretation. The truth of the matter is that age- and grade-equivalents are far worse to interpret than percentiles!

Age equivalents are intended to convey the meaning of test performance in terms of the typical child at a given age. Likewise, grade equivalents attempt to provide information in terms of the typical child at a given grade level.

Grade equivalents are the most common method for reporting results on standardized achievement tests prior to high school (Echternach, 1977). Although grade equivalents are very popular, they also are very problematic. Approximately 20 years ago, the APA, AERA, and NCME proposed that they be banned. Unfortunately, this never occurred.

Age- and grade-equivalents are essentially the same thing, except that age-equivalents compare children to other children at the same age level, whereas grade-equivalents compare children to others at their grade level. Because grade-equivalents are more popular than age-equivalents, the rest of this document will discuss grade equivalents.

Grade-equivalents can be explained best by an example. If a student obtains a raw score on a test that is equal to the median score for all the beginning sixth-graders (September testing) in the norm group, then that student is given a grade-equivalent of 6.0. A student who obtains a score equal to the median score of all beginning fifth-graders is given a grade equivalent of 5.0. If a student should score between these two points, "interpolation" would be used to determine the grade equivalent. Because most schools run for 10 months, successive months are expressed as decimals. Thus, 5.1 would refer to the average performance of fifth graders in October, 5.2 in November, and so on to 5.9 in June.
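The interpolation step can be sketched in Python as follows (the median raw scores below are invented for illustration - - real values come from a test's norm tables):

```python
def grade_equivalent(raw, lower_grade, lower_median, upper_median):
    """Linearly interpolate a grade equivalent within one grade-year,
    expressed in tenths (the 10 school months of the year)."""
    fraction = (raw - lower_median) / (upper_median - lower_median)
    return lower_grade + round(fraction, 1)

# Suppose beginning fifth-graders' median raw score is 40 and beginning
# sixth-graders' is 50.  A raw score of 45 falls halfway between them:
print(grade_equivalent(45, 5.0, 40, 50))  # 5.5
```

Notice that the decimal is pure linear interpolation between two medians - - the test maker never actually observed "fifth graders in February," which is part of why these scores are so hard to defend.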

Limitations of Grade Equivalents

Grade-equivalents have a great deal of intuitive appeal because parents, teachers, and many psychologists think the numbers actually mean something. However, this is not the case. By way of example, most parents, teachers, (and many psychologists) would assume that a fifth-grade child who obtains a grade equivalent of 3.2 knows the same amount of reading as a third-grade child who obtains a grade equivalent of 3.2. This simply is not true! The fifth-grade child actually knows more reading! Thus, this short example shows some of the problems associated with grade equivalents.

If you do not believe what I just said about grade equivalents (or even if you do), it would be worthwhile to read the question and answer section of "Hills Handy Hints" devoted to grade equivalents.

We are now going to discuss the limitations of grade equivalents, but the problems cited for grade equivalents also apply to age equivalents.

Grade equivalents suffer from at least seven major limitations. I will now present each of these limitations.

1. Grade equivalents for low scores in the low grades and high scores in the high grades cannot be established directly; they generally are extrapolated from existing observations. This procedure, at best, represents little more than an educated guess.

2. Even in grades where norms exist, it is normal for 50% of the children in a classroom to score below their grade level. That is, grade equivalents give us little information about the percentile standing of a person within a class. For example, during a September testing, it is normal for 50% of the children to obtain grade equivalents below their grade placement. This is especially true in the upper grades.

3. Related to problem number 2, grade equivalents tend to exaggerate the significance of small differences, and in this way, tend to encourage the improper use of test scores. Because of the large within-grade variability, it is possible, for example, for a child who is only moderately below the median for his grade to appear on a grade-equivalent scale as much as a year or two below expectancies. (This phenomenon is most likely to occur in the upper grades - - say, in the sixth grade and above.) A comparison of the child's grade equivalent with his percentile rank will make this fact clear. The problem is most evident, say, when a 6th grader obtains a grade equivalent of 1.3. The 1.3 grade equivalent does not mean that the child is functioning on the same level as a first grader three months into the school year. The older child most probably knows more.

4. Grade equivalents are not comparable across subject matter. A 6th-grade student, for example, because of the differences in grade equivalents for various subject matter, can have a grade equivalent of 6.6 in reading and 6.2 in mathematics and yet have a higher standard score (or percentile score) in mathematics! In other words, grade equivalents are an artifact of the particular way the subject-matter area in question is measured on the test AND the way the subject-matter is introduced in the curriculum of a particular school district.

5. Grade equivalents assume that growth across years is uniform. The assumption of uniform growth across years is untenable. Developmental psychologists teach us that rate of growth is greater for younger children and that it diminishes as children advance in age. Grade equivalents, however, act as though 1 month of growth in the first grade is the same as 1 month of growth in the 10th grade.

6. Grade equivalents are based on 9- or 10-month school-year metrics. This means grade equivalents assume either that no growth takes place during the summer or that growth during the summer is equal to one month of growth during the school year. There is certainly reason to doubt that these assumptions are true.

7. Finally, grade equivalents have no interpretive value beyond the eighth or ninth grade. They are appropriate only for those subjects that are common to a particular grade level.

Grade equivalents remain popular in spite of their inadequacies. Educators are under the impression that such scores are easily and correctly interpreted - an unfortunate assumption. At a minimum, it is appropriate to suggest that grade equivalents never be used alone without some other type of score, such as standard scores or percentile ranks - - and, it may not be too dogmatic to suggest that we stop using these scores altogether.

You probably noticed that the psychological report for Billy presented neither age- nor grade-equivalents. The reason, quite simply, is that their metrics are so poor.