Form: Restricted- vs. Extended-Response
- They parallel the two forms of essay questions
- Restricted-Response Tasks
- Intended performances more narrowly defined than on extended-response tasks
- (Sometimes) question may begin like a multiple-choice or short-answer stem, but then asks for an explanation, justification, etc.
- (Sometimes) may have introductory material like an interpretive exercise, but then asks for an explanation of the answer, not just the answer itself
- Extended-Response Tasks
- Activities for single assessment may be multiple and varied (gather data or information, analyze it, and write a report)
- Activities may extend over a period of time (a series of drafts and revisions)
- Products from different students may be different in focus (different forms of music, different research topics, etc.)
Focus: Process vs. Product--Note: More here than in your textbook
- One or both may be assessed, depending on the learning outcome and stage of instruction
- Good procedure is emphasized when:
- There is no product
- The procedure is orderly and directly observable
- Correct procedure is crucial to later success
- Analysis of procedural steps can aid in improving product
- Learning is at an early stage
- Good product is emphasized when:
- Different procedures result in an equally good product
- Procedure is not available for observation
- Procedural steps have been mastered
- Product has qualities that can be identified and judged
Degrees of Authenticity--Note: More here than in your textbook
- Realism of tasks can range widely
- But all are performed under controlled conditions
- From least realistic
- Paper-and-pencil exercise (plan a garden on paper)
- Observe and identify real objects and processes (tools or product flaws)
- Perform an isolated procedure (adjust a microscope, weld a joint)
- Simulated performance of part or whole real-world activity (mock interview, flight simulator)
- Work sample (drive a car, repair an engine, give a concert)
- To most realistic
OH 5
Performance-Based Assessments: Advantages and Limitations
Advantages
- Can assess complex learning outcomes not measured by other means
- Can assess process as well as product
- Can clearly communicate instructional goals that relate to real-world skills
- Can constitute good instruction, not just assessment
- Can implement new "constructivist" approaches to learning and self-evaluation
- Can engage students (more active and realistic than other assessments)
Limitations
- Scoring often unreliable
- Time-consuming
- May be costly (if authentic)
- Provide only narrow sampling of learning outcomes
OH 6
Performance-Based Assessments: Suggestions for Constructing Tasks
Suggestions
- Focus on learning outcomes that require performance-based assessments
- Make sure that skills called forth apply to relevant content
- Minimize dependence on irrelevant skills (that is, irrelevant difficulty)
- Provide necessary scaffolding to understand task and expectations
- Give task directions that make the students’ task clear (giving them freedom is no excuse for vagueness!)
- Clearly communicate performance expectations with scoring rubrics, because they:
- clarify the task
- provide guidance on the proper focus in responding
- convey learning priorities
OH 7
Performance-Based Assessments: The Performance Criteria
Performance Criteria are Absolutely Crucial
- Some experts say that clear and appropriate performance criteria are the best way to assure valid performance-based assessment
- Like essays, performance-based assessments require scoring rubrics
(analytical or holistic)
- Unlike essays, they may also require rating scales to assess live performances
- Performance criteria should be created before assessment is given
OH 8
Performance-Based Assessments: Types of Rating Scales
Checklists
- A list of key qualities of a product or process
- Rater checks each as present or absent, correct or not, completed or not, etc.
Numerical Rating Scales
- A series of numbers (e.g., 1-5) used to rate some characteristic (thesis, explanation, etc.) by quality (poor to excellent)
- Useful when number of rating levels is limited (3-7)
- Useful when rating levels are clearly defined and agreed-upon (usually not)
- Meaning clearest when rating levels are "behaviorally anchored" (have descriptions for each level)
Graphic Rating Scales
- Horizontal line with rating levels (never, sometimes,…; inappropriate, somewhat appropriate, ….) ranged across it
- Rater can check anywhere along the line
- Most useful when rating categories are limited and "behaviorally anchored" (have descriptions for each level)
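As an illustration (not from the original slides), the three scale types above can be sketched as simple data structures; the quality names and anchor descriptions are hypothetical:

```python
# Hypothetical sketch of the three rating-scale types as data structures.

# Checklist: each key quality is simply marked present or absent.
checklist = {
    "states a clear thesis": True,
    "supports claims with evidence": False,
    "cites sources correctly": True,
}
checklist_score = sum(checklist.values())  # count of qualities present

# Numerical rating scale, "behaviorally anchored": each level (1-5)
# carries a description so raters agree on what the level means.
thesis_scale = {
    1: "no identifiable thesis",
    2: "thesis present but vague",
    3: "clear thesis, weakly maintained",
    4: "clear thesis, mostly maintained",
    5: "clear thesis, maintained throughout",
}

# Graphic rating scale: the rater may mark anywhere along the line,
# so the rating can fall between the labeled anchors.
graphic_rating = 3.4

print(checklist_score)   # 2
print(thesis_scale[4])   # clear thesis, mostly maintained
```

The behavioral anchors are what make a numerical rating defensible: two raters looking at level 4 are judging against the same description, not a private sense of "good."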
OH 9
Performance-Based Assessments: Common Rating Errors
"Personal Bias Errors"—Didn’t use the whole scale
- Generosity error (***)
- Central tendency error (**)
- Severity error (*)
Problem?
- Scores may reflect rater as much as student
- Little variance in the scores
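The "little variance" problem can be made concrete with a small numerical sketch (the scores are invented for illustration): a rater with a central-tendency bias compresses the spread of scores, hiding real differences among students.

```python
# Hypothetical sketch: a central-tendency bias shrinks score variance.
from statistics import pvariance

true_quality   = [1, 2, 3, 4, 5]   # students really differ across the scale
biased_ratings = [3, 3, 3, 3, 4]   # rater sticks to the middle of the scale

print(pvariance(true_quality))     # 2.0
print(pvariance(biased_ratings))   # 0.16
```

When variance collapses like this, the ratings carry little information about which students did better or worse, whatever their average level.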
"Halo Effect"—A general impression colors all
achievement ratings for a student
- Obscures student’s strengths and weaknesses
- Is a form of prejudice
"Logical Error"—Preconception that two characteristics (e.g.,
good behavior and high intelligence) do--or
don't--go together—
- Won’t get separate, valid ratings of different strengths and weaknesses
OH 10
Performance-Based Assessments: Principles of Effective Rating
- Can reduce rating errors if:
- Characteristics to be rated (in your rubrics) represent specified learning outcomes
- Each rating scale describes the level of learning desired for an outcome
- Each characteristic is directly observable
- Characteristics and points on rating scale are clearly defined
- Type of scoring rubric chosen (analytical vs. holistic) is the most appropriate for task and purpose
- Use 3-7 rating positions
- Rate all students on one task before rating the next task
- When possible, avoid knowing student’s name
- When consequences are important, use several (diverse) raters and average their scores
- Can instruct and motivate students if they help design performance criteria and use them to rate their own progress
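The suggestion to use several diverse raters and average their scores amounts to simple arithmetic; a minimal sketch with invented scores:

```python
# Hypothetical sketch: averaging several raters' scores so that individual
# rating errors (generosity, severity, halo, etc.) partly cancel out.
def average_rating(scores):
    """Mean of several raters' scores for one student on one task."""
    return sum(scores) / len(scores)

# Three diverse raters score the same performance on a 1-5 scale:
# one generous, one severe, one in between.
rater_scores = [5, 3, 4]
print(average_rating(rater_scores))  # 4.0
```

Averaging helps only when the raters' errors differ in direction; if all raters share the same bias (e.g., all generous), the average inherits it, which is why the slide stresses *diverse* raters.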