Form: Restricted- vs. Extended-Response
- They parallel the two forms of essay questions
- Restricted-Response Tasks
- Intended performances more narrowly defined than on extended-response tasks
- (Sometimes) question may begin like a multiple-choice or short-answer stem, but then asks for an explanation, justification, etc.
- (Sometimes) may have introductory material like an interpretive exercise, but then asks for an explanation of the answer, not just the answer itself
- Extended-Response Tasks
- Activities for single assessment may be multiple and varied (gather data or information, analyze it, and write a report)
- Activities may extend over a period of time (a series of drafts and revisions)
- Products from different students may be different in focus (different forms of music, different research topics, etc.)
Focus: Process vs. Product--Note: More here than in your textbook
- One or both may be assessed, depending on the learning outcome and stage of instruction
- Good procedure is emphasized when:
- There is no product
- The procedure is orderly and directly observable
- Correct procedure is crucial to later success
- Analysis of procedural steps can aid in improving product
- Learning is at an early stage
- Good product is emphasized when:
- Different procedures result in an equally good product
- Procedure is not available for observation
- Procedural steps have been mastered
- Product has qualities that can be identified and judged
Degrees of Authenticity--Note: More here than in your textbook
- Realism of tasks can range widely
- But all are performed under controlled conditions
- From least realistic
- Paper-and-pencil exercise (plan a garden on paper)
- Observe and identify real objects and processes (tools or product flaws)
- Perform an isolated procedure (adjust a microscope, weld a joint)
- Simulated performance of part or whole real-world activity (mock interview, flight simulator)
- Work sample (drive a car, repair an engine, give a concert)
- To most realistic
OH 5
Performance-Based Assessments: Advantages and Limitations
Advantages
- Can assess complex learning outcomes not measured by other means
- Can assess process as well as product
- Can clearly communicate instructional goals that relate to real-world skills
- Can constitute good instruction, not just assessment
- Can implement new "constructivist" approaches to learning and self-evaluation
- Can engage students (more active and realistic than other assessments)
Limitations
- Scoring often unreliable
- Time-consuming
- May be costly (if authentic)
- Provide only narrow sampling of learning outcomes
OH 6
Performance-Based Assessments: Suggestions for Constructing Tasks
Suggestions
- Focus on learning outcomes that require performance-based assessments
- Make sure that skills called forth apply to relevant content
- Minimize dependence on irrelevant skills (that is, irrelevant difficulty)
- Provide necessary scaffolding to understand task and expectations
- Give task directions that make the students’ task clear (giving them freedom is no excuse for vagueness!)
- Clearly communicate performance expectations with scoring rubrics, because they:
- clarify the task
- provide guidance on the proper focus in responding
- convey learning priorities
OH 7
Performance-Based Assessments: The Performance Criteria
Performance Criteria are Absolutely Crucial
- Some experts say that clear and appropriate performance criteria are the best way to assure valid performance-based assessment
- Like essays, performance-based assessments require scoring rubrics
(analytical or holistic)
- Unlike essays, they may also require rating scales to assess live performances
- Performance criteria should be created before assessment is given
OH 8
Performance-Based Assessments: Types of Rating Scales
Checklists
- A list of key qualities of a product or process
- Rater checks each as present or absent, correct or not, completed or not, etc.
Numerical Rating Scales
- A series of numbers (e.g., 1-5) used to rate some characteristic (thesis, explanation, etc.) by quality (poor to excellent)
- Useful when number of rating levels is limited (3-7)
- Useful when rating levels are clearly defined and agreed-upon (usually not)
- Meaning clearest when rating levels are "behaviorally anchored" (have descriptions for each level)
Graphic Rating Scales
- Horizontal line with rating levels (never, sometimes,…; inappropriate, somewhat appropriate, ….) ranged across it
- Rater can check anywhere along the line
- Most useful when rating categories are limited and "behaviorally anchored" (have descriptions for each level)
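As an illustration (not from the original slides), the three scale types above can be sketched as simple data structures; the quality names and anchor descriptions are hypothetical:

```python
# Hypothetical sketch of the three rating-scale types as data structures.

# Checklist: each key quality is simply marked present or absent.
checklist = {
    "states a clear thesis": True,
    "supports claims with evidence": False,
    "cites sources correctly": True,
}
checklist_score = sum(checklist.values())  # count of qualities present

# Numerical rating scale, "behaviorally anchored": each level (1-5)
# carries a description so raters agree on what the level means.
thesis_scale = {
    1: "no identifiable thesis",
    2: "thesis present but vague",
    3: "clear thesis, weakly maintained",
    4: "clear thesis, mostly maintained",
    5: "clear thesis, maintained throughout",
}

# Graphic rating scale: the rater may mark anywhere along the line,
# so the rating can fall between the labeled anchors.
graphic_rating = 3.4

print(checklist_score)   # 2
print(thesis_scale[4])   # clear thesis, mostly maintained
```

The behavioral anchors are what make a numerical rating defensible: two raters looking at level 4 are judging against the same description, not a private sense of "good."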
OH 9
Performance-Based Assessments: Common Rating Errors
"Personal Bias Errors"—Didn’t use the whole scale
- Generosity error (***)
- Central tendency error (**)
- Severity error (*)
Problem?
- Scores may reflect rater as much as student
- Little variance in the scores
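The "little variance" problem can be made concrete with a small numerical sketch (the scores are invented for illustration): a rater with a central-tendency bias compresses the spread of scores, hiding real differences among students.

```python
# Hypothetical sketch: a central-tendency bias shrinks score variance.
from statistics import pvariance

true_quality   = [1, 2, 3, 4, 5]   # students really differ across the scale
biased_ratings = [3, 3, 3, 3, 4]   # rater sticks to the middle of the scale

print(pvariance(true_quality))     # 2.0
print(pvariance(biased_ratings))   # 0.16
```

When variance collapses like this, the ratings carry little information about which students did better or worse, whatever their average level.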
"Halo Effect"—A general impression colors all
achievement ratings for a student
- Obscures student’s strengths and weaknesses
- Is a form of prejudice
"Logical Error"—Preconception that two characteristics (e.g.,
good behavior and high intelligence) do--or
don't--go together—
- Won’t get separate, valid ratings of different strengths and weaknesses
OH 10
Performance-Based Assessments: Principles of Effective Rating
- Can reduce rating errors if:
- Characteristics to be rated (in your rubrics) represent specified learning outcomes
- Each rating scale describes the level of learning desired for an outcome
- Each characteristic is directly observable
- Characteristics and points on rating scale are clearly defined
- Type of scoring rubric chosen (analytical vs. holistic) is the most appropriate for task and purpose
- Use 3-7 rating positions
- Rate all students on one task before rating the next task
- When possible, avoid knowing student’s name
- When consequences are important, use several (diverse) raters and average their scores
- Can instruct and motivate students if they help design performance criteria and use them to rate their own progress
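The suggestion to use several diverse raters and average their scores amounts to simple arithmetic; a minimal sketch with invented scores:

```python
# Hypothetical sketch: averaging several raters' scores so that individual
# rating errors (generosity, severity, halo, etc.) partly cancel out.
def average_rating(scores):
    """Mean of several raters' scores for one student on one task."""
    return sum(scores) / len(scores)

# Three diverse raters score the same performance on a 1-5 scale:
# one generous, one severe, one in between.
rater_scores = [5, 3, 4]
print(average_rating(rater_scores))  # 4.0
```

Averaging helps only when the raters' errors differ in direction; if all raters share the same bias (e.g., all generous), the average inherits it, which is why the slide stresses *diverse* raters.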