New grading models are based on the premise that a standards-based system requires that scores on assessments be both valid and reliable.
Many believers in standards-based grading claim that a score is valid when it represents a student’s performance on a standard, and reliable when students who demonstrate the same understanding of a standard receive the same score.
This is all very “scientific” sounding. How much real science is involved, though?
Reliability is the central issue, so what makes an assessment reliable? A measurement made once should be replicable: if we are measuring distance, we use a tool that measures it accurately, such as a ruler; if we are measuring time, we might use a “reliable” stopwatch. With such tools, replicating a measurement is simple and practical. Both distance and time can be measured reliably because we agree that each has an objective reality and a corresponding system for its measurement.
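What “reliable” means in measurement terms can be made concrete with a small sketch. The readings below are invented for illustration, not real data; the point is only that repeated measurements of an objective quantity cluster tightly around a single value.

```python
import statistics

# Hypothetical repeated measurements of the same desk, in centimeters.
# A reliable instrument yields nearly identical values on every trial.
ruler_readings = [121.9, 122.0, 122.1, 122.0, 121.9]

mean = statistics.mean(ruler_readings)
spread = statistics.stdev(ruler_readings)

print(f"mean = {mean:.2f} cm, spread = {spread:.3f} cm")
# A spread that is tiny relative to the mean is the hallmark of a
# reliable measurement of an objective quantity.
```

No comparable exercise exists for scores on a proficiency scale, which is precisely the problem the rest of this piece takes up.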
So what are we measuring at school? I assume we are measuring student learning.
I’ve never seen an objective measurement of learning. Learning is not an objective construct. It has been operationally defined by some as “What you know and can do,” but a remarkable number of educators and psychologists reject that notion as a convenient reductionist fallacy. That definition is based on the belief that learning is an outcome. Learning can be described in a variety of ways—as open-ended inquiry, for example. If we view learning as primarily a process or journey that has no distinct conclusion, then we ought to accept that learning is best judged by the unexpected outcomes that accompany it—and we will reject authoritarian notions that outcomes conform to pre-ordained “standards” or “proficiency scales.”
Since it isn’t possible to measure or observe learning directly, it must be inferred from the performance of students. We give tests; we ask students to respond to questions or to solve problems or perform some skill relevant to a content area. We create scales to clarify how performance is to be evaluated—in the hope and belief that we’ll all “measure learning” reliably. We create “common formative and summative assessments” so that students can be “guaranteed” a specific experience and outcome.
But it doesn’t work.
Human beings seldom see things in exactly the same way as other humans. Tools like proficiency scales are subject to variable interpretation and application. Humans make subjective judgments of non-objective realities, using measuring tools that are blunt instruments. Judging complex thought and performance—even with established standards—is a messy, awkward process that ought to caution against certainty, guarantees, or claims of “objectivity.” Taking the illogic of standards-based evaluation to its nadir, proponents also want to use these unreliable scores for evaluation of a second party—teachers themselves.
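The divergence between human raters is not just an impression; psychometricians quantify it. One standard statistic is Cohen’s kappa, which measures agreement between two raters after correcting for agreement expected by chance. The sketch below uses invented scores (not drawn from any real classroom) to show how far short of perfect agreement two conscientious teachers can fall.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement.

    Returns 1.0 for perfect agreement, 0.0 for chance-level agreement.
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of items on which the raters agree exactly.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Proportion of agreement expected if both rated independently at random
    # while keeping their own score distributions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)

# Two hypothetical teachers scoring the same ten essays on a 1-4 scale.
teacher_1 = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
teacher_2 = [3, 3, 4, 2, 1, 2, 4, 4, 2, 2]

print(f"kappa = {cohens_kappa(teacher_1, teacher_2):.2f}")
# prints kappa = 0.45 — far from the near-perfect agreement that a
# claim of "reliability" would demand.
```

With scores like these, the raters agree on only six of ten essays; the kappa correction makes the picture look even worse than raw percent agreement suggests.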
In the end, education itself is inherently un-reliable. That’s one reason it holds fascination for both teacher and student. Acknowledging this fact may free us from the discomfort produced by trying to shoehorn objective measures into a subjective framework.
And if reliability eludes us, validity is out the window as well.
Just ask a scientist.
We’d be better off with an honestly un-reliable system.
© David Sudmeier, 2014