The key to rigorous online assessments

By: Thomas Arnett

Jun 12, 2015

Although online-learning software can be a powerful enabler of personalized learning, many educators struggle with what they see as learning software’s limited ability to provide rigorous assessment and feedback.

I spoke recently with the principal of a charter middle school that is using blended learning and alternative staffing to try to develop a breakthrough personalized learning model. The principal was excited by what he saw as the future potential of blended learning, but admitted that he and his staff have been reluctant to move the school’s core instruction online because they just haven’t found any online curriculum that is as good as the lessons and learning activities that their expert teachers create. For example, he explained that his math teachers create rigorous, multi-step word problems that require students to “explain their thinking” using writing and drawings. In contrast, the best learning software they were aware of for the grades and content they teach is limited to multiple-choice questions, numerical inputs, or basic demonstrations with digital manipulatives.

Currently, teacher-created assessments are by far the most common instruments for measuring students’ learning and providing feedback on a day-to-day basis. But teacher-developed assessments have three common drawbacks: 1) they are expensive, 2) they are slow, and 3) they often lack validity and reliability. If we unbundle the role of the teacher and leverage the benefits of technology, however, there may be a way to integrate human-graded assessments into online learning software that provides rigor while circumventing these drawbacks.
 
Recognizing the cost of human-graded assessment
There’s a good reason why standardized tests—such as the SAT, ACT, and state end-of-year exams—are usually composed of a long string of multiple-choice questions followed by only a small handful of constructed-response questions. Grading constructed-response items by hand is expensive and time-intensive.

School systems do not seem to have it in their budgets to pay for rigorous, hand-graded assessments from third-party providers. But the truth is, we already pay for expensive assessments every day in our schools whenever teachers grade students’ work. We never account for these costs, however, because they are lumped into a teacher’s overall salary and become indistinguishable from the costs of other activities such as preparing lesson plans, delivering lessons, taking attendance, and monitoring the schoolyard during recess.

If we unbundle teachers’ roles so that some adult educators focus on instruction while others are dedicated exclusively to developing, administering, and scoring assessments, we could start being more deliberate and purposeful in how we allocate resources for ongoing assessment.
 
Speeding up feedback
Humans are slow graders compared to computers. But in schools, the biggest lag in providing students with feedback stems from the fact that teachers cannot teach and grade at the same time. When a student turns in an assignment or test, it usually sits in a teacher’s inbox until the end of the day, when the teacher is done teaching and finally has time to grade it. This means students get feedback the next day at the earliest, and often a few days, weeks, or even months later if the teacher is busy. Unfortunately, the longer the lag between when a student submits work and when she receives feedback, the less useful that feedback is for helping the student learn.

In contrast, if we unbundle the teacher’s role to separate learning and assessment, we can have human graders who are dedicated full-time to grading students’ work as soon as it is submitted and who become far more efficient at grading because they don’t deal with the time costs of switching between different teaching roles.
 
Addressing reliability and validity
To most people, the term “standardized test” means a multiple-choice, high-stakes test. In reality, however, what makes a test “standardized” is that it was carefully developed to ensure validity (i.e., it actually measures what it purports to measure) and is then administered and scored in a standardized manner to help ensure reliability (i.e., it produces consistent results). In other words, standardized does not necessarily mean multiple choice or high stakes. A standardized test can include essay questions, performance assessments, or nearly any other type of assessment item as long as the assessment items are developed, administered, and scored in a way that ensures validity and reliability. For assessment items that require human grading, standardization usually means that the item is scored using a well-developed rubric by graders who are trained to achieve inter-rater reliability.

In contrast to standardized assessment items, the assignments and tests that most teachers create and use in their classrooms are often far from valid or reliable. This is not the fault of the teachers. Rather, it is because most teachers do not have the time or the training to develop assessments that meet high bars of validity and reliability. Fortunately, if we unbundle teaching and assessment and integrate human-graded assessments into online learning platforms, it makes sense to invest in developing a common set of rigorous, standardized, human-graded assessment items. Indeed, if we expect instructors and students to trust the results of hand-graded, online assessment items, the validity and reliability that come from standardization will be important for giving those items credibility and currency.
 
Bringing it all together
Now imagine what this might look like in practice. Students come to school and learn through a variety of face-to-face and online activities. As they learn, they are given opportunities to practice and demonstrate their learning and receive feedback on an ongoing basis. When they complete learning activities that require them to use basic factual or procedural knowledge, software evaluates their performance and provides immediate feedback. When they complete learning activities that require deeper levels of understanding, analysis, and critical thinking, the learning platform captures their performance (in video, audio, written, or other formats) and immediately sends it to expert graders who score their work and provide feedback to help the students improve. Then, as students progress through the platform’s learning activities, the results from both the machine-graded and human-graded standardized assessment items are incorporated to create a complete and robust picture of the students’ mastery of learning standards.

There are still many details that need to be figured out in order to turn the rough ideas articulated above into a working model. But I’m optimistic about the potential of unbundling the role of the teacher and leveraging technology to create an online system for measuring and tracking student learning growth that has the rigor of human-graded assessment, the advantage of quick feedback cycle times, and the validity and reliability that come from standardization.

Thomas Arnett is a senior research fellow for the Clayton Christensen Institute. His work focuses on using the Theory of Disruptive Innovation to study innovative instructional models and their potential to scale student-centered learning in K–12 education. He also studies demand for innovative resources and practices across the K–12 education system using the Jobs to Be Done Theory.