The Case for Tests

The provincial government of Alberta has decided that it’s time to move on a curriculum review/redesign – a process already started under the former Conservative government. This is big news here, because it appears that this will be an all-encompassing process with far-reaching implications, not least an explicit focus on student-centered learning and cross-curricular competencies. These ideas sound nice in theory, but they rarely translate into anything tangible in practice. Further, Minister Eggen has vowed to collaborate with all stakeholders, including the Alberta Teachers’ Association. Again, this sounds reasonable, but the cacophony of voices may well lead to unintended consequences. However, I’ll reserve judgement until something more concrete is released. Perhaps they’ll get it right. We’ll have to wait and see.

One aspect of this review/redesign may include the abolition of our provincial achievement tests and diploma exams. Of course, this is still conjecture, but given the ATA’s position on such accountability measures, there’s a chance that the structure, implementation, and even existence of these exams will be affected.

As I’ve written in previous posts, these exams are valuable tools that not only serve as periodic checks into the “provincial classroom” but also ensure that key aspects of the curriculum in a given subject area have been taught and, hopefully, learned. Our testing context in Alberta does not remotely resemble some of the horror stories we see in the U.S. Nobody’s job is attached to students’ test scores. While we do have a “ranking report” produced by the Fraser Institute, the vast majority of parents send their kids to the community school, or to the school that offers the programs they want, regardless; few schools are meaningfully affected by the Fraser Institute’s ranking report. I’ve yet to see any tangible evidence that this report is anything other than a political talking point.

We now have three provincial exams in twelve years of elementary and secondary schooling. Is that too much? I would vehemently argue that it is NOT. The PATs in grades 6 and 9 don’t even need to factor into students’ grades; it’s up to the teacher or school administrator whether to include PAT scores in students’ final marks. In grade 12, the diploma exam’s weighting has been reduced from 50% to 30% of the final mark. This doesn’t make or break a student’s achievement in a course. And, again, teachers’ careers are not remotely connected to these test scores. Anecdote alert: my students always score well above the provincial average, and my discrepancy rates are low. This is rarely recognized by administration, other than perhaps privately, in passing. I’ve never been in a staff meeting where diploma scores were an agenda item, and every school has some good, bad, and ugly exam records. In my decade as a teacher, the only people who seem to care about diploma scores are me and the other teachers whose students will write these exams.

The point of this post is to refute some of the “alternatives” to testing. I won’t spend too much time defending Alberta’s achievement tests, other than to say that they are reliable and valid, constructed by seconded teachers and subject specialists. Teachers who mark the written components are rigorously trained, combining elements of comparative judgement and rubric-referenced assessment, with checks and balances in place. While these exams admittedly do not assess the whole curriculum, they do demand a demonstration of comprehensive knowledge and its application, so most of the outcomes that are not explicitly on the test are at least factors in the process of getting “the answer.” For more on assessment, see David Didau’s blog and Daisy Christodoulou’s blog.

I have no rebuttal for those, like Alfie Kohn, who argue for abolishing grades entirely; this is beyond my frame of reference, and I think those who agree with Kohn start from philosophical and political premises entirely different from those on which our society is structured. For those who agree that there should be structures in place to monitor and enhance education in schools, here are some of the most common alternatives to testing.

1: “A sample approach. The same tests, just fewer of ’em. Accountability could be achieved at the district level by administering traditional standardized tests to a statistically representative sampling of students, rather than to every student every year.”

This isn’t a bad idea – in theory. Statistically, as an accountability measure, sampling would achieve the same goal. In reality, however, every teacher knows that at the mention of assessment, several hands shoot up to ask, “Is this for marks?” I have my own way of dealing with that issue, but the reality is that if there are no “stakes,” not all students will take the exam seriously. It’s as though the sampled students are doing the ministry a favour by helping it track achievement in the province. Their engagement with the exam would be comparable to their engagement with a survey. We live in a “what’s in it for me?” culture, and if there’s nothing “in it” for them, the validity of the sampled scores would be compromised.

2: “Stealth assessment. Similar math and reading data, but collected differently.
The major textbook publishers, plus companies like Dreambox, Scholastic and the nonprofit Khan Academy, all sell software for students to practice math and English. These programs register every single answer a student gives.
The companies that develop this software argue that it presents the opportunity to eliminate the time, cost and anxiety of ‘stop and test’ in favor of passively collecting data on students’ knowledge over a semester, year or entire school career. Valerie Shute, a professor at Florida State University and former principal research scientist at ETS, coined the term ‘stealth assessment’ to describe this approach.”

Again, this isn’t a bad idea as an ongoing assessment option. But its limitations are the same as those of standardized achievement testing; it’s just a different medium. The real issue here becomes one of privacy. Do we want our students’ data mined by corporations whose goal is to sell to them? Do we want our schools’ data mined by corporations with various interests? How would the collection of this data affect our society? If you think the Fraser Institute ranking report is bad, it would pale in comparison to what could be done with data like this.

3: “Multiple measures. Incorporate more, and different, kinds of data on student progress and school performance into accountability measures.”

This is already happening in classrooms in Alberta. Of course we don’t base a student’s achievement on a single test. Of course multiple measures are used throughout the year to assess a wide range of competencies using a wide range of methods. Suggestions in the article include social and emotional skills surveys, game-based assessments, and performance- or portfolio-based assessments. Fine – include them all. None of this negates the need for, or the benefit of, traditional achievement testing, because we still need a more objective measure of achievement than a teacher’s subjective judgement alone. It’s all well and good to say that teachers are professionals and that their professional judgement should be respected; however, teachers are also human beings subject to biases, preferences, and partialities. I wouldn’t want my subjective judgement to be the sole factor in determining a student’s achievement. I welcome outside objective measures that serve to balance whatever flaws I may have inadvertently perpetuated in assessing my students.

When we mark the written component of the English Language Arts diploma exam, we spend almost an entire day working in groups to train for the task. The process begins a week before, with several Standards Confirmers selecting exemplars in every category of the rubric. They discuss these randomly selected papers and identify several dozen to serve as “hinges” when the rest of us come in to mark the 15 000 or so exams. I sit at a table with five or six other teachers, and we begin by reviewing these standards as they relate to the topics and texts for the marking session. We discuss our scoring and use practice papers to resolve any discrepancies in interpretation. Reliability Reviews are conducted daily. If I have problems over the six- or seven-day session, I bring the paper to the Table Leader, who clarifies the issue for me or passes it along to the original Standards Confirmers. Each paper is blind-marked by two different markers. If their scores differ by more than 10%, or by more than one level in one or more scoring categories, the paper goes to a third marker. I describe this process in detail to show how rigorously these exams are marked.

In my classroom, it’s just me. Teaching can be an isolating profession. Days and days can pass without speaking to another adult, particularly if one is teaching a full course load. If an assignment presents a problem, maybe I can ask the opinion of another teacher in the school, but many schools in the province employ only one English teacher, experienced or otherwise. I can employ all kinds of multiple measures, but the bottom line is that it’s still just me assessing my students. Standardized objective tests provide the oversight needed to balance the flaws in subjective judgement. For more on this increasingly popular area of study, see Daisy Christodoulou’s writing on the subject.

4: “Inspections. Scotland is a place where you can see many of the approaches above in action. Unlike the rest of the U.K., it has no specifically government-mandated school tests. Schools do administer a sampling survey of math and literacy, and there is a series of high-school-exit/college-entrance exams that are high stakes for students. But national education policy emphasizes a wide range of approaches to assessment, including presentations, performances and reports. These are designed to measure higher-order skills like creativity, students’ well-being and technological literacy as well as traditional academics. Schools and teachers have a lot of control over the methods of evaluation.”

I’m actually a fan of the idea of inspections, despite the many problems associated with this accountability measure. As with any system, including testing, its flaws need to be identified and rectified. The example above cites Scotland. Coincidentally, a Scottish teacher was recently fired for being “too boring,” according to the inspection evaluation. Such examples are likely the reason the ATA abolished observations years ago, and why it would likely not support anything similar in lieu of standardized achievement testing. Further, Scotland’s education system, which employs many of the above approaches, as noted in the NPR article, is not exactly revered. Its results are dropping – although without objective tests, I suppose we wouldn’t know this, and everyone could just cheer about how great they are, no matter the country, province, or system.

The bottom line is that education is a system for which we, as a society, pay. Any system needs to have measures for accountability and oversight embedded. Standardized achievement testing serves this purpose, to some extent, among other purposes more connected to teaching and learning. Even Alberta’s Valhalla – Finland – has the National Matriculation Exam, a battery of tests in at least four subject areas. “Students must complete all required tests of the examination within three consecutive exam periods of up to six hours each. All tests, except listening and reading comprehension in second domestic and foreign languages, are pencil-and-paper tests, typically requiring extensive writing in open-ended tasks.” While Finland is known for its progressive approach to education, with no high-stakes testing throughout school, this series of exit exams is as high-stakes as you can get; all students MUST pass them to graduate.

Most education systems accept that standardized testing benefits teaching and learning. In the UK, students write GCSEs as a requirement for graduation and A-Level exams in preparation for university. In France they write the Baccalauréat; in Germany, the Abitur; in other parts of Central and Eastern Europe, the Matura; in Israel, the Bagrut; in South Africa, the Matric. They’re all high-stakes, with weightings ranging all the way up to 100%. Our little diploma exam in Alberta is worth 30% of a student’s final grade. That still leaves 70% to be determined by the classroom teacher. The other two achievement tests, in grades 6 and 9, may not even be calculated in a student’s grade. Surely this is not the problem it’s being made out to be. Surely the rest of the world is not blind to some truth about the evils of exams. Surely worldwide recognition of the value of standardized achievement testing suggests that we’ve been on to something for a while.

Ultimately, we cannot dispense with standardized achievement testing unless and until we have something that can replace its value to teaching and learning and as a measure of accountability and oversight. I suspect there may not be a “better” metric; at least, I haven’t seen one yet.