It's Time to Fix Standardized Testing

The global pandemic wreaked havoc on the educational status quo and disrupted how students are both taught and tested. Schools radically adjusted how they deliver lessons, and testing agencies revamped how they deliver assessments. The inability to gather in a room forced the cancellation of statewide assessments, the shortening of AP exams, and the development of online at-home options for major admissions tests. This was the moment to fix many of the things that are broken about assessments.

Since their beginnings in the late 1800s, standardized tests have become an oppressive force in U.S. education and have influenced many other areas of society, yet in that same time the tests themselves have changed only marginally. Despite reams of papers and scores of conference presentations, a high score on an aggressively timed test with four or five answer choices per question continues to be treated as the epitome of intellectual demonstration.

Test-making agencies and their defenders have claimed that standardized tests are "objective" and "demonstrate readiness," but critics ask "readiness to do what?" Questions about passage content, question design, reading load, and gender biases have plagued these tests, as have scoring inconsistencies, administration errors, and repeated cheating scandals.

These problems and scandals led to a surge in colleges adopting test-optional policies, parents opting out, and most recently graduate schools joining the GRExit. With the current global crisis further highlighting problems with equitable access, it's long past time for test makers to stop defending an outdated system that perpetuates inequities and to start developing tools that actually support learning. Here are a few suggestions.

Reduce Score Scales

The number of increments on most standardized tests creates a perception of fine-tuned accuracy that research does not support. The SAT is scored on a 400-1600 point scale in 10-point increments, suggesting this test identifies 121 different levels of proficiency using a mere 154 questions. That the ACT reports only 36 different score increments yet is used to make the same distinctions compounds the problem. A reduction of score scales on both tests would align them with the performance levels that state tests provide, diminish the false precision that current score scales encourage, and reduce the impact of test preparation.

Reducing score scales would also bring score reporting in line with the research undergirding the tests. Nearly all psychometricians and test-makers warn that each score is indicative of a score range, not a precise number. According to the College Board, an SAT score of 1010 is effectively the same as a 970 and as a 1050. Yet universities, districts, and scholarship-granting agencies continue to award money and admission based on these immaterial differences.
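The arithmetic behind these scales is easy to check. The short Python sketch below counts the distinct scores each scale can report and tests whether two scores fall within one score range of each other. The function names are illustrative, and the 40-point margin is an assumption inferred from the 970-to-1050 example above, not an official College Board parameter.

```python
def reported_levels(low, high, step):
    """Count the distinct scores a scale can report, endpoints included."""
    return (high - low) // step + 1


def effectively_same(score_a, score_b, margin=40):
    """True when two scores lie within one assumed score range of each other.

    The 40-point default is an assumption taken from the article's
    970/1010/1050 example, not an official figure.
    """
    return abs(score_a - score_b) <= margin


if __name__ == "__main__":
    print(reported_levels(400, 1600, 10))  # SAT: 400 to 1600 in steps of 10
    print(reported_levels(1, 36, 1))       # ACT: 1 to 36 in steps of 1
    print(effectively_same(1010, 970))
    print(effectively_same(1010, 1050))
```

Under these assumptions the SAT scale reports 121 distinct values, yet two scores 40 points apart are treated as equivalent, which is the false precision a shorter scale would remove.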

Improve the Speed and Detail of Feedback

Given modern technology, it's extraordinarily shameful that test-makers continue to provide test-takers, families, and schools with only limited, vague information and a meaningless numeric value following a test. Results take weeks, if not months, to be returned. If an assessment is to be a useful tool to improve learning, results must be delivered in a timely fashion that allows teachers and students to review and learn from their performance.

The technology and tools to deliver useful reports already exist but have not been made accessible to students. Standardized tests are constructed to meet particular standards and each question is designed to measure particular skills with particular levels of complexity. This data is all recorded before a test is administered, so why would test agencies not share with a student that the questions they mostly got wrong were "geometry > triangles > algebraic expressions" rather than simply "geometry"? The former is actionable in a way the latter is not.
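To make the point concrete, here is a minimal Python sketch of how a report could group missed questions by skill at different levels of detail. Everything in it is hypothetical: the item metadata, the sample results, and the function name are invented for illustration and do not describe any testing agency's actual data or software.

```python
from collections import Counter

# Hypothetical item metadata: question id -> skill path, coarse to fine.
ITEM_SKILLS = {
    1: ("geometry", "triangles", "algebraic expressions"),
    2: ("geometry", "triangles", "algebraic expressions"),
    3: ("geometry", "circles", "arc length"),
    4: ("algebra", "linear equations", "word problems"),
}

# Hypothetical results for one student: question id -> answered correctly?
SAMPLE_RESULTS = {1: False, 2: False, 3: True, 4: True}


def missed_by_skill(item_skills, results, depth):
    """Count missed questions, grouped by the first `depth` levels of the skill path."""
    misses = Counter()
    for question_id, correct in results.items():
        if not correct:
            label = " > ".join(item_skills[question_id][:depth])
            misses[label] += 1
    return misses


if __name__ == "__main__":
    print(dict(missed_by_skill(ITEM_SKILLS, SAMPLE_RESULTS, depth=1)))
    print(dict(missed_by_skill(ITEM_SKILLS, SAMPLE_RESULTS, depth=3)))
```

With `depth=1` the student learns only that the misses were in geometry. With `depth=3` the same data points to a specific skill to practice. The information is identical. Only the level of detail shared with the student changes.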

Separate Speededness from Ability Assessment

Psychometricians have defined a speeded test as one in which "completing the questions in the allotted time presents a significant challenge." For many students, the time constraints are the most significant factor in their performance. Under highly speeded conditions, questions abound about whether the tests are measuring actual knowledge or simply the speed of navigating the test.

"Is it American conceit that speed and profundity are the same thing — that someone who is facile and quick is necessarily better?" — Leon Botstein, President, Bard College

Modernize the Assessments

The vast majority of standardized tests are essentially the same as tests first used in the early 1900s. Somehow the science of assessment has been stubbornly resistant to improvements in practice despite changes in educational theory or technology. A more effective assessment would heed research showing that many tests use poorly designed "distractor" answer choices that make tests longer and less useful. Creating a modern test focused on demonstrable 21st-century skills rather than memorization of easily searchable factoids would benefit not only the testing agency but also American education.

Remove Multiple Choice from Math

Multiple choice answers not only allow random guessing to benefit a test-taker's score but also allow strategies like plugging in and backsolving to let someone get a right answer without knowing the math concept being tested. An additional benefit of removing or minimizing multiple choice questions is addressing gender differences, since research shows that boys perform better on multiple-choice tests and girls on open-ended questions.
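The benefit of random guessing can be estimated with a few lines of Python. This is a sketch under stated assumptions: every guess is independent and uniformly random, there is no penalty for wrong answers, and the 40-question section is a hypothetical example rather than the length of any real test.

```python
from math import comb


def expected_correct(num_questions, num_choices):
    """Expected number of correct answers from purely random guessing."""
    return num_questions / num_choices


def prob_at_least(num_questions, num_choices, min_correct):
    """Probability of guessing at least `min_correct` answers correctly (binomial model)."""
    p = 1 / num_choices
    return sum(
        comb(num_questions, k) * p**k * (1 - p) ** (num_questions - k)
        for k in range(min_correct, num_questions + 1)
    )


if __name__ == "__main__":
    # Hypothetical 40-question section with four or five answer choices.
    print(expected_correct(40, 4))
    print(expected_correct(40, 5))
    print(prob_at_least(40, 4, 10))
```

On this hypothetical section, a test-taker who knows none of the material would still expect about 10 correct answers with four choices, or 8 with five. An open-ended format offers no comparable floor.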

Actively Combat Score Misuse

Standardized test scores are regularly used for purposes that go well beyond the test design and score-use intent. Test scores have been used to determine teacher bonuses, by Amazon to choose the location of HQ2, by Google to hire, by banks to evaluate credit-worthiness, by families to choose school districts, by schools to admit toddlers to kindergarten, and by the world to compare teaching quality. Just as test makers have lawyers and lobbyists who actively work for broader use of their products, they could and should employ advocates to curb the worst abuses of test scores.

"When we talk about the number of students who are likely to be college successful, you've gotta take those numbers with a grain of salt. Policy makers or folks who hope to sell tests will make unfortunate statements, will not listen to the folks in research who try to contextualize the scores." — Wayne Camara, Vice President of Research, ACT

Improve the Conversation About Tests

The continued description of tests as measures of "preparedness" or "readiness" or "ability" presents them as something that determines who will be successful rather than as a tool to assist in determining who has a particular set of skills at a particular moment in time. Broader, deeper, richer analysis with quicker, more specific feedback from large-scale assessments would allow them to be actual tools of opportunity rather than tools of sorting and simplistic categorization.

"Antiquated standardized tests are still serving as the backbone for measuring achievement. Our students can't escape Kelly's century-old invention." — Cathy N. Davidson, The Washington Post