Among my least favorite things that come with the fall season are pumpkin spice everything, the college ranking freak-out, and the data misinterpretation that follows the release of the yearly state SAT scores. Each year, after the College Board releases data on average scores, you get a new round of newspaper articles and 6 o’clock news reports on whether a state’s (or district’s) scores have fallen or risen. But what is consistently left out in the rush to report is any attempt to give the numbers relevance and meaning. Let’s check out some of this year’s journalistic gems:
2014 SAT Scores ‘Flat And Stagnant’ Ahead Of Major Test Redesign – Huffington Post
SAT scores for the class of 2014 averaged a 497 in reading, 513 in math and 487 in writing — about the same as the last few years, according to a report released Tuesday by the College Board, the administrator of the notorious college entrance exam.
While the average reading score, 497, went up by a point from last year, math dipped by a point to 513 and writing is the lowest ever at 487. The class of 2006 was the first to take the writing section and scored an all-time high of 497.
Bulloch County posts higher SAT scores – Statesboro Herald
Bulloch County’s 2014 graduates overall scored higher in all three components of the test, critical reading, math and writing, compared to scores posted by members of the Class of 2013. The school district’s mean scores in these areas resulted in a total score average of 1419, a 2-point increase over 2013 scores, but still significantly below the state and national averages of 1445 and 1497.
State’s SAT Scores Stagnant; Minority Participation Up – Hartford Courant
If considered separately, Connecticut’s public high school seniors last year scored 499 on reading — the same as the 2013 class. The average math score fell by 3 points to 500, and the writing score also was down, falling by 4 points to 500.
Average SAT scores show little change – Inside Higher Ed
The average score in critical reading increased one point, while average scores in math and writing fell by one point. Scores have been either flat or slowly declining for the past several years, dropping 11 points in reading and seven points in math in the past decade.
SAT scores unchanged and many students unprepared for college – The Plain Dealer
Scores in critical reading, math and writing have been either flat or slowly declining for the past several years, dropping 11 points in reading and seven points in math in the past decade.
Georgia SAT scores drop slightly in 2014 – WXIA-TV
Math and critical reading scores dropped by two points this year; the mean scores were 485 and 488, respectively. Writing scores dropped by three points, to 472.
SAT scores for Class of 2014 show no improvement from previous – Washington Post
In the District, 4,832 students in the Class of 2014 took the SAT, up 21 percent. Their average score on the test was 1309 out of 2400, a drop of 91 points from 2013. Changes in participation rates often influence the exam’s scores. In Virginia, SAT participation fell 1 percent, while the average score rose two points to 1530. In Maryland, SAT participation rose 3 percent, while the average score fell 15 points to 1468. The SAT is more widely used in those two states and the District than the ACT.
SAT scores up in SC, down in some Midlands districts – MyrtleBeachOnline.com
The Kershaw County school district’s average score improved 29 points to a 1,423 – the biggest improvement of any Midlands district. Richland 2’s average score improved four points to 1,407.
What’s notably absent, not only from the headlines (which is forgivable) but also from the articles (which is unforgivable), is any attempt at quantifying, explaining, or qualifying what the score changes mean (with the notable exception of the MyrtleBeachOnline.com article; kudos to you, Jamie Self). Most of these articles simply report score changes as if they were meaningful, with no further context necessary. And what really gets my goat (as the old folks say) is that these articles all fail to answer three key questions that should be asked whenever statistics are reported:
- Are these figures statistically significant, or could they easily be the result of typical year-to-year shifts or random chance?
- Are these statistics meaningful? Do they say anything real about student performance, academic ability, or opportunity for college admission?
- Are there other factors that help explain the statistics, such as demographic changes, differences in the test itself, or differences in test prep?
It seems that these journalists have bought into the hype of big data, regurgitating the numbers provided by the College Board without putting any thought into what those numbers might mean or what factors explain them. Before I get off on a more theoretical rant, and since the internet loves lists, let me give you the top 5 reasons to ignore all the brouhaha about these fairly minimal SAT score changes:
- The cited changes are typically the result of getting 1 more question right or wrong
Most states’ scores have varied by less than 5 points since last year and by about 10 points over the last 30 years. On the SAT, if a test-taker answers a question correctly instead of skipping it, that earns 1 raw score point, which is typically worth about 10 points on the 200 to 800 scale. So a score change of 2 or 3 points probably has more to do with luck than with any real learning over time.
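To put the “one more question” point in numbers, here is a small Python sketch that converts a change in an average section score into raw-point equivalents. It assumes, as the paragraph above does, that one raw point is worth roughly 10 scaled points; the real raw-to-scaled conversion varies from test form to test form, and the function name is just for illustration.

```python
# Rough conversion from a scaled-score change to raw-point equivalents.
# Assumption: about 10 scaled points per raw point (the rule of thumb above);
# the actual raw-to-scaled conversion differs from form to form.
POINTS_PER_RAW_POINT = 10


def raw_point_equivalent(scaled_change: float, points_per_raw: float = POINTS_PER_RAW_POINT) -> float:
    """Return how many raw points (roughly, questions) a scaled change represents."""
    return scaled_change / points_per_raw


# Year-over-year section changes quoted in the articles above
for change in (1, 2, 3, 4):
    print(f"{change}-point change = about {raw_point_equivalent(change):.1f} of a question")
```

Every section-level change quoted in the headlines above works out to less than half of one question per student, on average.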
- These score changes are well within the Standard Errors
Testing experts have a ton of technical terms for the simple idea that testing is imprecise. If you take a test today and the same test tomorrow, you’ll probably get a score that’s similar but not identical. The two terms I look at most often are the Standard Error of Measurement (SEM) and the Standard Error of Difference (SED). According to the College Board, the SED is the amount by which two scores must differ for the difference to be meaningful, and the SED on each section of the SAT is more than 60 points. The SEM indicates that if a test-taker tested again, there is about a 70% chance that the second score would fall within 32 points above or below the first score. If the same logic applies in any way to averages (though the larger the group, the less impact SEM and SED will have), who cares about a 2-, 3-, or 10-point change? Based on the standard errors, there is almost no reason to pay attention to a change of 10 or fewer points.
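As a rough illustration of both halves of that paragraph, the sketch below computes the band of ±1 SEM around an individual score using the 32-point figure, and then shows why the parenthetical matters: under the simplifying assumption that each student’s measurement error is independent, the measurement error of a group average shrinks with the square root of the group size. The independence assumption and the function names are mine, not the College Board’s.

```python
import math

# 32-point SEM per section, as cited above; treated here as the standard
# deviation of a single student's measurement error (an assumption).
SEM = 32


def sem_band(score: int, sem: int = SEM) -> tuple[int, int]:
    """Range within +/- 1 SEM of a score (about a 70% chance a retest lands here, per the figure above)."""
    return score - sem, score + sem


def group_mean_error(n_students: int, sem: float = SEM) -> float:
    """Measurement error of an average of n scores, assuming independent errors."""
    return sem / math.sqrt(n_students)


print(sem_band(500))  # (468, 532)
for n in (100, 1_000, 10_000):
    print(n, round(group_mean_error(n), 2))
```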
- Statistics 101 tells us that these numbers are often not significant or meaningful
Just because average scores are different doesn’t mean the difference is statistically significant. Many of these articles are about small groups (a single county or district), and a small group’s average can swing from year to year simply because fewer students are being averaged, not because anything real has changed. And even when changes are significant, are they meaningful? In the words of the inimitable Erik the Red Tutor (my go-to SAT stats guy, who also happens to teach high school and runs an awesome free SAT informational resource): “Significant means that two identically skilled 30,000-person groups taking the SAT would rarely have averages that differ. But significant isn’t the same as meaningful, and I don’t think changes of even 10 points mean much. About half of the major SAT score declines in the 60s and 70s (when there was a drop in average verbal scores of fifty-odd points and 30-something points in math) were due to demographic changes in test takers, not to too much TV, or a decline in school quality, to name common alternative explanations. So in that context it seems overblown to be concerned about 10 point changes now.”
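To make the small-group point concrete, here is a minimal Python sketch of a two-sample z-test on a difference in averages. The 110-point section standard deviation and the group sizes are assumptions chosen for illustration, not College Board figures; the point is only that the same 10-point change can be “significant” for two 30,000-person groups and indistinguishable from noise for two 300-person groups.

```python
import math


def z_test_mean_difference(diff: float, sd: float, n1: int, n2: int) -> tuple[float, float]:
    """Two-sample z-test for a difference in means with a common, known SD.

    Returns (z, two-sided p-value). Assumes scores in each group are
    independent draws, which is a simplification for real cohorts.
    """
    standard_error = sd * math.sqrt(1 / n1 + 1 / n2)
    z = diff / standard_error
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


ASSUMED_SD = 110  # hypothetical section-score standard deviation

for n in (300, 30_000):
    z, p = z_test_mean_difference(10, ASSUMED_SD, n, n)
    print(f"n={n}: z={z:.2f}, p={p:.3g}")
```

As the quote says, even the large-group result only tells you the difference is unlikely to be chance; it says nothing about whether 10 points matters.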
- No one reported on other factors that could have affected scores, such as demographic changes or test preparation.
In covering score changes in various districts, none of these reports thought to ask whether a school system had instituted test prep programs that might have affected the results. Few, if any, of the articles mentioned whether the district offered SAT School Day (administering the real SAT during the school day, which DC did), which suddenly changes the profile of test-takers by including more students from groups that have traditionally scored lower (low-income, African American, and Hispanic students, to name a few).
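A quick way to see how a change in who takes the test can move an average: in the hypothetical Python sketch below, neither group’s average changes between years, but doubling the number of test-takers from the lower-scoring group (as a school-day administration might) pulls the overall average down from 1420 to about 1367. The group labels, counts, and scores are invented for illustration.

```python
# Hypothetical cohorts: (number of test-takers, average total score) per group.
# Neither group's average changes; only the mix of test-takers does.
year_1 = {"group_a": (800, 1500), "group_b": (200, 1100)}
year_2 = {"group_a": (800, 1500), "group_b": (400, 1100)}


def overall_average(cohort: dict[str, tuple[int, int]]) -> float:
    """Participation-weighted average score across all groups."""
    total_students = sum(n for n, _ in cohort.values())
    total_points = sum(n * avg for n, avg in cohort.values())
    return total_points / total_students


before = overall_average(year_1)
after = overall_average(year_2)
print(f"{before:.1f} -> {after:.1f} (change {after - before:+.1f})")  # 1420.0 -> 1366.7 (change -53.3)
```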
- The nature of the standardized test is to be resistant to score changes
The SAT, by its definition, its construction, and the practices of its writers, is resistant to score changes. The SAT is a standardized test, and its founding philosophy seeks a bell curve in which about 68% of students fall within 1 standard deviation of the mean score. The SAT in fact pretests all questions and ensures that the test contains the proper number of difficult questions that the vast majority of students will get wrong. If by chance (or through improvements in instruction) scores start to drift too far from the intended mean, the College Board can simply “recenter” the exam (as it did in 1995) so that the score distribution remains as intended.
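For reference, the share of a normal distribution within one standard deviation of the mean can be checked in a couple of lines of Python. The second function is a deliberately simplified picture of what recentering does, a linear rescaling of scores so a reference group lands on a target mean and spread; the targets of 500 and 100 and the sample scores are assumptions for illustration, and the College Board’s actual recentering and equating procedures are more involved.

```python
import math
import statistics


def share_within_k_sd(k: float = 1.0) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def recenter(scores: list[float], target_mean: float = 500, target_sd: float = 100) -> list[float]:
    """Linearly rescale scores to a target mean and SD (simplified, hypothetical illustration)."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # population SD of the reference group
    return [target_mean + target_sd * (s - mean) / sd for s in scores]


print(f"{share_within_k_sd(1):.4f}")  # 0.6827
drifted = [430, 460, 480, 520, 560]  # hypothetical scores whose mean has drifted to 490
print([round(x) for x in recenter(drifted)])
```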
With all the statistical shenanigans and test changes (2 major rewrites and many minor changes) that have taken place in the test itself, it’s no surprise that since 1972 scores have fluctuated little from year to year. As with most statistics, these only become relevant at large scale: why SAT scores are down over the last 20 years is an important question; why they are down in the last year is not. All in all, these articles simply induce panic, hand-wringing, or crowing with no context, explanation, or real understanding of what’s going on behind the scenes.
To quote the inimitable Chuck D