Taxpayers are entitled to know if public schools are doing their job, particularly today when so much is riding on the education young people receive. That’s why it’s worthwhile taking a closer look at the State of Texas Assessments of Academic Readiness.
Like all standardized tests, STAAR can be a useful tool to uncover how well teachers and schools are performing. But so much depends on how these instruments are designed and how the results are interpreted. It’s the latter that’s at the center of the present controversy.
Truth to tell, well-written test items are objects of great beauty because they’re both art and science. The wording can’t be ambiguous, and the knowledge and skills being assessed can’t favor one subgroup of students over another. Psychometricians call such bias “differential item functioning,” which is a fancy term for fairness.
Unfortunately, Texas is one of a handful of states that never bought into the Common Core, the national set of reading and math standards deemed indispensable for college and career. The Legislature voted to ban the standards in 2013 in the belief that they interfered with local control of education.
Once Texas chose to go down this road, extra care was needed in drawing valid conclusions, even after a panel of teachers had approved the items and then field-tested them on Texas students. That’s because creating high-quality tests is difficult and extremely labor intensive. They need to discriminate between strong and weak students — but not against groups of students. That’s a tightrope to walk, especially as the student population grows increasingly diverse.
What has compounded the brouhaha in Texas is the inclusion of Lexile measures, which were developed by a company called MetaMetrics. They sometimes showed students reading at grade level, when STAAR showed they were not. Not surprisingly, parents were confused and alarmed. So far, the Texas Education Agency has defended the test’s items. Whether outside specialists will agree is unclear.
Yet the larger question is how STAAR results are ultimately used. If they’re used strictly for diagnostic purposes, as Finland does with its standardized tests, STAAR can help improve instruction. Almost all teachers welcome constructive criticism. But most often, the results are employed punitively: Schools whose students persistently fail to measure up are shuttered or taken over by the state, and teachers are summarily fired. It’s little wonder, therefore, that STAAR finds itself in the crosshairs.
If it’s any consolation, Texas isn’t alone. When New York and Kentucky rolled out the first standardized tests aligned with the new Common Core standards introduced by the National Governors Association and the Council of Chief State School Officers, most students failed. In 2013, fewer than a third of students in New York demonstrated “proficiency” in math and English. Kentucky didn’t do much better when it launched its assessments in 2012.
Setting cut scores can be arbitrary, which is why too much shouldn’t be read into STAAR, or for that matter, any standardized test. Some states have posted dramatic improvements by simply moving the goalposts. In the final analysis, therefore, great care needs to be taken before drawing sweeping conclusions about instruction.