Unpacking the CAASPP ELA Performance Task Scores
A lot of energy, time, and money is being spent on state assessments, so it is proper to check in and ask how it is going. It would not make sense for us to continue to spend the effort on testing only to ignore the results. And yet, eight years into the California CAASPP system, little is known about individual students' responses on the Summative tests, especially on the multiple-choice section. The overall scale score each student receives does not break down the types of questions that student answered correctly.
Over the last eight years this fact has remained: individual student-level data on the computer adaptive test is still unavailable. In the last two years, however, the CDE has finally started releasing student-level PERFORMANCE TASK data. This is a sea change when it comes to the ability to "unpack" and understand how students are doing on at least PART of the CAASPP Summative test. Let me explain.
California students in grades 3-8 and 11 have been taking the CAASPP English Language Arts test for about eight years now. Yes, there are math and science tests too, but let's just focus on the ELA test for now. The test itself is made up of two separate components. The first is a computer adaptive test, or "CAT," where students answer multiple-choice and matching questions based around grade-level standards. The second is a much different "extended writing" performance task, or "PT." The scores from both the CAT and the PT are then combined into a final "Scale Score" that the student receives. This scale score determines if the student has met, or not yet met, the grade-level standards set by the state.
An essential aspect of understanding the student's scale score would be to understand how many multiple-choice questions on the CAT each student got right and how many they got wrong. You would also want to know the specific areas (e.g., standards) those questions covered. This would help us to understand if a particular standard was not being taught well in a particular class or school. This is what is referred to as "item-level analysis." Maybe Ms. Rivera is doing a spectacular job teaching about the semicolon, while Mr. Smith's students continuously bomb that question or standard. Or perhaps one entire school has its act together when it comes to writing in the Narrative genre and another school is struggling with that specific topic. Both would be good data points to understand. And both would be actionable. Yet despite this obvious need, no ITEM-level data has ever been made available for the CAT portion of the CAASPP summative tests.
This is concerning. We've literally administered about 24 million of these state-run summative tests and yet none of my assessment colleagues have ever actually seen a breakdown, by standard or item, of how our students performed. We get some general and mostly ambiguous information around claims and targets, but it never has the level of granularity needed to dig into the true reasons for student underperformance.
We know, for example, that roughly 47% of the students have "met or exceeded" the cut-off for proficiency in CAASPP ELA last year. But any concerned educator would want to know the answers to the following follow-up questions: In what areas were my students not proficient? Are the weak areas across all my schools, and so perhaps attributable to an inadequate or outdated textbook? Are English Learners or IEP students stuck in particular standards or items that could be addressed through targeted instruction?
These are good questions that routinely go unanswered unless a district is also willing to administer a second set of interim or benchmark assessments to drill down into the same questions. But why is the CAASPP ELA CAT item-level data not available on the summative state tests? That is, after all, the measure we are being held accountable to on the dashboard or for charter renewal.
So, if we can't get good data out of the CAT, can we still get something of value out of the OTHER component of the CAASPP ELA test? What about the Performance Task? Is that data available and "actionable"? In sharp contrast to the conversation around the CAT, the answer to those questions is a resounding "YES!" Starting in September of 2021 the state has made available each student's Writing Extended Response or "WER" score. This is significant as it shows us, for the very first time in CAASPP history, how well our students performed on the writing portion of the test.
The WER data is now available to district Test Coordinators through TOMS (the Test Operations Management System) in the "CAASPP Student Score Data File." Though the file formats change from year to year, you can pull out several key data points from each year. The first is the genre. Students are randomly assigned one of two or three different genres depending upon their grade level. Next comes the individual "Purpose and Organization" score (from 1 to 4), then the "Evidence and Elaboration" score (also from 1 to 4) and finally the "Conventions" score (from 0 to 2). When you add up the three values, you get a total between 2 and 10 (1 + 1 + 0 at the low end, 4 + 4 + 2 at the high end). This is the WER score for the CAASPP ELA. We now colloquially refer to this as the "4-4-2" score to emphasize the three ways in which students are evaluated.
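For readers who work with the file directly, here is a minimal sketch of the "4-4-2" arithmetic described above. The function name, the trait names, and the use of None to stand for a "no score" are all hypothetical choices made for illustration; they are not the column names or codes used in the CAASPP Student Score Data File, whose layout changes from year to year.

```python
from typing import Optional

# Valid range for each rubric trait, as described above (the "4-4-2" rubric).
# The dictionary keys are hypothetical labels, not official field names.
TRAIT_RANGES = {
    "purpose_organization": (1, 4),
    "evidence_elaboration": (1, 4),
    "conventions": (0, 2),
}


def wer_total(
    purpose_organization: Optional[int],
    evidence_elaboration: Optional[int],
    conventions: Optional[int],
) -> Optional[int]:
    """Add the three trait scores into a single WER total (2 to 10).

    Assumption: a missing trait is passed as None, and any missing trait
    means the response has no total.
    """
    scores = {
        "purpose_organization": purpose_organization,
        "evidence_elaboration": evidence_elaboration,
        "conventions": conventions,
    }
    if any(value is None for value in scores.values()):
        return None
    for name, value in scores.items():
        low, high = TRAIT_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return sum(scores.values())


# Made-up example: a student scoring 3, 2, and 1 on the three traits.
example_total = wer_total(3, 2, 1)
```

The range check is there because a mis-mapped column (say, a scale score landing in the Conventions field) would otherwise quietly produce an impossible total.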
Some students also receive a "no score." Students receiving a no score either left the entire PT test blank, or the response was woefully insufficient, off topic/purpose, or written in a language other than English. When we crunched the numbers, we were SHOCKED to learn that a whopping 15% or more of students in some of our schools received a "no score." How can this be, we wondered? We literally watched them type a response!
When further analyzing our data, I expected to see a normal bell curve between 2 and 10, with the average hovering around a "6" overall. What we found at some schools was quite different. Some of our schools struggled greatly with the performance task component. More than a third of our students scored 2's or 3's on the 10-point rubric. Basically, they were at the lowest possible scores on the CAASPP ELA performance task. These results were shocking. We immediately struggled with why our students were performing so poorly on the writing portion.
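If you want to run the same kind of check on your own school or district, something like the following sketch may help. It assumes you have already extracted one WER total per student into a list, with None standing in for a "no score"; that representation, the function name, and the sample numbers are invented for illustration and are not our actual data. It also assumes the rates are calculated against all tested students, which is only one of several reasonable denominators.

```python
from collections import Counter
from typing import Optional


def summarize_wer(totals: list[Optional[int]]) -> dict:
    """Summarize WER totals (2 to 10); None stands for a 'no score'."""
    n = len(totals)
    if n == 0:
        raise ValueError("no students to summarize")
    scored = [t for t in totals if t is not None]
    counts = Counter(scored)
    return {
        "students": n,
        # Share of all students who received a "no score".
        "no_score_rate": (n - len(scored)) / n,
        # Share of all students at the bottom of the rubric (a 2 or a 3).
        "low_score_rate": sum(1 for t in scored if t <= 3) / n,
        # Average among students who did receive a score.
        "mean_scored": sum(scored) / len(scored) if scored else None,
        # Count of students at each possible total, including zeros.
        "distribution": {score: counts.get(score, 0) for score in range(2, 11)},
    }


# Invented sample of ten students, for illustration only.
sample = [2, 3, 3, 6, 7, None, 10, 2, 5, None]
summary = summarize_wer(sample)
```

Looking at the full distribution, and not just the average, is what shows whether scores pile up at the bottom of the rubric instead of forming the bell curve you might expect.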
A review of the actual performance tasks revealed some possible reasons for our students' less-than-stellar results on the Performance Task. First, the actual CAASPP performance task is lengthy, complex, and repetitive. The instructions are complex and repeated several times. Unnecessary references are sprinkled throughout. You must get to the second-to-last page to read the actual prompt, which is the question or task being posed to the student. A careful reworking of the actual test length could reduce fatigue and confusion and strengthen the purpose of the test, which is to measure a student's ability to RESPOND to a prompt, not a student's ability to FIND the prompt!
Second, the students are being asked to type responses without the benefit of a printed copy to read or take notes on. So scrolling, and in particular the ability to scroll in small increments to read relevant information and keep that information handy, becomes a real deal breaker for some kids. Students are in for a challenge if they are new to Chromebooks, or new to typing for that matter, or if they struggle with using multiple windows on a tiny, dimly lit screen. Students should have multiple exposures to the actual CAASPP testing interface and learn to use the note-taking and highlighting tools before being asked to take the real deal.
A final observation was that the Narrative genre appears to be the most difficult for students. Students are usually successful at telling THEIR OWN stories. Perhaps the challenge here is they aren't adequately referencing the source materials? We would know for sure if the state would release the actual text responses. But if they did that they would open the gates of hell as I could see some parents demanding their student's rubric score be moved from a 9 to a 10 if they actually got to review their son or daughter's actual typed response. Let's keep that can of worms closed for the time being and instead spend some quality time practicing writing narratives from selected prompts.
You can now view the actual Smarter Balanced grade-level examples, by the way, at this website: smart.smarterbalanced.org. Not only does it display the complete performance task (e.g., prompt and sources), but it also shows examples of student work across all levels of performance, from students who received a "no score" to students earning a perfect "10." Just for fun, print out and read a THIRD GRADE perfect 10 and ask yourself what age/grade you were when you could write (let alone type) at that level of proficiency. A quick survey of my staff revealed that most of us weren't writing at that level until at least the 7th or 8th grade. Maybe these tests really are too difficult?
As we've explored the data and begun to understand WHY our students performed so poorly on the CAASPP ELA Performance Task, we have been reaching out to colleagues in the Los Angeles County area. We found that most (68%) did not even know about the availability of the WER scores. A quick Google search found no webinars on the subject offered by the CDE (California Department of Education) in the last two years. So again, we've been sitting on a gold mine of data that no one is yet really laying claim to. Here's my appeal to my California assessment colleagues: It's time to start digging.