The Washington Diplomat, October 2011

Schools Size Up Teachers Using Value-Added Evaluation Measures

The heated debate over how to improve America’s education system has in recent years centered largely on teachers: how they perform and what to do if they’re not up to the task. But judging that performance requires evaluating it, a concept that’s often thrown around in the debate but given little penetrating thought. It’s one thing to simply say bad teachers should be fired, but what’s bad? How exactly do you define the metrics of success? How much weight should be given to factors ranging from career experience to test scores to student surroundings?

Those gritty details of evaluation criteria have a tremendous impact. They shape how reforms aimed at boosting the quality of the country’s teaching workforce are carried out, not to mention who gets to hang onto their jobs and which schools receive precious funds. And even though the contentious debate over “good” versus “bad” teachers is far from over, educators have made significant headway on an issue that’s critical to moving that debate forward.

“U.S. public schools are in the early stages of a revolution in how they go about evaluating teachers,” according to a report published in April by the Brookings Institution. “It really is a 180-degree turn,” said Russ Whitehurst, a key author of the report and director of the Brown Center on Education Policy at Brookings.

Yet it also has a revolution’s hallmarks: confusion, controversy, internal battles and an uncertain future. Proposed new evaluation methods throughout the country have been attacked by teachers, parents, students and education experts alike. The controversy even brought Hollywood star Matt Damon to a “Save Our Schools” rally near the White House this summer, where he condemned the evaluation changes to cheers of approval.

So why all the revulsion to the evaluation revolution?

Traditional teacher evaluations in the United States combine a “pass-fail” rating by a school principal with points for an educator’s degrees and years of experience. Studies show that almost all teachers evaluated this way get high scores and assured employment, leading U.S. Secretary of Education Arne Duncan to point out that “in our country, 99 percent of our teachers are above average.”

But empirical studies and common sense show that not all educators are equal, as any parent trying to get a child into a “good” teacher’s classroom can tell you.

So states and school districts are increasingly turning to more rigorous, evidence-based teacher assessments. These typically rely on mathematical formulas and incorporate student scores on standardized tests, the so-called “high-stakes testing” ushered in a decade ago by the No Child Left Behind Act (also see “Are the Rigors of Testing Producing Generation of Students Under Strain?” in the August 2011 issue of The Washington Diplomat).

“A new generation of teacher evaluation systems seeks to make performance measurement and feedback more rigorous and useful,” said the Brookings Institution report, titled “Passing Muster: Evaluating Teacher Evaluation Systems.”

“These systems incorporate multiple sources of information, including such metrics as systematic classroom observations, student and parent surveys, measures of professionalism and commitment to the school community, more differentiated principal ratings, and test score gains for students in each teacher’s classrooms. The latter indicator, test score gains, typically incorporates a variety of statistical controls for differences among teachers in the circumstances in which they teach. Such a measure is called teacher value-added because it estimates the value that individual teachers add to the academic growth of their students.”

Photo: Jon Schulte / iStock
D.C., Maryland and Virginia have all approved new guidelines for evaluating teachers using “value-added” assessments that take into account factors such as test scores, student family income levels and academic progress.

Value-added assessments have been both widely criticized and widely misunderstood. Most subtract a student’s test score at the beginning of the school year from that student’s score at the end, then make statistical adjustments for factors outside a teacher’s control, such as the income level of the student’s family.
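To make the subtract-and-adjust idea concrete, here is a deliberately simplified Python sketch of one common style of value-added estimate, sometimes described as a residual-gain approach. It assumes hypothetical data for two teachers, a single control variable (a yes/no low-income flag) and an ordinary least-squares fit. It is not the formula used by IMPACT or by any state; real systems use many more controls and more elaborate statistics.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, fall_score, spring_score, low_income), where low_income is 1 or 0.
students = [
    ("A", 50, 60, 1), ("A", 45, 57, 1), ("A", 52, 63, 1), ("A", 70, 86, 0),
    ("B", 68, 82, 0), ("B", 66, 79, 0), ("B", 72, 87, 0), ("B", 48, 55, 1),
]


def value_added(records):
    """Average residual gain per teacher after adjusting for one outside factor (illustrative only)."""
    # Step 1: each student's raw gain (end-of-year score minus start-of-year score).
    gains = [spring - fall for _, fall, spring, _ in records]
    flags = [low for _, _, _, low in records]

    # Step 2: fit expected gain = a + b * low_income by ordinary least squares.
    g_bar, f_bar = mean(gains), mean(flags)
    var = sum((f - f_bar) ** 2 for f in flags)
    if var == 0:
        raise ValueError("need both low-income and other students to estimate the adjustment")
    b = sum((f - f_bar) * (g - g_bar) for f, g in zip(flags, gains)) / var
    a = g_bar - b * f_bar

    # Step 3: residual = actual gain minus the gain expected from the control alone;
    # a teacher's value-added is the mean residual of that teacher's students.
    residuals = defaultdict(list)
    for (teacher, _, _, _), g, f in zip(records, gains, flags):
        residuals[teacher].append(g - (a + b * f))
    return {teacher: mean(r) for teacher, r in residuals.items()}


print(value_added(students))  # {'A': 1.125, 'B': -1.125}
```

In this made-up example both teachers’ students gain an average of 12.25 points, yet teacher A, whose class includes more low-income students, comes out ahead once each student’s gain is compared with the gain expected for students like them. That is the kind of adjustment intended to avoid penalizing teachers who work with at-risk kids.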

The Washington area has been an epicenter of the new evaluation push. This July, the National Education Association, the largest teachers union in the country, reversed its policy and approved, with restrictions, teacher evaluations based on student progress, including the qualified use of test scores.

Last April, Virginia’s Board of Education approved new teacher evaluation guidelines for its school districts that included “academic progress” yardsticks and the use of standardized test scores. The model recommended that 40 percent of a teacher’s score be based on student academic progress.

Then in June, Maryland approved a similar system that tied 50 percent of a teacher’s evaluation to student progress. The state will now pilot the system in seven school districts.
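The weights in these models are simple arithmetic, but they matter. The hypothetical Python sketch below blends a student-progress rating with a single rating that stands in for every other evaluation component. The 1-to-4 scale, the sample scores and the function name composite_score are illustrative assumptions, not Virginia’s or Maryland’s actual formulas.

```python
def composite_score(progress: float, other: float, progress_weight: float) -> float:
    """Blend a student-progress rating with all other evaluation components.

    Hypothetical: both inputs share a 1-4 scale, and every non-progress measure
    (observations, professionalism, surveys and so on) is collapsed into `other`.
    """
    if not 0.0 <= progress_weight <= 1.0:
        raise ValueError("progress_weight must be between 0 and 1")
    return round(progress_weight * progress + (1 - progress_weight) * other, 2)


# A teacher with strong observations (3.5) but weak test-based progress (2.0).
print(composite_score(2.0, 3.5, 0.40))  # 40 percent weight, as in Virginia's model: 2.9
print(composite_score(2.0, 3.5, 0.50))  # 50 percent weight, as in Maryland's system: 2.75
```

Under these assumptions, the same weak progress rating pulls the composite from 2.9 down to 2.75 when its weight rises from 40 to 50 percent, which illustrates why the share assigned to student progress is so contested.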

In 2009, D.C. Public Schools adopted a so-called “IMPACT” evaluation system that uses value-added assessments where the data exist to do so, which currently covers only about 20 percent of teachers in the District.

Pressure to change these evaluations has come largely from the federal government, amid growing concern over America’s declining rankings in international student testing and the difficulty many states are having meeting the student proficiency standards set by the No Child Left Behind Act. The Department of Education is using a carrot-and-stick approach to funding to get schools to adopt more rigorous assessments. For example, the Obama administration’s Race to the Top Fund, a competitive grant program designed to encourage and reward states that show educational improvement, requires value-added teacher ratings.

The billions of dollars at play are a “golden leash,” Whitehurst of the Brookings Institution told The Diplomat. Some states are reluctantly applying for Race to the Top funds, while others are using the money as an excuse to do what they’ve wanted to do for a while but didn’t for fear of local backlash.

That backlash can be powerful. Criticism of Michelle Rhee’s tenure as D.C. schools chief and the IMPACT system she introduced triggered a fierce nationwide debate over firing teachers that affected the outcome of the last D.C. mayoral race and contributed to Rhee’s departure. However, her replacement, current D.C. Schools Chancellor Kaya Henderson, has pledged to keep IMPACT but improve it — and school officials around the country have been keeping an eye on D.C. to see how the system affects its teachers.

One of those teachers, Ellie (she did not want her real name to be used), who’s been evaluated under D.C.’s IMPACT system since it was introduced in 2009, doesn’t mind the scrutiny.

Passionate about teaching and “getting kids excited” about learning, she said “all school staff members should be accountable for student progress. Some teachers don’t put effort into their jobs,” and unlike the old ways of evaluating those teachers, Ellie said the newer models such as IMPACT can weed them out. “And good teachers need to be acknowledged and rewarded.”

But while Ellie, 28, supports rigorous teacher assessment and praises parts of IMPACT, such as its classroom observation system and its carefully laid-out standards for good teaching, she also points to its drawbacks, even though she has personally gained from it. She has earned high ratings and a salary bonus, which she received by giving up tenure-track teaching for the chance to make more money.

IMPACT is hard on both students and teachers, she admits, because 50 percent of a teacher’s evaluation score comes from the District’s standardized testing system. One round of testing can make or break a student or a teacher, and that needs to be changed, she told The Diplomat.

“You have third-grade kids taking this huge test, four mornings for a week,” she said. “They get exhausted. What if they’re having a bad day? Some give up in discouragement. And I don’t think one multiple-choice test is indicative of what a child can do.”

She added that teachers are told to “differentiate” among students, to adapt to varied learning styles and strengths, but “then the system turns around and uses a one-size-fits-all testing protocol” to determine the teachers’ fate.

Concerns such as Ellie’s are largely valid, says Whitehurst, formerly an influential player at the Department of Education who encouraged scientific rigor across the board.

“We don’t know yet how to do these new evaluation systems well and we need to be aware of the amount of error in them. And even the best systems can’t capture most of what a good teacher can do.”

Whitehurst recommends using a variety of measures to rate teacher effectiveness. However, he adds that few measures currently approach the validity of standardized tests.

He also cautions against discounting the new evaluation methods right off the bat. “It would be hard to imagine a system that would do a worse job than what we had before. We need to fix the problems rather than throw the whole effort out.”

Value-added systems are often misunderstood, Whitehurst also argues. First, they are always set up to “wring out” factors that teachers can’t control, such as the share of students who qualify for free or reduced-price lunches. Every value-added evaluation model, including IMPACT, adjusts for factors such as student poverty, neighborhood blight, attendance and the number of students for whom English is a second language, adjustments that can benefit educators dedicated to at-risk kids in poor environments.

Ellie was one of them. She first taught at a school in Anacostia, in Ward 8, one of the District’s most challenging neighborhoods. The school’s proficiency scores in English and math started out “in the single digits,” she recalled, but an innovative principal who hired enthusiastic educators turned things around, and the team raised proficiency scores to “around 17 percent.”

Ellie said she wanted to stay at that school but was unable to because of a “last-hired” rule, under which the last people hired are the first to be let go in budget cutbacks. She now teaches at a Northwest D.C. elementary school where most of the students are middle class. In both schools she has been rated “highly effective,” and she is proud of that. But some things in the evaluation system make her and her colleagues uneasy, she said carefully.

Actor Matt Damon, whose mother is a teacher, was much more than uneasy about the new rules at the “Save Our Schools” rally. He was vehement in his opposition. “None of the qualities that I prize or that made me a success — love of learning, curiosity, imagination — can be tested,” he told the crowd.

That’s another common misunderstanding, according to Whitehurst, who says concerns such as Damon’s are “overblown.” There’s “not a lot of evidence that a testing focus narrows the curriculum. Increases in reading and math time are typically stripped from recess.”

Whitehurst acknowledges that under value-added systems, effective teachers can sometimes get low ratings and poor ones may slide by. But he says studies show these systems are still our most reliable yardsticks and can predict fairly accurately how a particular teacher is likely to perform over time.

However, Whitehurst believes it’s also important to continue to develop as many reliable, valid measures of teacher performance as we can and reduce dependence on standardized tests.

Ellie agrees, and adds that there are things that can be done right away: More emphasis on student portfolios would be a place to start, along with a reduction in the high percentages that evaluation systems give over to the standardized tests. “I don’t do well on standardized tests myself,” she said. “My SAT scores weren’t great but I had a 4.0 [grade point average] in graduate school.”

Whitehurst likes this kind of feedback, he says, because it’s important to have teachers and parents at the table when evaluation systems are being set up. Administrators “can benefit from humbleness and willingness to learn.” Additionally, the “rush to get a system in place” can create so many inequities and problems that they could fuel further opposition.

One local teacher evaluation system that Education Secretary Duncan has singled out for praise has already developed an effective approach that doesn’t emphasize test scores: that of Montgomery County, the largest school system in Maryland. Highly diverse, with about 30 percent of its residents foreign-born, the county is also one of the best-educated and wealthiest in the United States.

Montgomery County Public Schools provides all teachers with an extensive system of training, mentoring, tracking and required coursework. New teachers are offered intensive guidance and help, and any instructor who gets a below-standard evaluation is assisted through an elaborate Peer Assistance and Review program over several years.

It’s not clear whether a less affluent school district could offer such an extensive web of resources for its teachers, but Ellie would like to see more support for all struggling but potentially good teachers. Meanwhile, she sees value-added assessment systems such as IMPACT as far from perfect, but also cheerfully calls them “a step in the right direction.”

About the Author

Carolyn Cosmos is a contributing writer for The Washington Diplomat.