High-Stakes For Whom? Understanding Principal Behavior in Rating Teacher Performance
While the last five years have brought significant changes* to the design of K-12 schools’ teacher evaluation systems, we have not witnessed a corresponding increase in differentiation among teacher performance ratings. Evaluation systems still typically rely most heavily on observations of classroom practice, which are usually conducted by school principals, and these observation ratings tend to be overwhelmingly positive. Why is it that principals consistently assign high ratings to most of their teachers? Is it because principals really perceive most of their teachers to be high performers, or are principals not reflecting their true perceptions in these ratings?
Last month, Education Week published an article with a title that appeared to confidently answer these questions: “Want Principals to Rate Teachers Honestly? Take Away the Stakes.” While the headline certainly grabs the reader’s attention, critics of stakes in teacher evaluation systems should look a bit more closely at the research before becoming too gleeful: the research the article highlights doesn’t quite support that assertion.
The Education Week article covers a recent study by academic researchers Jason Grissom and Susanna Loeb that compares 100 principals’ summative teacher evaluation ratings in a “high-stakes” environment to their evaluations of the same teachers in a “low-stakes” environment. The researchers simulated a low-stakes environment by conducting confidential one-on-one interviews in which principals were asked to rate some of their teachers. The study concluded that principals assign mostly positive ratings in both situations, but that there is far more variation in the low-stakes ratings than in the high-stakes ones (although both sets of ratings were predictive of teachers’ value-added ratings).
These findings should be interpreted with some caution, because the study has limitations. First, the instruments used in the low-stakes and high-stakes evaluations were different. The high-stakes evaluation instrument asked principals to rate teachers on seven high-level standards (such as knowledge of learners, communication, and instructional planning), while the low-stakes evaluation instrument asked principals to rate teachers on 10 specific items (such as high test performance, improving critical thinking, and helping with school leadership). Second, the rating scales differed: the low-stakes assessment used a more nuanced six-point scale, compared with the four-point scale of the high-stakes assessment, which in turn may have made principals more willing to give a below-average rating.
In addition to the differences in instruments used, there are limitations to the real-world applicability of the “low-stakes” scenario, which the research paper’s authors readily admit. The simulated low-stakes scenario created a confidential interview rating process, which does not reflect the standard evaluation process where teachers are informed of their rating and provided associated feedback, regardless of the existence of “stakes.”
There are a variety of reasons why teacher evaluations would likely still result in overly positive ratings from principals even without high stakes for teachers, and many of them are actually related to the stakes for principals in issuing low ratings. Grissom noted that principals conducting teacher evaluations “are capable of differentiating, but they also face really strong incentives to not fully differentiate when they know there are potential job consequences for their teachers or consequences for their own relationships with their teachers” (emphasis added). Grissom and Loeb are not the first to explore how concerns about creating tense relationships or damaging school culture can lead principals to inflate the ratings they assign teachers. Research by Kraft & Gilmour identified “personal discomfort,” as well as several other explanations, for why principals often rate teachers highly even when their true perceptions do not match that rating, including:
- Time constraints: Principals have to observe a teacher and collect evidence to back up a low rating. Principals then have to provide feedback and create improvement plans to support low-performing teachers. This kind of increased workload may cause principals to use low ratings sparingly.
- Teachers’ potential and motivation: Rather than risk losing a teacher they see as having potential, principals may give teachers, particularly newer ones, a slightly higher rating in order to keep them motivated and receptive to feedback.
- The challenge of removing and replacing teachers: Principals may refrain from assigning a low rating to avoid the time and financial burden associated with what is often a lengthy dismissal process. They may also hesitate to dismiss a teacher for fear of being forced to fill that vacancy with a lower-quality replacement from an excess pool.
To address some of these challenges, policymakers should consider changes to principals’ assigned responsibilities, as well as to their professional development and evaluation processes. States and districts could reassess whether principals should be solely responsible for observing and providing feedback to teachers, or whether those or other responsibilities could be distributed to other staff with sufficient expertise. Doing so, along with encouragement and resources for more frequent informal observations, feedback, and meaningful coaching by principals or other school staff, could create cultures that enable more honest, trusting performance conversations. States and districts could invest in more training for principals on how to have difficult conversations with staff about developing their practice, and could provide principals with clear guidance on how to collect evidence and artifacts to support performance ratings. Education system leaders can also ensure that principal supervisors have the skills and capacity to evaluate principals’ execution of teacher evaluation and support systems in these key areas.
Most of policymakers’ attention has been placed on whether, and to what degree, test scores should be used in teacher evaluations. But observation of teacher practice typically makes up half or more of a teacher evaluation, making the observer’s role in evaluation even more critical. While high stakes associated with evaluation ratings may not encourage principals to submit honest ratings of their teachers, eliminating stakes is not going to fix the lack of differentiation in teacher evaluation. If we want teacher evaluation systems to better differentiate performance and promote professional growth, school systems should lower the stakes for principals by providing them sufficient time and support in observing teacher practice, providing honest feedback, and following up with meaningful ways to address any areas identified for improvement.
* As a quick recap, over the last five years many states began requiring new state- or district-developed evaluation systems that strive to be more rigorous and more objective than historical systems. They do this by factoring in multiple measures (which at a minimum include observations of classroom practice and a measure of impact on student learning growth, such as value-added models) and by incorporating at least three performance rating categories. Some systems also require or encourage “high-stakes” consequences, such as dismissal for the lowest performers or bonuses for the highest performers.