Novice teachers’ professional contexts may have important implications for their effectiveness, development, and retention. However, descriptions of these contexts suffer from data limitations, resulting in unidimensional or vague characterizations. Using 10 years of administrative data from the Los Angeles Unified School District, we describe patterns of new teacher sorting using 27 context measures organized along three distinct dimensions (intensity of instructional responsibilities, homophily, and colleague qualifications) and use school-level survey data to measure a fourth dimension (professional culture). Relative to more experienced teachers, novice teachers have placements that are more challenging along the first three dimensions, and composite measures are differentially predictive of teachers’ outcomes. This suggests that policymakers should consider placements to better retain and develop novice teachers.
Citation: Paul Bruno, Sarah Rabovsky, Katharine Strunk (2019). Taking their First Steps: The Distribution of New Teachers into School and Classroom Contexts and Implications for Teacher Effectiveness and Growth. CALDER Working Paper No. 212-0119-1
We use longitudinal data from North Carolina and Washington to study the extent to which four processes—teacher attrition from each state workforce, teacher mobility within districts, teacher mobility across districts, and teacher hiring—contribute to “teacher quality gaps” (TQGs) between advantaged and disadvantaged schools. We first replicate prior findings documenting inequities in each of these processes using different measures of student disadvantage (race and poverty) and teacher quality (experience, licensure test scores, and value added) and then develop and implement a simulation to assess the extent to which each process contributes to observed TQGs in each state. We find that all four processes contribute to TQGs but also document considerable heterogeneity in the extent to which each process contributes to the different TQG measures. For example, patterns in teacher attrition and mobility contribute more to TQGs measured by teacher experience, while patterns in teacher hiring explain the majority of TQGs measured by teacher licensure test scores and value added.
Citation: Dan Goldhaber, Vanessa Quince, Roddy Theobald (2018). How Did It Get This Way? Disentangling the Sources of Teacher Quality Gaps Across Two States. CALDER Working Paper No. 209-1118-1
We study the relative performance of two policy relevant value-added models – a one-step fixed effect model and a two-step aggregated residuals model – using a simulated dataset well grounded in the value-added literature. A key feature of our data generating process is that student achievement depends on a continuous measure of economic disadvantage. This is a realistic condition that has implications for model performance because researchers typically have access to only a noisy, binary measure of disadvantage. We find that one- and two-step value-added models perform similarly across a wide range of student and teacher sorting conditions, with the two-step model modestly outperforming the one-step model in conditions that best match observed sorting in real data. A reason for the generally superior performance of the two-step model is that it better handles the use of an error-prone, dichotomous proxy for student disadvantage.
WP 179 was revised in September 2018. It was originally released in June 2017.
Citation: Eric Parsons, Cory Koedel, Li Tan (2018). Accounting for Student Disadvantage in Value-Added Models (Update). CALDER Working Paper No. 179
We use statewide data from Massachusetts to investigate teacher performance evaluations as a measure of teaching effectiveness. Consistent with prior research, we find that assignment to lower achieving classrooms reduces teachers’ performance ratings. But after adjusting for these and other observable differences between classroom assignments, we show that regression-adjusted performance measures can reliably predict future evaluation ratings as teachers move across grades and subjects within the same school. However, we also document substantial unexplained variation in ratings across schools and districts in the state. In particular, districts vary substantially both in the extent to which they differentiate between teachers and in the sensitivity of performance ratings to differences in teacher effectiveness as measured by value added. As a result, even after regression adjustment, teacher evaluation ratings generally provide unreliable predictions of future teacher evaluations after teachers switch schools. These findings suggest that policymakers and researchers should use caution in using performance evaluation ratings to make comparisons between teachers in different contexts.
WP 197-0618-1 was originally released in June 2018. An updated version (WP 197-0618-2) was released in August 2020.
Citation: James Cowan, Dan Goldhaber, Roddy Theobald (2018). Performance Evaluations as a Measure of Teacher Effectiveness when Standards Differ: Accounting for Variation across Classrooms, Schools, and Districts. CALDER Working Paper No. 197-0618-2
Few studies examine employee responses to layoff-induced unemployment risk; none that we know of quantify the impact of job insecurity on individual employee productivity. Using data from the Los Angeles Unified School District and Washington State during the Great Recession, we provide the first evidence about the impact of the layoff process on teacher productivity. In both sites we find that teachers impacted by the layoff process are less productive than those who do not face layoff-induced job threat. LAUSD teachers who are laid off and then rehired to return to the district are less productive in the two years following the layoff. Washington teachers who are given a reduction-in-force (RIF) notice and are then not laid off have reduced effectiveness in the year of the RIF. We argue that these results are likely driven by impacts of the layoff process on teachers’ job commitment and present evidence to rule out alternate explanations.
WP 140 was revised in March 2018. It was originally released in November 2015.
Citation: Katharine O. Strunk, Dan Goldhaber, David S. Knight, Nate Brown (2018). Are There Hidden Costs Associated With Conducting Layoffs? The Impact of RIFs and Layoffs on Teacher Effectiveness. CALDER Working Paper No. 140
We study how the introduction of a rigorous teacher evaluation system in a large urban school district affects the quality composition of teacher turnovers. With the implementation of the new system, we document increased turnover among the least effective teachers and decreased turnover among the most effective teachers, relative to teachers in the middle of the distribution. Our findings demonstrate that the alignment between personnel decisions and teacher effectiveness can be improved through targeted personnel policies. However, the change in the composition of exiters brought on by the policy we study is too small to meaningfully impact student achievement.
Working Paper 168-0717 was updated in March 2019. It was originally published in July 2017.
Citation: Julie Berry Cullen, Cory Koedel, Eric Parsons (2017). The Compositional Effect of Rigorous Teacher Evaluation on Workforce Quality. CALDER Working Paper No. 168
We investigate the relationship between teacher licensure test scores and student test achievement and high school course-taking. We focus on three subject/grade combinations—middle school math, ninth-grade algebra and geometry, and ninth-grade biology—and find evidence that a teacher’s basic skills test scores are modestly predictive of student achievement in middle and high school math and highly predictive of student achievement in high school biology. A teacher’s subject-specific licensure test scores are a consistent and statistically significant predictor of student achievement only in high school biology. Finally, we find little evidence that students assigned to middle school teachers with higher basic-skills test scores are more likely to take advanced math and science courses in high school.
Citation: Dan Goldhaber, Trevor Gratz, Roddy Theobald (2016). What’s in a Teacher Test? Assessing the Relationship between Teacher Licensure Test Scores and Student Secondary STEM Achievement. CALDER Working Paper No. 158
This policy brief reviews evidence about the extent to which disadvantaged students are taught by teachers with lower value-added estimates of performance, and seeks to reconcile differences in findings from different studies. We demonstrate that much of the inequity in teacher value added in Washington state is due to differences across different districts, so studies that only investigate inequities within districts likely understate the overall inequity in the distribution of teacher effectiveness because they miss one of the primary sources of this inequity.
Citation: Dan Goldhaber, Vanessa Quince, Roddy Theobald (2016). Reconciling Different Estimates of Teacher Quality Gaps Based on Value Added. CALDER Working Paper No.
There is mounting evidence of substantial “teacher quality gaps” (TQGs) between advantaged and disadvantaged students, but practically no empirical evidence about their history. We use longitudinal data on public school students, teachers, and schools from two states—North Carolina and Washington—to provide a descriptive history of the evolution of TQGs in these states. We find that TQGs exist in every year in each state and for all measures we consider of student disadvantage and teacher quality. But there is variation in the magnitudes and sources of TQGs over time, between the two states, and depending on the measure of student disadvantage and teacher quality.
Citation: Dan Goldhaber, Vanessa Quince, Roddy Theobald (2016). Has It Always Been This Way? Tracing the Evolution of Teacher Quality Gaps in U.S. Public Schools. CALDER Working Paper No. 171
We use longitudinal data from Washington State to provide estimates of the extent to which performance on the edTPA, a performance-based, subject-specific assessment of teacher candidates, is predictive of the likelihood of employment in the teacher workforce and value-added measures of teacher effectiveness. While edTPA scores are highly predictive of employment in the state’s public teaching workforce, evidence on the relationship between edTPA scores and teaching effectiveness is more mixed. Specifically, continuous edTPA scores are a significant predictor of student mathematics achievement in some specifications, but when we consider that the edTPA is a binary screen of teaching effectiveness (i.e., pass/fail), we find that passing the edTPA is significantly predictive of teacher effectiveness in reading but not in mathematics. We also find that Hispanic candidates in Washington were more than three times more likely to fail the edTPA after it became consequential in the state than non-Hispanic White candidates.
Citation: Dan Goldhaber, James Cowan, Roddy Theobald (2016). Evaluating Prospective Teachers: Testing the Predictive Validity of the edTPA (Update). CALDER Working Paper No. 157
In practice, teacher turnover appears to have negative effects on school quality as measured by student performance. However, some simulations suggest that turnover can instead have large, positive effects under a policy regime in which low-performing teachers can be accurately identified and replaced with more effective teachers. This study examines this question by evaluating the effects of teacher turnover on student achievement under IMPACT, the unique performance-assessment and incentive system in the District of Columbia Public Schools (DCPS). Employing a quasi-experimental design based on data from the first years of IMPACT, we find that, on average, DCPS replaced teachers who left with teachers who increased student achievement by 0.08 SD in math. When we isolate the effects of lower-performing teachers who were induced to leave DCPS for poor performance, we find that student achievement improves by larger and statistically significant amounts (i.e., 0.14 SD in reading and 0.21 SD in math). In contrast, the effect of exits by teachers not sanctioned under IMPACT is typically negative but not statistically significant.
Citation: Melinda Adnot, Thomas Dee, Veronica Katz, James Wyckoff (2016). Teacher Turnover, Teacher Quality and Student Achievement in DCPS. CALDER Working Paper No. 153
Using administrative longitudinal data from five states, we study how value-added measures of teacher performance are affected by changes in state standards and assessments. We first document the stability of teachers’ value-added rankings during transitions to new standard and assessment regimes and compare our findings to stability during stable standard and assessment regimes. We also examine the predictive validity of value-added estimates during nontransition years over transition-year student achievement. In most cases we find that measures of teacher value added are similarly stable in transition years and nontransition years. Moreover, there is no evidence that the level of disadvantage of students taught disproportionately influences teacher rankings in transition years relative to stable years. In the states we study, student achievement in math can consistently be forecasted accurately—although not perfectly—using value-added estimates for teachers during stable standards and assessment regimes. There was somewhat less consistency in reading, because we find cases where test transitions significantly reduced forecasting accuracy.
Citation: Benjamin Backes, James Cowan, Dan Goldhaber, Cory Koedel, Luke Miller, Zeyu Xu (2016). The Common Core Conundrum: To What Extent Should We Worry That Changes to Assessments and Standards Will Affect Test-Based Measures of Teacher Performance? CALDER Working Paper No. 152
We use data from six Washington State teacher education programs to investigate the relationship between teacher candidates’ student teaching experiences and their later teaching effectiveness and probability of attrition. We find that teachers who student taught in schools with lower teacher turnover are less likely to leave the state’s teaching workforce, and that teachers are more effective when the student demographics of their current school are similar to the student demographics of the school in which they did their student teaching. While descriptive, these findings suggest that the school context in which student teaching occurs has important implications for the later outcomes of teachers and their students.
Citation: Dan Goldhaber, John M. Krieg, Roddy Theobald (2016). Does the Match Matter? Exploring Whether Student Teaching Experiences Affect Teacher Effectiveness and Attrition. CALDER Working Paper No. 149
We use rich longitudinally matched administrative data on students and teachers in North Carolina to examine the patterns of differential effectiveness by teachers’ years of experience. The paper contributes to the literature by focusing on middle school teachers and by extending the analysis to student outcomes beyond test scores. Once we control statistically for the quality of individual teachers by the use of teacher fixed effects, we find large returns to experience for middle school teachers in the form of both higher test scores and improvements in student behavior, with the clearest behavioral effects emerging for reductions in student absenteeism. Moreover, these returns extend well beyond the first few years of teaching. The paper contributes to policy debates by documenting that teachers can and do continue to learn on the job.
December 2015 Update
Citation: Helen Ladd, Lucy Sorensen (2015). Returns to Teacher Experience: Student Achievement and Motivation in Middle School. CALDER Working Paper No. 112
There is increased policy interest in extending test-based evaluations in K-12 education to include student achievement in high school. High school achievement is typically measured by performance on end-of-course exams (EOCs), which test course-specific standards in a variety of subjects. However, unlike standardized tests in the early grades, students take EOCs at different points in their schooling careers. The timing of the test is a choice variable presumably determined by input from administrators, students and parents. Recent research indicates that school and district policies that determine when students take particular courses can have important consequences for achievement and subsequent outcomes like advanced course taking. We develop an approach for modeling EOC test performance that disentangles the influence of school and district policies regarding the timing of course taking from other factors. After separating out the timing issue, better measures of the quality of instruction provided by districts, schools and teachers can be obtained. Our approach also offers diagnostic value because it separates out the influence of school and district course-timing policies from other factors that determine student achievement.
Citation: Eric Parsons, Cory Koedel, Michael Podgursky, Mark Ehlert, P. Brett Xiang (2015). Incorporating End-of-Course Exam Timing into Educational Performance Evaluations. CALDER Working Paper No. 137
Teacher and principal evaluation systems now emerging in response to federal, state and/or local policy initiatives typically require that a component of teacher evaluation be based on multiple performance metrics, which must be combined to produce summative ratings of teacher effectiveness. Districts have utilized three common approaches to combine these multiple performance measures, all of which introduce bias and/or additional prediction error that was not present in the performance measures originally. This paper investigates whether the bias and error introduced by these approaches erodes the ability of evaluation systems to reliably identify high- and low-performing teachers. The analysis compares the expected differences in long-term teacher value-added among teachers identified as high- or low-performing under these three approaches, using simulated data based on estimated inter-correlations and reliability of measures in the Gates Foundation’s Measures of Effective Teaching project. Based on the results of our simulation exercise presented here, we conclude these approaches can undermine the evaluation system’s objectives in some contexts. Depending on the way these performance measures are actually combined to categorize teacher performance, the additional error and bias can be large enough to undermine the district’s objectives.
Citation: Michael Hansen, Mariann Lemke, Nicholas Sorensen (2014). Combining Multiple Performance Measures: Do Common Approaches Undermine Districts’ Personnel Evaluation Systems? CALDER Working Paper No. 118
Teacher pension systems target retirements within a narrow range of the career cycle by penalizing individuals who separate too soon or remain employed too long. The penalties result in the retention of some teachers who would otherwise choose to leave, and the premature exit of some teachers who would otherwise choose to stay. We examine how the effects of teachers' pension incentives on workforce composition influence teacher quality. Teachers who are held in by the "pull" incentives in the pension systems are not more effective, on average, than the typical teacher. Teachers who are encouraged to exit by the "push" incentives are more effective on average. We conclude that the net effect of teachers' pension incentives on workforce quality is small, but negative. Given the substantial and growing costs of current systems, and the lack of evidence regarding their efficacy, experimentation by traditional and charter schools with alternative retirement benefit structures would be useful.
Citation: Cory Koedel, Michael Podgursky (2014). Teacher Pension Systems, the Composition of the Teaching Workforce, and Teacher Quality. CALDER Working Paper No. 72
This paper examines the value of strategically assigning disproportionately larger classes to the strongest teachers in order to optimize student learning in the face of differential teacher effectiveness. The rationale is straightforward: Larger classes for the best teachers benefit the pupils who are reassigned to them; they also help the less effective teachers improve their instruction by enabling them to concentrate on fewer students. But just how much of a difference could manipulating class sizes in this way make for overall student learning and access to effective teaching? This study performs a simulation based on North Carolina data to estimate plausible student outcomes under this approach. In the North Carolina data, I find there is a very slight tendency to place more students in the classes of effective teachers, but still only about 25 percent of students are taught by the top 25 percent of teachers. Intensively reallocating eighth-grade students, so that the most effective teachers have up to twelve more pupils than the average classroom, may produce gains equivalent to adding roughly two-and-a-half extra weeks of school. Even adding a handful of students to the most effective eighth-grade teachers (up to six more than the school’s average) produces gains in math and science akin to extending the school year by nearly two weeks or, equivalently, to removing the lowest 5 percent of teachers from the classroom. The potential impacts on learning are more modest in fifth grade, where the large majority of teachers are in self-contained classrooms. Results show that this strategy yields an overall improvement in student access to effective teaching, yet gaps in access for economically disadvantaged students persist. For instance, disadvantaged eighth-grade students are about 8 percent less likely than non-disadvantaged peers to be assigned to a teacher in the top 25 percent of performance. This gap in access changes little in spite of the policy putting more students in front of effective teachers, because the pool of available teachers in high-poverty schools does not change under this strategy. Thus, this policy alone shows little promise in reducing achievement gaps.
Citation: Michael Hansen (2014). Right-Sizing the Classroom: Making the Most of Great Teachers. CALDER Working Paper No. 110
We examine the efficiency implications of imposing proportionality in teacher evaluation systems. Proportional evaluations force comparisons to be between equally-circumstanced teachers. We contrast proportional evaluations with global evaluations, which compare teachers to each other regardless of teaching circumstance. We consider a policy where administrators use the ratings from the evaluation system to help shape the teaching workforce, and define efficiency in terms of student achievement. Our analysis indicates that proportionality can be imposed in teacher evaluation systems without efficiency costs under a wide range of evaluation and estimation conditions. Proportionality is efficiency-enhancing in some cases. These findings are notable given that proportional teacher evaluations offer a number of other policy benefits.
Citation: Cory Koedel, Jiaxi Li (2014). The Efficiency Implications of Using Proportional Evaluations to Shape the Teaching Workforce. CALDER Working Paper No. 106
Measures of teachers’ “value added” to student achievement play an increasingly central role in K-12 teacher policy and practice, in part because they have been shown to predict teachers’ long-term impacts on students’ life outcomes. However, little research has examined variation in the long-term effects of teachers with similar value-added performance. In this study, we investigate variation in the persistence of teachers’ value-added effects on student achievement in New York City. We separate persistent effects into general effects that improve both the subject taught (math or English language arts (ELA)) and the other area of measured achievement and subject-specific effects which improve only the subject taught. Two findings emerge. First, a teacher’s value-added to ELA achievement has substantial crossover effects on long-term math performance. That is, having a better ELA teacher affects both math and ELA performance in a future year. Conversely, math teachers have only minimal long-term effects on ELA performance; their effects are far more subject-specific. Second, we identify substantial heterogeneity in the persistence of ELA teachers’ effects across observable student, teacher, and school characteristics. In particular, teachers in schools serving more poor, minority, and previously low-scoring students have less persistence than other teachers with the same value-added scores. Moreover, ELA teachers with stronger academic backgrounds have more persistent effects on student achievement, as do schools staffed with a higher proportion of such teachers. The results indicate that teachers’ effects on students’ long-term skills can vary as a function of instructional content and quality in ways that are not fully captured by value-added measures of teacher effectiveness.
Citation: Ben Master, Susanna Loeb, James Wyckoff (2014). Learning that Lasts: Unpacking Variation in Teachers’ Effects on Students’ Long-Term Knowledge. CALDER Working Paper No. 104