Principals are widely seen as a key influence on the educational environment of schools, and nearly all principals have experience as teachers. Yet there is no evidence on whether we can predict the effectiveness of principals (as measured by their value added) based on their value added as teachers, an issue we explore using administrative data from Washington. Several descriptive features of the principal labor market stand out. First, teachers who become principals tend to have higher levels of educational attainment while teaching and are less likely to be female, but we find no significant differences in licensure test scores between those teachers who become principals and those we do not observe in the principalship. Second, principal labor markets appear to be quite localized: about 50 percent of principals previously taught in the same district in which they assumed a principalship. We find positive correlations between teacher and principal value added in reading (ELA) and similarly sized but less precise estimates in math. Teachers who become principals have slightly higher teacher value added, but the difference between the two groups is not statistically significant, suggesting that principals are not systematically selected based on their prior effectiveness when serving as a classroom teacher.
Citation: Dan Goldhaber, Kristian Holden, Bingjie Chen (2019). Do More Effective Teachers Become More Effective Principals? CALDER Working Paper No. 215-0119-1
We use longitudinal data from North Carolina and Washington to study the extent to which four processes—teacher attrition from each state workforce, teacher mobility within districts, teacher mobility across districts, and teacher hiring—contribute to “teacher quality gaps” (TQGs) between advantaged and disadvantaged schools. We first replicate prior findings documenting inequities in each of these processes using different measures of student disadvantage (race and poverty) and teacher quality (experience, licensure test scores, and value added) and then develop and implement a simulation to assess the extent to which each process contributes to observed TQGs in each state. We find that all four processes contribute to TQGs but also document considerable heterogeneity in the extent to which each process contributes to the different TQG measures. For example, patterns in teacher attrition and mobility contribute more to TQGs measured by teacher experience, while patterns in teacher hiring explain the majority of TQGs measured by teacher licensure test scores and value added.
Citation: Dan Goldhaber, Vanessa Quince, Roddy Theobald (2018). How Did It Get This Way? Disentangling the Sources of Teacher Quality Gaps Across Two States. CALDER Working Paper No. 209-1118-1
We study the relative performance of two policy relevant value-added models – a one-step fixed effect model and a two-step aggregated residuals model – using a simulated dataset well grounded in the value-added literature. A key feature of our data generating process is that student achievement depends on a continuous measure of economic disadvantage. This is a realistic condition that has implications for model performance because researchers typically have access to only a noisy, binary measure of disadvantage. We find that one- and two-step value-added models perform similarly across a wide range of student and teacher sorting conditions, with the two-step model modestly outperforming the one-step model in conditions that best match observed sorting in real data. A reason for the generally superior performance of the two-step model is that it better handles the use of an error-prone, dichotomous proxy for student disadvantage.
WP 179 was revised in September 2018. It was originally released in June 2017.
Citation: Eric Parsons, Cory Koedel, Li Tan (2018). Accounting for Student Disadvantage in Value-Added Models (Update). CALDER Working Paper No. 179
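The two modeling approaches contrasted in this abstract can be sketched on synthetic data. This is a minimal illustration of our own, not the paper's simulation: the variable names, parameter values, and data-generating process are assumptions chosen only to show the mechanics of each estimator.

```python
# Sketch of a one-step fixed-effect value-added model vs. a two-step
# aggregated-residuals model on synthetic data (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

n_teachers, n_students = 5, 200
true_effects = np.array([-0.4, -0.2, 0.0, 0.2, 0.4])  # true teacher value added

teacher = np.repeat(np.arange(n_teachers), n_students)
prior = rng.normal(size=teacher.size)                  # prior achievement
noise = rng.normal(scale=0.3, size=teacher.size)
score = 0.5 * prior + true_effects[teacher] + noise    # current achievement

# One-step model: regress scores on prior achievement and teacher
# indicators jointly; the indicator coefficients are the teacher effects.
dummies = (teacher[:, None] == np.arange(n_teachers)).astype(float)
X1 = np.column_stack([prior, dummies])
beta1, *_ = np.linalg.lstsq(X1, score, rcond=None)
one_step = beta1[1:] - beta1[1:].mean()                # center for comparison

# Two-step model: first regress scores on prior achievement only, then
# average the residuals within each teacher's classroom.
X2 = np.column_stack([np.ones_like(prior), prior])
beta2, *_ = np.linalg.lstsq(X2, score, rcond=None)
resid = score - X2 @ beta2
two_step = np.array([resid[teacher == j].mean() for j in range(n_teachers)])

print(np.round(one_step, 2))
print(np.round(two_step, 2))
```

With students assigned to teachers at random, as here, the two estimators recover nearly identical rankings; the paper's contribution concerns how they diverge under nonrandom sorting and an error-prone, binary disadvantage proxy, which this sketch does not model.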
Few studies examine employee responses to layoff-induced unemployment risk; none that we know of quantify the impact of job insecurity on individual employee productivity. Using data from the Los Angeles Unified School District and Washington State during the Great Recession, we provide the first evidence about the impact of the layoff process on teacher productivity. In both sites we find that teachers impacted by the layoff process are less productive than those who do not face layoff-induced job threat. LAUSD teachers who are laid off and then rehired to return to the district are less productive in the two years following the layoff. Washington teachers who are given a reduction-in-force (RIF) notice and are then not laid off have reduced effectiveness in the year of the RIF. We argue that these results are likely driven by impacts of the layoff process on teachers’ job commitment and present evidence to rule out alternate explanations.
WP 140 was revised in March 2018. It was originally released in November 2015.
Citation: Katharine O. Strunk, Dan Goldhaber, David S. Knight, Nate Brown (2018). Are There Hidden Costs Associated With Conducting Layoffs? The Impact of RIFs and Layoffs on Teacher Effectiveness. CALDER Working Paper No. 140
UTeach is a well-known, university-based program designed to increase the number of high-quality STEM teachers in the workforce. Despite substantial investment and rapid program diffusion, there is little evidence about the effectiveness of UTeach graduates. Using administrative data from the state of Texas, we measure the impact of having a UTeach teacher on student test scores in math and science in middle schools and high schools. We find that students taught by UTeach teachers perform significantly better on end-of-grade tests in math and on end-of-course tests in math and science, by 8 to 14 percent of a standard deviation, depending on grade and subject.
WP 173 was revised in February 2018. It was originally released in December 2016.
Citation: Benjamin Backes, Dan Goldhaber, Whitney Cade, Kate Sullivan, Melissa Dodson (2018). Can UTeach? Assessing the Relative Effectiveness of STEM Teachers. CALDER Working Paper No. 173
Improving public sector workforce quality is challenging in sectors such as education where worker productivity is difficult to assess and manager incentives are muted by political and bureaucratic constraints. In this paper, we study how providing improved information to principals about teacher effectiveness and encouraging them to use the information in personnel decisions affects the composition of teacher turnovers. Our setting is the Houston Independent School District, which recently implemented a rigorous teacher evaluation system. Prior to the new system, teacher effectiveness was negatively correlated with district exit, and we show that the policy significantly strengthened this relationship, primarily by increasing the relative likelihood of exit for teachers in the bottom quintile of the quality distribution. Low-performing teachers working in low-achieving schools were especially likely to leave. However, despite this success, the implied change to the quality of the workforce overall is too small to have a detectable impact on student achievement.
Citation: Julie Berry Cullen, Cory Koedel, Eric Parsons (2017). The Compositional Effect of Rigorous Teacher Evaluation on Workforce Quality. CALDER Working Paper No. 168
This policy brief reviews evidence about the extent to which disadvantaged students are taught by teachers with lower value-added estimates of performance, and seeks to reconcile differences in findings from different studies. We demonstrate that much of the inequity in teacher value added in Washington state is due to differences across districts, so studies that only investigate inequities within districts likely understate the overall inequity in the distribution of teacher effectiveness because they miss one of the primary sources of this inequity.
Citation: Dan Goldhaber, Vanessa Quince, Roddy Theobald (2016). Reconciling Different Estimates of Teacher Quality Gaps Based on Value Added. CALDER Working Paper No.
There is mounting evidence of substantial “teacher quality gaps” (TQGs) between advantaged and disadvantaged students, but practically no empirical evidence about their history. We use longitudinal data on public school students, teachers, and schools from two states—North Carolina and Washington—to provide a descriptive history of the evolution of TQGs in these states. We find that TQGs exist in every year in each state and for all measures we consider of student disadvantage and teacher quality. But there is variation in the magnitudes and sources of TQGs over time, between the two states, and depending on the measure of student disadvantage and teacher quality.
Citation: Dan Goldhaber, Vanessa Quince, Roddy Theobald (2016). Has It Always Been This Way? Tracing the Evolution of Teacher Quality Gaps in U.S. Public Schools. CALDER Working Paper No. 171
We use longitudinal data from Washington State to provide estimates of the extent to which performance on the edTPA, a performance-based, subject-specific assessment of teacher candidates, is predictive of the likelihood of employment in the teacher workforce and value-added measures of teacher effectiveness. While edTPA scores are highly predictive of employment in the state’s public teaching workforce, evidence on the relationship between edTPA scores and teaching effectiveness is more mixed. Specifically, continuous edTPA scores are a significant predictor of student mathematics achievement in some specifications, but when we consider that the edTPA is a binary screen of teaching effectiveness (i.e., pass/fail), we find that passing the edTPA is significantly predictive of teacher effectiveness in reading but not in mathematics. We also find that Hispanic candidates in Washington were more than three times more likely to fail the edTPA after it became consequential in the state than non-Hispanic White candidates.
Citation: Dan Goldhaber, James Cowan, Roddy Theobald (2016). Evaluating Prospective Teachers: Testing the Predictive Validity of the edTPA (Update). CALDER Working Paper No. 157
It is widely believed that teacher turnover adversely affects the quality of instruction in urban schools serving predominantly disadvantaged children, and a growing body of research investigates various components of turnover effects. The evidence at first seems contradictory, as the quality of instruction appears to decline following turnover despite the fact that most work shows higher attrition for less effective teachers. This raises concerns that confounding factors bias estimates of transition differences in teacher effectiveness, the adverse effects of turnover, or both. After taking more extensive steps to account for nonrandom sorting of students into classrooms and endogenous teacher exits and grade-switching, we replicate existing findings of adverse selection out of schools and negative effects of turnover in lower-achievement schools. But we find that these turnover effects can be fully accounted for by the resulting loss in experience and the productivity loss following the reallocation of some incumbent teachers to different grades.
Citation: Eric Hanushek, Steven Rivkin, Jeffrey Schiman (2016). Dynamic Effects of Teacher Turnover on the Quality of Instruction. CALDER Working Paper No. 170
Using administrative longitudinal data from five states, we study how value-added measures of teacher performance are affected by changes in state standards and assessments. We first document the stability of teachers’ value-added rankings during transitions to new standards and assessment regimes and compare our findings to stability during stable standards and assessment regimes. We also examine how well value-added estimates from nontransition years predict transition-year student achievement. In most cases we find that measures of teacher value added are similarly stable in transition years and nontransition years. Moreover, there is no evidence that the level of disadvantage of students taught disproportionately influences teacher rankings in transition years relative to stable years. In the states we study, student achievement in math can consistently be forecasted accurately—although not perfectly—using value-added estimates for teachers during stable standards and assessment regimes. There was somewhat less consistency in reading, because we find cases where test transitions significantly reduced forecasting accuracy.
Citation: Benjamin Backes, James Cowan, Dan Goldhaber, Cory Koedel, Luke Miller, Zeyu Xu (2016). The Common Core Conundrum: To What Extent Should We Worry That Changes to Assessments and Standards Will Affect Test-Based Measures of Teacher Performance? CALDER Working Paper No. 152
We use data from six Washington State teacher education programs to investigate the relationship between teacher candidates’ student teaching experiences and their later teaching effectiveness and probability of attrition. We find that teachers who student taught in schools with lower teacher turnover are less likely to leave the state’s teaching workforce, and that teachers are more effective when the student demographics of their current school are similar to the student demographics of the school in which they did their student teaching. While descriptive, these findings suggest that the school context in which student teaching occurs has important implications for the later outcomes of teachers and their students.
Citation: Dan Goldhaber, John M. Krieg, Roddy Theobald (2016). Does the Match Matter? Exploring Whether Student Teaching Experiences Affect Teacher Effectiveness and Attrition. CALDER Working Paper No. 149
We use rich longitudinally matched administrative data on students and teachers in North Carolina to examine the patterns of differential effectiveness by teachers’ years of experience. The paper contributes to the literature by focusing on middle school teachers and by extending the analysis to student outcomes beyond test scores. Once we control statistically for the quality of individual teachers through the use of teacher fixed effects, we find large returns to experience for middle school teachers in the form of both higher test scores and improvements in student behavior, with the clearest behavioral effects emerging for reductions in student absenteeism. Moreover, these returns extend well beyond the first few years of teaching. The paper contributes to policy debates by documenting that teachers can and do continue to learn on the job.
WP 112 was revised in December 2015.
Citation: Helen Ladd, Lucy Sorensen (2015). Returns to Teacher Experience: Student Achievement and Motivation in Middle School. CALDER Working Paper No. 112
Recent evidence on teacher productivity suggests teachers meaningfully influence noncognitive student outcomes that are commonly overlooked by narrowly focusing on student test scores. These effects may show similar levels of variation across the teacher workforce and are not significantly correlated with value-added test score gains. Despite a large number of studies investigating the TFA effect on math and English achievement, little is known about nontested outcomes. Using administrative data from Miami-Dade County Public Schools, we investigate the relationship between being in a TFA classroom and nontest student outcomes. We validate our use of nontest student outcomes to assess differences in teacher productivity using the quasi-experimental teacher switching methods of Chetty, Friedman, and Rockoff (2014) and find multiple cases in which these tests reject the validity of candidate nontest outcomes. Among the cases deemed valid, we find suggestive evidence that students taught by TFA teachers in elementary and middle school were less likely to miss school due to unexcused absences and suspensions (compared to non-TFA teachers in the same school), although point estimates are very small. Other nontest outcomes were found to be valid but showed no evidence of a TFA effect.
Citation: Benjamin Backes, Michael Hansen (2015). Teach For America Impact Estimates on Nontested Student Outcomes. CALDER Working Paper No. 146
Teach For America (TFA) is an alternative certification program that intensively recruits and selects recent college graduates and midcareer professionals to teach in schools serving high-need students. Prior rigorous evaluations of the program have generally found positive effects of TFA teachers on students’ learning in math and science and no significant differences in reading or language arts, compared with non-TFA teachers’ effects in the same schools. No prior studies, however, have specifically focused on TFA effects in the Atlanta region.
This report examines the efficacy of TFA teachers in the Atlanta region spanning the 2005-06 through 2013-14 school years. Using longitudinal administrative data from three major school districts with significant numbers of recent TFA placements, we generate TFA effect estimates based on two series of Georgia’s standardized tests—the end-of-grade Criterion-Referenced Competency Tests (CRCTs) and end-of-course tests (EOCTs).
We find evidence of a positive effect in student learning due to the hiring of TFA teachers in these three districts, compared with the performance of non-TFA colleagues in the same schools. Estimated TFA effects are positive and statistically significant in social studies and science on the state’s CRCTs, and in American literature on the state’s EOCTs. We find no significant differences in performance between TFA and non-TFA teachers in the other subjects we analyzed. Supplementary analyses show these results are not sensitive to the inclusion of data from a period of well-documented test score manipulation in Atlanta Public Schools.
Citation: Michael Hansen, Tim Sass (2015). Performance Estimates of Teach For America Teachers in Atlanta Metropolitan Area School Districts. CALDER Working Paper No. 145
A sizeable body of evidence has documented the effectiveness of Teach For America (TFA) corps members at raising the mathematics test scores of their students, though little is known about the program’s impact at the school level. TFA’s recent placement strategy in the Miami-Dade County Public Schools, in which large numbers of TFA corps members are placed as clusters into a targeted set of disadvantaged schools, provides an opportunity to evaluate the impact of the TFA program on broader school performance. This study examines whether the influx of TFA corps members led to a spillover effect on other teachers’ performance. We find that many of the schools chosen to participate in the cluster strategy experienced large subsequent gains in mathematics achievement. These gains were driven in part by the composition effect of having larger numbers of effective TFA corps members. However, we do not find evidence that the clustering strategy led to a spillover effect on schoolwide performance. In other words, our estimates suggest that the extra student gains for TFA corps members under the clustering strategy would be equivalent to gains resulting from an alternate placement strategy in which corps members were evenly distributed across schools.
Revised August 31, 2015
Citation: Michael Hansen, Benjamin Backes, Victoria Brady, Zeyu Xu (2015). Examining Spillover Effects from Teach For America Corps Members in Miami-Dade County Public Schools. CALDER Working Paper No. 113
There is increased policy interest in extending test-based evaluations in K-12 education to include student achievement in high school. High school achievement is typically measured by performance on end-of-course exams (EOCs), which test course-specific standards in a variety of subjects. However, unlike standardized tests in the early grades, students take EOCs at different points in their schooling careers. The timing of the test is a choice variable presumably determined by input from administrators, students and parents. Recent research indicates that school and district policies that determine when students take particular courses can have important consequences for achievement and subsequent outcomes like advanced course taking. We develop an approach for modeling EOC test performance that disentangles the influence of school and district policies regarding the timing of course taking from other factors. After separating out the timing issue, better measures of the quality of instruction provided by districts, schools and teachers can be obtained. Our approach also offers diagnostic value because it separates out the influence of school and district course-timing policies from other factors that determine student achievement.
Citation: Eric Parsons, Cory Koedel, Michael Podgursky, Mark Ehlert, P. Brett Xiang (2015). Incorporating End-of-Course Exam Timing into Educational Performance Evaluations. CALDER Working Paper No. 137
This study uses detailed administrative data on teachers and students from the state of North Carolina to revisit the empirical evidence on master’s degrees, with attention to teachers at the middle and high school levels. It provides descriptive information on which types of teachers obtain master’s degrees, for which subjects, at which institutions, and during what phase of their career. The study estimates returns to master’s degrees using teacher fixed effects to control for time-invariant characteristics of teachers, thus separating the effects of teacher decisions to get an advanced degree from the effects of having one. Even with this careful attention to selection bias, we confirm the findings of prior studies showing that teachers with master’s degrees are no more effective than those without. The only consistently positive effect of attaining a master’s degree emerging from this study relates not to student test scores but rather to lower student absentee rates in middle school.
Citation: Helen Ladd, Lucy C. Sorensen (2015). Do Master’s Degrees Matter? Advanced Degrees, Career Paths, and the Effectiveness of Teachers. CALDER Working Paper No. 136
Evidence suggests that teacher hiring in public schools is ad hoc and often fails to result in good selection among applicants. Some districts use structured selection instruments in the hiring process, but we know little about the efficacy of such tools. In this paper, we evaluate the ability of applicant selection tools used by the Spokane Public Schools to predict three outcomes: measures of teachers’ value-added contributions to student learning, teacher absence behavior, and attrition rates. We observe all applicants to the district and are therefore able to estimate sample selection-corrected models, using random tally errors in selection instruments and differences in the quality of competition across job postings. These two factors influence the probability of being hired by Spokane Public Schools but are unrelated to measures of teacher performance. We find that the screening instruments predict teacher value added in student achievement and teacher attrition but not teacher absences. A one-standard-deviation increase in screening scores is associated with an increase of between 0.03 and 0.07 standard deviations in student achievement and a decrease in teacher attrition of 2.5 percentage points.
Citation: Dan Goldhaber, Cyrus Grout, Nick Huntington-Klein (2014). Screen Twice, Cut Once: Assessing the Predictive Validity of Teacher Selection Tools. CALDER Working Paper No. 120
Teacher and principal evaluation systems now emerging in response to federal, state, and/or local policy initiatives typically require that a component of teacher evaluation be based on multiple performance metrics, which must be combined to produce summative ratings of teacher effectiveness. Districts have utilized three common approaches to combine these multiple performance measures, all of which introduce bias and/or additional prediction error that was not present in the performance measures originally. This paper investigates whether the bias and error introduced by these approaches erodes the ability of evaluation systems to reliably identify high- and low-performing teachers. The analysis compares the expected differences in long-term teacher value-added among teachers identified as high- or low-performing under these three approaches, using simulated data based on estimated inter-correlations and reliability of measures in the Gates Foundation’s Measures of Effective Teaching project. Based on our simulation results, we conclude that, depending on how these performance measures are combined to categorize teacher performance, the additional error and bias can be large enough to undermine a district’s personnel evaluation objectives in some contexts.
Citation: Michael Hansen, Mariann Lemke, Nicholas Sorensen (2014). Combining Multiple Performance Measures: Do Common Approaches Undermine Districts’ Personnel Evaluation Systems? CALDER Working Paper No. 118
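The core problem this abstract describes can be sketched numerically: when each performance measure is a noisy signal of latent effectiveness, a composite of those measures misclassifies some teachers relative to their true long-run performance. The sketch below is our own illustration with made-up reliabilities and weights; it does not reproduce the paper's simulation or the MET project's actual parameters.

```python
# Illustration of how measurement error in combined performance measures
# affects identification of low-performing teachers (assumptions ours).
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
true_perf = rng.normal(size=n)          # latent long-run teacher effectiveness

def noisy(reliability):
    # A measure whose share of variance due to true_perf equals `reliability`.
    return (np.sqrt(reliability) * true_perf
            + np.sqrt(1 - reliability) * rng.normal(size=n))

value_added = noisy(0.5)                # reliabilities are illustrative
observation = noisy(0.4)
survey = noisy(0.3)

# Weighted-average composite (one of the common combining approaches;
# the weights here are arbitrary, not the MET project's).
composite = 0.5 * value_added + 0.3 * observation + 0.2 * survey

cutoff = np.quantile(composite, 0.2)
flagged = composite < cutoff            # "low-performing" bottom quintile
truly_low = true_perf < np.quantile(true_perf, 0.2)

# Share of flagged teachers who are actually in the true bottom quintile.
agreement = (flagged & truly_low).mean() / flagged.mean()
print(f"share of flagged teachers who are truly bottom-quintile: {agreement:.2f}")
```

Even with fairly generous reliabilities, a substantial fraction of flagged teachers are not in the true bottom quintile, which is the kind of classification error the paper quantifies under more realistic combining rules.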