I will protect your pensions. Nothing about your pension is going to change when I am governor. - Chris Christie, "An Open Letter to the Teachers of NJ" October, 2009

Sunday, May 21, 2017

Random Thoughts On Using VAM for Teacher Evaluation

You may have read the piece in the New York Times today by Kevin Carey on the passing of William Sanders, the father of the idea of using value-added modeling (VAM) to evaluate teachers. Let me first offer my condolences to his family.

I'm going to skip a point-by-point critique of Carey's piece and, instead, offer a few random thoughts about the many problems with using VAMs in the classroom:

1) VAM models are highly complex and well beyond the understanding of almost all stakeholders, including teachers. Here's a typical VAM model:


Anyone who states with absolute certainty that VAM is a valid and reliable method of teacher evaluation, yet cannot tell you exactly what is happening in this model, is full of it.
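As a rough illustration of the genre (an illustrative sketch only, not the specific model pictured above), even a stripped-down covariate-adjustment VAM looks something like this:

```latex
% A generic covariate-adjustment VAM -- an illustrative sketch only,
% much simpler than the layered models states actually use.
y_{ijt} = \beta_0 + \beta_1 y_{i,t-1} + \mathbf{X}_{it}\boldsymbol{\gamma}
        + \mathbf{Z}_{jt}\boldsymbol{\delta} + \theta_j + \varepsilon_{ijt}
% y_{ijt}: score of student i, taught by teacher j, in year t
% y_{i,t-1}: the student's prior-year score
% X_{it}: student covariates (e.g., SES, LEP, special education status)
% Z_{jt}: classroom/school covariates
% theta_j: the "teacher effect" the model tries to estimate
% epsilon_{ijt}: everything the model cannot explain
```

Even this toy version requires estimating a teacher-level effect conditional on prior scores and covariates; the models actually used typically layer on multiple prior years, random effects, and shrinkage on top of that.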

There was a bit of a debate last year about whether it matters that student growth percentiles (SGPs) -- which are not the same as VAMs, but are close cousins -- are mathematically and conceptually complex. SGP proponents make the argument that understanding teacher evaluation models is like understanding pi: while the calculation may be complex, the underlying concept is simple. It is, therefore, fine to use SGPs/VAMs to evaluate teachers, even if teachers don't understand how their scores were calculated.

This argument strikes me as far too facile. Pi is a constant: it represents something (the circumference of a circle divided by its diameter) that is concrete and easy to understand. It isn’t expressed as a conditional distribution; it just is. It isn’t subject to variation depending on the method used to calculate it; it is always the same. An SGP or a VAM is, in contrast, an estimate, subject to error and varying degrees of bias depending on how it is calculated.

The plain fact is that most teachers, principals, school leaders, parents, and policy makers do not have the technical expertise to properly evaluate a probabilistic model like a VAM. And it is unethical, in my opinion, to impose a system of evaluation without properly training stakeholders in its construction and use.

2) VAM models are based on error-prone test scores, which introduces problems of reliability and validity. Standardized tests are subject to what the measurement community often calls "construct-irrelevant variance" -- which is just a fancy way of saying test scores vary for reasons other than knowing stuff. Plus there's the random error found in all test results, due to all kinds of things like testing conditions. 

All of this variance and noise causes problems when fed into a VAM. We know, for example, that the non-random sorting of students into teachers' classrooms can create bias in the model. There is also a very complex issue, known as attenuation bias, that arises when trying to deal with the error in test scores. There are ways to ameliorate it -- but there are tradeoffs.
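To give a flavor of the attenuation problem, here is a minimal toy simulation -- the coefficient values and error sizes are made up, and this is not any state's actual model. It shows how measurement error in the prior-year score shrinks the estimated coefficient on prior achievement:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# True prior achievement and current achievement (true growth coefficient = 0.8)
true_prior = rng.normal(500, 50, n)
current = 0.8 * true_prior + rng.normal(0, 20, n)

# Observed prior score = true score + measurement error (test unreliability)
observed_prior = true_prior + rng.normal(0, 30, n)

# OLS slope of current score on prior score, with and without measurement error
slope_true = np.polyfit(true_prior, current, 1)[0]
slope_noisy = np.polyfit(observed_prior, current, 1)[0]

print(f"Slope using true prior scores:  {slope_true:.2f}")   # ~0.80
print(f"Slope using noisy prior scores: {slope_noisy:.2f}")  # noticeably smaller (attenuated)
```

In a real VAM, that attenuated slope means some of the influence of prior achievement can get misattributed to the "teacher effect" -- one of the trade-offs researchers try to correct for, with varying success.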

My point here is simply that these are very complicated issues and, again, well beyond the apprehension of most stakeholders. Which dictates caution in the use of VAM -- a caution that has been sorely lacking in actual policy.

3) VAM models are only as good as the data they use -- and the data's not so great. VAM models have to assign students to teachers. As an actual practitioner, I can tell you that's not as easy as it sounds. Who should be assigned a VAM score for language arts when a child is Limited English Proficient (LEP): the ELL teacher, or the classroom teacher? What about special education students who spend part of the school day "pulled out" of the homeroom? Teachers who team teach? Teachers who co-teach?

All this assumes we have data systems good enough to track kids, especially as they move from school to school and district to district. And if the models include covariates for student characteristics, we need to have good measures of students' socio-economic status, LEP status, or special education classification. Most of these measures, however, are quite crude.*

If we're going to make high-stakes decisions based on VAMs, we'd better be sure we have good data to do so. There's plenty of reason to believe the data we have isn't up to the job.

4) VAM models are relative; all students may be learning, but some teachers must be classified as "bad." Carey notes that VAMs produce "normal distributions" -- essentially, bell curves, where someone must be at the top, and someone must be at the bottom.


I've labeled this with student test scores, but you'd get the same thing with teacher VAM scores. Carey's piece might be read to imply that it was a revelation to Sanders that the scores came out this way. But VAMs yield normal distributions by design -- which means someone must be "bad."
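To see why the ranking is relative by construction, here is a minimal sketch with made-up numbers: even when every teacher's students post positive growth, converting that growth into percentile ranks still puts somebody in the bottom category.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "growth" scores for 1,000 teachers -- every single one is positive
growth = rng.normal(loc=15, scale=4, size=1000).clip(min=0.1)

# Convert to percentile ranks, as a normative system does
percentiles = growth.argsort().argsort() / (len(growth) - 1) * 100

bottom_decile = growth[percentiles < 10]
print(f"All teachers show positive growth: {bool((growth > 0).all())}")  # True
print(f"Teachers labeled 'bottom 10%' anyway: {len(bottom_decile)}")     # ~100
```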

General Electric's former CEO Jack Welch famously championed ranking his employees -- which is basically what a VAM does -- and then firing the bottom x percent. GE eventually moved away from the idea. I'm hardly a student of American business practices, but it always struck me that Welch's idea was hampered by a logical flaw: someone has to be ranked last, but that doesn't always mean he's "bad" at his job, or that his company is less efficient than it would be if he were fired.

I am certainly the last person to say our schools can't improve, nor would I ever say that we have the best possible teaching corps we could have. And I certainly believe there are teachers who should be counseled to improve; if they don't, they should be made to leave the profession. There are undoubtedly teachers who should be fired immediately.

But the use of VAM may be driving good candidates away from the profession, even as it is very likely misidentifying "bad" teachers. Again, the use of VAM to evaluate systemic changes in schooling is, in my view, valid. But the argument for using VAM to make high-stakes individual decisions is quite weak. Which leads me to...

5) VAM models may be helpful for evaluating policy in the aggregate, but they are extremely problematic when used in policies that force high-stakes decisions. When the use of test-based teacher evaluation first came to New Jersey, Bruce Baker pointed out that its finer scale, compared to teacher observation scores, would lead to making SGPs/VAMs some of the evaluation but all of the decision.

But then NJDOE leadership -- because, to be frank, they had no idea what they were doing -- imbued teacher observation scores with phony precision. That led to high-stakes decisions compelled by the state based on arbitrary cut points and arbitrary weighting of the test-based component. The whole system is now an invalidated dumpster fire.

I am extremely reluctant to endorse any use of VAMs in teacher evaluation, because I think the corrupting pressures will be bad for students; in particular (and as a music teacher), I worry about narrowing the curriculum even further, although there are many other reasons for concern. Nonetheless, I am willing to concede there is a good-faith argument to be made for training school leaders in how to use VAMs to inform, rather than compel, their personnel decisions.

But that's not what's going on in the real world. These measures are being used to force high-stakes decisions, even though the scores are very noisy and prone to bias. I think that's ultimately very bad for the profession, which means it will be very bad for students.

Carey mentions the American Statistical Association's statement on using VAMs for educational assessment. Here, for me, is the money quote:
Research on VAMs has been fairly consistent that aspects of educational effectiveness that are measurable and within teacher control represent a small part of the total variation in student test scores or growth; most estimates in the literature attribute between 1% and 14% of the total variability to teachers. This is not saying that teachers have little effect on students, but that variation among teachers accounts for a small part of the variation in scores. The majority of the variation in test scores is attributable to factors outside of the teacher’s control such as student and family background, poverty, curriculum, and unmeasured influences. 
The VAM scores themselves have large standard errors, even when calculated using several years of data. These large standard errors make rankings unstable, even under the best scenarios for modeling. Combining VAMs across multiple years decreases the standard error of VAM scores. Multiple years of data, however, do not help problems caused when a model systematically undervalues teachers who work in specific contexts or with specific types of students, since that systematic undervaluation would be present in every year of data. 
A VAM score may provide teachers and administrators with information on their students’ performance and identify areas where improvement is needed, but it does not provide information on how to improve the teaching. The models, however, may be used to evaluate effects of policies or teacher training programs by comparing the average VAM scores of teachers from different programs. In these uses, the VAM scores partially adjust for the differing backgrounds of the students, and averaging the results over different teachers improves the stability of the estimates [emphasis mine]
Wise words. 

NJ's teacher evaluation system, aka "Operation Hindenburg."



* In districts where there is universal free-lunch enrollment, parents have no incentive to fill out paperwork designating their children as FL-eligible. So even that crude measure of student economic disadvantage is useless.

Saturday, May 20, 2017

U-Ark Screws Up A Charter School Revenue Study, AGAIN: Part II

Here's Part I of this series.


If this is true, it's really disturbing:
Colorado’s General Assembly on Wednesday passed a bill giving charter schools the same access to a local tax funding stream as district schools have, The Denver Post reported.
The bipartisan compromise measure, which supporters say is the first of its kind in the nation, would address an estimated $34 million inequity in local tax increases. It came a day after the University of Arkansas released a study that found charter schools receive $5,721 less per pupil on average than their district counterparts — a 29 percent funding gap. [emphasis mine]
It is, of course, standard operating procedure for outfits like the U-Ark Department of Education Reform to claim their work led to particular changes in policy; that's how they justify themselves to their reformy funders.  Maybe the connection between the report and the Colorado legislation (which is really awful -- more in a bit) is overblown...

But if the U-Ark report did sway the debate, that's a big problem. Because the report is just flat out wrong. 

As I explained in Part I, the claim that Camden, NJ, has a huge revenue gap between charters and public district schools seems to be based on an utterly phony comparison: all of the revenue, both charter and district, is linked to only the CCPS students -- not the charter students. Because the data source documentation in the report is so bad, I can't exactly replicate U-Ark's figures, so I invited Patrick Wolf and his colleagues to contact me and explain exactly how they got the figures they did.

So far, they remain silent.

But that isn't surprising. When U-Ark put out its first report in 2014, Bruce Baker tore it to shreds in a brief published by the National Education Policy Center. The latest U-Ark report cites Baker's brief, so they must have read it -- but they never bothered to answer Baker's main claim, which is that their comparisons are wholly invalid.

Further, what I documented in the last post is only one of the huge, glaring flaws in the report. Let me point out another, using Camden, NJ again as an example. We'll start by looking at U-Ark's justification for using the methods they do:
This is a study of the revenues actually received by public charter schools and TPS. Revenues equal funding. Revenues signal the amount of resources that are being mobilized in support of students in the two different types of public schools. Some critics of these types of analyses argue that our revenue study should, instead, focus on school expenditures and excuse TPS from certain expenditure categories, such as transportation, because TPS are mandated to provide it but many charter schools choose not to spend scarce educational resources on that item. [emphasis mine]
"Choose" not to spend the revenues? Sorry to be blunt, but that statement is either deliberately deceptive or completely clueless.* In New Jersey, hosting public school districts are required to provide transportation for charter school students. The charters don't "choose" not to spend on transporting the kids; they avoid the expense because the district picks up the cost.

Baker pointed this out explicitly in his 2014 brief -- but U-Ark, once again, refuses to acknowledge the problem, even though we know for a fact they read Baker's report, because they cite it repeatedly.

And it gets worse.


For the sake of illustration, here's a simplified conceptual map of what Camden's public district school bus system might look like. We've got neighborhood schools divided into zones, and buses transporting children to their neighborhood school.** There are exceptions, of course, primarily for magnet and special needs students, but the system on the whole is fairly simple.

Now let's add some "choice":



There has been a marked decline in "active transportation" -- walking or biking -- to school over the past few decades, and school "choice" is almost certainly a major contributor. As we de-couple schools from neighborhoods (which may well have many other pernicious effects), transportation networks become more complex and more expensive.


As I said: New Jersey law requires public school districts like Camden to pay for transportation of charter school students. Which means all of these extra costs are borne solely by the district.


And how much does this cost Camden's charter sector? Nothing -- the district picks up the tab.

So any comparison of revenues that doesn't exclude transportation -- and, again, it appears that U-Ark didn't exclude it, although their documentation is so bad we can't be sure -- is without merit. Claiming that charter schools have a revenue gap when they use services paid for by public district schools makes no sense.

Folks, this issue is so simple that it doesn't require an advanced understanding of school finance or New Jersey law to understand it. Which makes it all the more incredible that the U-Ark team didn't account for it in their findings. And again: if the Colorado Assembly made their decision to raise the funding for charters -- at the expense of public district schools -- on the basis of a report that is this flawed...

Let's take a look at some better -- not perfect, but better -- financial comparisons between Camden's charters and CCPS next.



 * Granted, it might be both...

** It's worth noting that in a dense city like Camden, many of the students will be within walking distance of their neighborhood school. But when you introduce "choice," you make the school system much less walkable, because students are likely traveling greater distances. I was at a conference at Rutgers yesterday where researchers were looking into this issue -- more to come...

Tuesday, May 16, 2017

U-Ark Screws Up A Charter School Revenue Study, AGAIN: Part I

As someone who spends a good bit of his time debunking many of the claims of the education reformsters, one continuing frustration is how many of them don't seem to learn their lessons. Certainly, we can have good faith debates about education policy, and reasonable people can disagree on many things...

But when you've been called out in public for making a big mistake, and you don't at least attempt to correct yourself... well, it's hard to take you seriously -- even if other, less discriminating minds do.

We parents all have heard the claim that something wasn’t fair. “Suzie got a bigger piece of cake than I did!” “Tommy got to go fishing while I had to clean the garage!” “Malachi had a lot more money spent on his education because you sent him to a traditional public school and me to a public charter school!” Well, maybe we haven’t actually heard that last one very often but it would be a more legitimate gripe than the other ones. 
Students in public charter schools receive $5,721 or 29% less in average per-pupil revenue than students in traditional public schools (TPS) in 14 major metropolitan areas across the U. S in Fiscal Year 2014. That is the main conclusion of a study that my research team released yesterday.
This is from the crew at the University of Arkansas's "Department of Education Reform" -- yes, there is such a thing, I swear -- led by the author here, Patrick Wolf. The study Wolf's team produced purports to show that charters are getting screwed out of the revenues they deserve, which are instead flowing to public district schools.

(Side note: if charters "do more with less," why do they need the same money as public district schools? Isn't that part of their "awesomeness"?)

But here's the thing: the methods this study uses are similar to those in a study they produced back in 2014 -- a study that was thoroughly debunked a month later. In a brief published by the National Education Policy Center, Dr. Bruce Baker* notes that even if we put aside many problems the U-Ark study has with documenting its data sources and explaining its methodologies, one enormous flaw renders the entire report useless:

As mentioned earlier, the major issue that critically undercuts all findings and conclusions of the study, and any subsequent “return on investment” comparisons, is the report’s misunderstanding of intergovernmental fiscal relationships. Again, as the authors note, they studied “all revenues” (not expenditures), because studying expenditures, while “fascinating” would be “extremely difficult” (Technical appendix, p. 385). 
Any “revenue per pupil” figure includes two parts that may significantly affect the figure. What goes into the total revenue measure? And how are pupils counted? If one’s goal is to compare “revenues per pupil” of one entity to another, one must be able to appropriately align the correct revenue measure with the correct pupil measure for each entity. That is, for the district, one must identify the revenues intended to provide services to the district’s pupils and revenues intended to provide services to the charter school’s pupils. If numbers are missed or -- worse yet -- wrongly attributed, the comparison becomes invalid and misleading. [emphasis mine]
Baker cites several examples of how U-Ark gets this basic idea wrong time and again -- including, in his first example, U-Ark's analysis of Newark, NJ:
One can get closer to the $28,000 figure by dividing total revenue for that year by the district enrollment, excluding sent pupils (charter school, out of district special education, etc.). But this would be particularly wrong and the result substantially inflated because the numerator would include all revenues for both district and sent charter students, but the denominator would include only district students
Again, Baker pointed this out in 2014. But guess what? By all appearances, U-Ark made the same mistake once again in 2017. Let me see if I can explain this with a few pictures.


Unlike the U-Ark report, I'm going to tell you exactly where I'm getting my data for all these slides. The school year is 2013-14, just like the U-Ark report. The fiscal data comes from the User-Friendly Budget Guide** published by the New Jersey Department of Education; U-Ark says its data comes from the NJDOE, so the figures should be the same. I get my charter enrollment numbers from the NJDOE's enrollment data files for 2013-14.

There were, according to these sources, 17,273 students in Camden's total enrollment for 2013-14. These include contracted pre-school and out-of-district placements, which we will set aside for now (even though that is a deeply flawed thing to do -- more later). If we take the total full-time enrollment -- 15,546 -- and subtract 4,251 charter students, we get 11,295 Camden City Public School students.



Total revenues for the district in that year were $369,770,349. This included $54,902,533 in transfers to the Camden charter schools. Understand that this was not necessarily the only source of revenue for the charters, who might also collect funds directly from the federal government or from private sources. It's also worth pointing out here that all 4,251 Camden charter students may not come from Camden (although it's safe to assume the vast majority are city residents). But, as we'll see, that doesn't matter anyway.



Using these figures, U-Ark steps in to make its per pupil calculations. In the numerator is the revenue  collected by the district or the charter schools; in the denominator are the students enrolled in each sector.

See the problem?


If we use all of the $370 million in the district's per pupil figure, but we only count the students in CCPS and not the charters, we wind up double-counting about $55 million. Because that money is in both the district per pupil figure and in the charter figure.

Even U-Ark admits they should not do this:


That $370 million figure -- a figure, by the way, that is deeply flawed (more in Part II) -- should not be the figure that U-Ark uses to calculate CCPS's per pupil figure. I'm not saying this: U-Ark is.

So did they?

My calculation using these figures comes out to $32,738 -- which is very close to U-Ark's figure of $32,569.


Like I said, there are several reasons the figures don't match exactly: precise charter enrollment figures, including various students in out-of-district placements, minor adjustments to the revenue, etc.

But it's clear that Wolf and his U-Ark team used the wrong revenue figure when making their calculation of Camden's per pupil spending; worse, they made the same mistake they made in 2014, even after they had been publicly corrected!



(Side note: we know they read Bruce Baker's review of their earlier report, because they cite it multiple times.)

Now, as I'll explain in the next post, fixing this problem still makes for a deeply flawed analysis. But let's suppose, just for illustration purposes, they had corrected it. What would the figure be?


Here, we subtract the charter school transfer (find it on page 5 of the User-Friendly Summary). Which, according to U-Ark themselves, is the correct way to approach the calculation. What's the result?

Again, this is a deeply flawed comparison. But it's a much smaller gap than using U-Ark's methods.
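For anyone who wants to check the arithmetic, here is a minimal sketch using only the figures cited above. The "corrected" numbers are my own calculations from those figures; since U-Ark's documentation is so thin, I obviously can't know exactly how they would run it.

```python
# Figures from the 2013-14 NJDOE sources cited above
total_district_revenue = 369_770_349   # includes the charter transfer
charter_transfer       =  54_902_533   # passed through to Camden charters
ccps_students          =       11_295  # 15,546 full-time minus 4,251 charter students
charter_students       =        4_251

# U-Ark's apparent approach: all revenue divided by district students only
flawed_per_pupil = total_district_revenue / ccps_students
print(f"Flawed district per pupil:         ${flawed_per_pupil:,.0f}")     # ~$32,738

# Corrected: subtract the pass-through before dividing
corrected_per_pupil = (total_district_revenue - charter_transfer) / ccps_students
print(f"Corrected district per pupil:      ${corrected_per_pupil:,.0f}")  # ~$27,876

# Charter per pupil from the transfer alone (ignores any federal or private revenue
# the charters may also collect)
charter_per_pupil = charter_transfer / charter_students
print(f"Charter per pupil (transfer only): ${charter_per_pupil:,.0f}")    # ~$12,915
```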

Let me end this part by addressing Professor Wolf and his team directly:

Gentlemen, I have shown in this post exactly where my data came from. Maybe you have different, equally credible sources. None of us would know, however, because your sole citation for state data is: "New Jersey Department of Education, School Finance." (p. 33) If you'd care to share your sources, your data, and how you arrived at your calculations (in appropriate detail to allow for replication, a common standard in our field), then please do; I'll happily publish them here. You can reach me at the email address on the left side of the blog.

But as it stands right now, there is more than enough evidence, in my opinion, to entirely dismiss your report and its conclusions.

Part II in a bit...



ADDING: Previous atrocities have been documented. 



* As always: Bruce is my advisor in the PhD program at Rutgers GSE.

** I use the 2015-16 guide because it gives the latest "actual" figures for 2013-14 available from NJDOE.

Thursday, May 11, 2017

Attrition in Denver Charter Schools

Earlier this month David Leonhardt of the New York Times wrote yet another column extolling the virtues of charter schools. I feel like a broken record when I say, once again, that education policy dilettantes like Leonhardt don't seem to understand that it requires more than a few studies showing a few charters in a few cities in a few select networks get marginally better outcomes on test scores to justify large-scale charter expansion.

There are serious cautions when it comes to the proliferation of "successful" charters, starting with the fiscal impact on hosting districts as charters expand. We should also be concerned about the abrogation of student and family rights, the lack of transparency in charter school governance, the narrowing of the curriculum in test-focused charters, the racially disparate disciplinary practices in "no excuses" charters, and the incentives in the current system that encourage bad behaviors.

But let's set all that aside and look at the evidence Leonhardt presents to justify his push for more charters:
Unlike most voucher programs, many charter-school systems are subject to rigorous evaluation and oversight. Local officials decide which charters can open and expand. Officials don’t get every decision right, but they are able to evaluate schools based on student progress and surveys of teachers and families. 
As a result, many charters have flourished, especially in places where traditional schools have struggled. This evidence comes from top academic researchers, studying a variety of places, including Washington, Boston, Denver, New Orleans, New York, Florida and Texas. The anecdotes about failed charters are real, but they’re not the norm.
You'll notice that Leonhardt picks cities and states that uphold his argument while excluding others like Detroit, Philadelphia, and Ohio. In addition: I spent a lot of time last year explaining why the vaunted Boston charter sector isn't all it's cracked up to be. I've also documented the mess that is Florida's charter sector. I'll try to get to some of Leonhardt's other examples, but for now: let's talk about Denver.

I'll admit it's one region where I haven't spent much time looking at the charter sector. Leonhardt links to a study that shows some significant gains for charters... although I have some serious qualms about the methodology used in the report. I'm working on something more formal which addresses the issue, but for now (and pardon the nerd talk): I am increasingly skeptical of charter effect research that uses instrumental variables estimators to pump up effect sizes. So far as I've seen, the validity arguments for their use are quite weak -- more to come.

For now, however, let's concede the Denver charter sector does, in fact, get some decent test score gains compared to the Denver Public Schools. The question, as always, is how they do it. Do they lengthen their school day and school year? If so, that's great, but we could do that in the public schools as well. Do they provide smaller class sizes and tutoring? Again, great, but why do we need schools that are not state actors to implement programs like that?

What we want to find are reasons that we can attribute only to the governance structure of charters -- not to resource differences, not to student population differences, but to the inherent characteristics of charters themselves.

And one thing I've found, time and again, is that one of the characteristics of "successful" charters is that they engage in patterns of significant student cohort attrition.


Let me explain what's going on here: this is data for the DSST network, one of the more lauded groups of charter schools in Denver. We're looking at the size of each cohort as it has come through the entire charter chain; in other words, how big the Class of 2014 was when they were freshmen, then sophomores, then juniors, and then seniors. I've done the same with each class back to 2008.

See the pattern? As DSST student classes pass through the charter schools year-to-year, the number of students enrolled shrinks considerably. The Class of 2014, for example, was only 62 percent as large in its senior year as it was in its freshman year. Over the eight years on the graph, senior classes range from 61 to 73 percent of their freshman-year size.
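For readers who want to reproduce this kind of check with their own state's enrollment files, the calculation is just a ratio of cohort sizes. Here is a minimal sketch; the enrollment counts are hypothetical placeholders, not DSST's actual numbers.

```python
# Hypothetical grade-by-grade enrollment for one graduating class (NOT actual DSST data)
cohort = {"Grade 9": 160, "Grade 10": 138, "Grade 11": 117, "Grade 12": 102}

freshman_size = cohort["Grade 9"]
for grade, n in cohort.items():
    print(f"{grade}: {n:>4} students ({n / freshman_size:.0%} of the freshman class)")

# With these made-up numbers the senior class is about 64% of its freshman size --
# in the same neighborhood as the 61-73% range described for DSST above.
```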

Where do the kids who leave go? Many likely go back to the Denver Public Schools. Some of those likely drop out, which counts against DPS's graduation rate -- but not the charter schools'. In any case, they aren't being replaced, which I find odd considering how supposedly "popular" charters are.

Some make the case that the larger freshman classes are due to retention: the schools keep the kids for an extra year to "catch them up." Which I suppose is possible... but it raises a host of questions. Do public students have the same opportunities to repeat a grade? Are the taxpayers aware they are paying for this? Why is there still significant attrition between Grade 10 and Grade 11?

Let's look at some other Denver charters and their cohort attrition patterns. Here's KIPP, the esteemed national charter network:


They haven't been running high schools as long as DSST, but the patterns are similar. KIPP's history is as a middle school provider; here are their attrition patterns in the earlier grades:



KIPP's Grade 8 cohorts shrink to between 73 and 84 percent of their Grade 5 size. Again: if they're so popular and have such long wait lists -- and if the DPS schools are so bad -- why aren't they backfilling their enrollments? Note too that much of the attrition comes after Grade 6. Most Denver elementary schools enroll Grades K to 5. It doesn't appear as if many students come into KIPP looking to move on after only one year; most of the attrition is in the later grades. Why would kids be leaving in the middle of their middle school experience?

Another middle school provider moving into high school is STRIVE:


Grade 8 is between 56 and 80 percent of the size of Grade 6. Let's look at one more: Wyatt Academy.


The last class we have data for shrank to 69 percent of its Grade 1 size by the time it got to Grade 8.

Let's be clear: cohort shrinkage occurs in DPS as well.


The last year for which we have data was an outlier: the Class of 2018 was 75 percent as big in Grade 8 as it was in Grade 5. For previous years, that figure ranges from 81 to 90 percent. The comparisons to the charters are admittedly tricky: the transition from Grade 5 to 6, for example, is sure to see students moving out of the area or into the private schools, both from DPS and the charters. 

But it's still striking to me that "popular" charters, which are allegedly turning away lottery losers, seem to lose more students proportionally than the "failing" DPS schools.



DPS has a large number of students leave their Grade 9 cohort before Grade 12. Many are dropouts, and that's a serious problem. But why does DPS get slammed for this while the charter high schools are declared "successful" even as they are losing at least as large a proportion of their students as the public high schools?

Again, this is tricky stuff. I'm certainly not going to declare that Denver's charter sector is getting all of its gains from pushing out the lower performers; we don't have nearly enough evidence to make that claim. But neither can we declare definitively, as Leonhardt does, that charter "...success doesn’t stem from skimming off the best." When you lose this many students, particularly in high school, you have to back up and take a more critical view of why some charters get the gains that they do.

One more thing: look at the y-axes on my graphs. The scale of Denver charter school enrollments is nothing like the scale found in DPS. Only recently has STRIVE come around to about 10 percent of DPS's enrollment per class. How can we be sure the gains they make, if any, can be sustained as the sector gets larger?

When charters shed this many kids, there has to be a system that catches them and enrolls them in school. A system that takes them at any time of year, no matter their background. A system that doesn't get to pick and choose which grades it will enroll and when. That system is the public schools; arguably, charters couldn't do what they do without it.

Before we declare charters an unqualified success, we ought to think carefully about whether factors like attrition play a part in helping them realize their test score gains, and what that means for the public school system.

I'll try to get to Denver more this summer. But let's get back to New Jersey next...