April 19, 2011

Guest Post: Rules for Using Statistics Outside of Baseball

Longtime contributor to the blog, Erin "Charlie" Simpson, is back with a guest post for the ages...

Instead of going line by line through MAJ Thiel’s SWJ paper (which I characterized on the Twitters as “horrible, terrible stats work”), I’d like to offer some general guidelines for policy-relevant conflict research. As Ex will tell you, I am not an Iraq expert. But I know a little bit about COIN and another bit about quantitative research.

1) Big Claims require Big Methods. I’m not one to argue that sophisticated statistics can answer all of our research and policy questions. But if you want to wade in on one of the biggest (conflict) policy debates of the last 10 years, you best bring a lot of stats firepower. Correlations among yearly national data won’t cut it. There are people who do this for a living: Ivy League professors, Army ORSAs, DIA analysts, DARPA geeks, think-tank types. And they do it with care and sophistication. Learn from them, understand the data and model choices they make, and realize the complexity and contingency of the problem at hand. We cannot adjudicate these complicated causal claims with descriptive statistics.

2) Avoid Sigacts. Sigacts suck. I’m sorry. But they do. They are a function of our presence. More troops (outside of more bases) lead to more sigacts? <sarcasm>You don’t say!</sarcasm> Sigacts are as much a measure of our presence as they are of violence.* (There are also a ton of non-violent sigacts reported. So make sure you knock out those key leader engagements and non-battle injuries before you run your analysis.)

*And as we know, COIN isn’t just about violence (if you’re a Kalyvas person, you know violence has a non-monotonic relationship with control, such that low violence doesn’t always mean good things). So, sigacts are a bad measure of violence, and violence is an unreliable measure of stability or “progress” or whatever. But that’s a slightly different debate.

What I'm trying to say here is: Moneyball that shit and find the COIN version of on-base percentage or WHIP.

3) Correlation is not causation. We all know this. But did you also know that low correlation does not preclude findings of causation? Two variables may appear to have a low correlation – until you control for various background conditions. Sometimes this can be tested with a jury-rigged chi-square analysis (stratifying one of the variables of interest into various segments -- for example, divvying up Iraqi provinces by the number of battalions present in 2006 and seeing if there are statistically different levels of violence in 2007). But the only real way to determine which variable among many has a causal effect is with something like regression analysis – correlation won’t cut it.
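
(If you want to see what that kind of back-of-the-envelope stratification looks like in practice, here's a minimal sketch in Python. Everything in it -- the provinces, the battalion counts, the attack numbers, the cut points -- is invented for illustration, not real data or anybody's actual model.)

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical province-level data: battalions present in 2006,
# attacks recorded in 2007. All numbers are made up for illustration.
df = pd.DataFrame({
    "province":   ["Anbar", "Baghdad", "Basra", "Diyala", "Ninewa", "Najaf",
                   "Salahaddin", "Wasit", "Karbala", "Babil", "Kirkuk", "Diwaniyah"],
    "battalions_2006": [9, 12, 3, 6, 7, 1, 8, 2, 1, 4, 5, 2],
    "attacks_2007":    [410, 620, 90, 300, 280, 20, 350, 60, 15, 120, 200, 45],
})

# Stratify provinces by troop presence, then bin 2007 violence into high/low.
df["troop_stratum"] = pd.cut(df["battalions_2006"], bins=[0, 3, 6, 99],
                             labels=["low", "medium", "high"])
df["violence"] = pd.cut(df["attacks_2007"], bins=[0, 150, 10_000],
                        labels=["low", "high"])

# Chi-square test of independence on the stratum-by-violence table.
table = pd.crosstab(df["troop_stratum"], df["violence"])
chi2, p, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

All this tells you is whether violence and troop presence look independent in the toy table; it won't sort out which of several variables is doing the causal work.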

4) Model specification matters. Ok, so now you want to run some regressions? Which kind? For most conflict data, you won’t want ordinary least squares (OLS). In the parlance of our time, you’ll need to consider the underlying “data generating process.” How do the data come to be observed, and which models’ statistical assumptions best match that process? In general, conflict researchers should evaluate various time series, time series-cross sectional, and count models (i.e., Poisson) for their work.
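
(For the concrete-minded, here's a bare-bones sketch of what a count model looks like in statsmodels. The data are simulated and the covariate names are placeholders -- the point is matching the model family to the data generating process, not a recommended specification.)

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical district-month panel: attack counts plus a couple of covariates.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "troops_per_1k":  rng.uniform(0, 5, n),    # made-up troop density
    "log_population": rng.normal(11, 1, n),    # made-up log district population
})
df["attacks"] = rng.poisson(np.exp(0.2 + 0.3 * df["troops_per_1k"]))

# OLS assumes continuous, normally distributed outcomes; attack counts are
# non-negative integers, so a Poisson GLM is usually a better match
# (or a negative binomial if the counts are overdispersed).
X = sm.add_constant(df[["troops_per_1k", "log_population"]])
poisson_model = sm.GLM(df["attacks"], X, family=sm.families.Poisson()).fit()
print(poisson_model.summary())
```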

5) Level of analysis matters more. How do you plan to aggregate your data? In many instances, conflict researchers will want to look at how violence changes across time and space. Global investigations of violence (think Correlates of War or Fearon-Laitin style research) will look at the country-year. That is, annual, national-level data. This data is usually pre-collected and easy to work with. But if you’re focusing on Iraq or Afghanistan, you need subnational data. And while these wars are long, 5-10 years doesn’t generate enough data points for a useful time series. The more dynamic the conflict, the more detailed you want the data. So you need to dig down to province-month or district-week. (In Afghanistan, sigacts are relatively stable at the district-week level. If you’ve got some data or computing horsepower, you can even carve up the whole country into a 10km x 10km grid and go from there.) Unfortunately, that means your other variables need to be measured at the same level, which can be tricky. But them’s the rules.
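
(The aggregation step itself is mundane, but worth seeing once. Here's a sketch with pandas, assuming you have incident-level sigacts with a date, a district, and a category column -- all of the names and rows below are invented.)

```python
import pandas as pd

# Hypothetical incident-level sigacts: one row per report.
events = pd.DataFrame({
    "date":     pd.to_datetime(["2010-01-03", "2010-01-09", "2010-01-10", "2010-02-02"]),
    "district": ["Marjah", "Marjah", "Sangin", "Sangin"],
    "category": ["IED", "SAF", "IED", "KLE"],
})

# Knock out the non-violent reports first (see rule 2).
violent = events[~events["category"].isin(["KLE", "NBI"])]

# Roll the incidents up to the district-week level of analysis.
district_week = (
    violent
    .groupby(["district", pd.Grouper(key="date", freq="W")])
    .size()
    .rename("attacks")
    .reset_index()
)
print(district_week)
```

Every other variable in the model then has to be measured (or interpolated) at that same district-week level, which is where the real work usually is.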

6) Regression has limitations, too. If you’re doing some sort of “policy evaluation,” chances are we didn’t randomly assign the policy “treatment.” What does that mean? That means we probably spent development money in the most violent areas. Or established joint security stations in safe areas first. Or otherwise implemented a policy based on the very thing you’re trying to study. From a causal inference perspective, that’s a humdinger. One set of solutions is to “match” or pair districts based on their “propensity for treatment,” which can deal with some of the non-random assignment problems. (See Gary King’s paper on health policy evaluation in Mexico for a good example.)
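
(A toy sketch of what one-to-one propensity score matching looks like, with simulated districts where treatment follows prior violence -- exactly the non-random assignment problem above. The variable names and data are invented; for real work, use a proper matching package rather than this hand-rolled nearest-neighbor loop.)

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical district data: 'treated' marks where the program ran;
# the covariates stand in for pre-treatment conditions.
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "prior_violence": rng.normal(size=n),
    "population":     rng.normal(size=n),
})
# Treatment is NOT randomly assigned: more violent districts get the money.
df["treated"] = (df["prior_violence"] + rng.normal(size=n) > 0).astype(int)

# Step 1: model the propensity for treatment from pre-treatment covariates.
X = df[["prior_violence", "population"]]
df["pscore"] = LogisticRegression().fit(X, df["treated"]).predict_proba(X)[:, 1]

# Step 2: pair each treated district with the untreated district whose
# propensity score is closest (nearest-neighbor matching, with replacement).
treated = df[df["treated"] == 1]
control = df[df["treated"] == 0]
matches = {
    i: (control["pscore"] - row["pscore"]).abs().idxmin()
    for i, row in treated.iterrows()
}
matched = pd.concat([treated, control.loc[list(matches.values())]])
print(matched["treated"].value_counts())
```
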
There is a lot of good work that needs to be done in the realm of conflict research. Let’s figure out how to do it well.

(Those interested on the academic side may want to get involved in the Minerva-funded Empirical Studies of Conflict project run by Jake Shapiro, Eli Berman, Joe Felter and Radha Iyengar. Otherwise, talk to me about cool kids at Caerus Associates.)

From Abu Muqawama: check out Mike Few in SWJ while you're at it. Also, there is a good conversation on Twitter between @drewconway, @charlie_simpson, @abumuqawama, @chrisalbon, @jay_ulfelder and others on this post.