September 16, 2009

Data and Natural Security

As countries battle a wave of food shortages, trying to ensure supplies and quell domestic fears, the consequences of food insecurity are growing increasingly vivid. Pakistan is described as being at extreme risk in a recent Food Security Risk Index published by a British firm; indeed, 19 Pakistanis were killed this week in a scramble for food distributed for Ramadan in a poor Karachi neighborhood. How should we view data or interpret these kinds of rankings for natural security subjects?

While data plays an important role in understanding the world of natural security, it is, at times, not as readily available as it is for other security issues, nor is it always clear which underlying assumptions actually hold. I’m going to occasionally examine regressions, data sets and graphs from a variety of sources to weigh some of the benefits, and drawbacks, of different types of quantitative analysis in natural security.

Famine and persistent hunger can contribute to instability, which is of high concern in countries where the U.S. government is operating or has strategic interests, like Pakistan, where food shortages are worsening with violent results. So for my first post, I decided to look at a data set from the Food and Agriculture Organization of the United Nations (FAO) on the prevalence of undernourishment.

Undernourishment

Source: FAO Statistics Division, 2008

The FAO chose to use panel data for this set, meaning observations of the same units (here regions and countries) across several time periods. That selection often comes down to what data is available and, if you’re lucky, the study objective. An old professor used to say that if all economists were given three wishes, one would always be a good panel data set for their current project. Panel data is prized because it lets you see not only how multiple facets of a problem change over time, but also how each one contributes to the whole. In this sense, the FAO has done well. We can compare the world totals to regional variations and, although I did not include the full chart here, country-level changes.

The FAO has avoided the dangers of looking only at a single aggregate time series, which often provides clear trends but hides important variations. For example, a researcher who looked only at how world undernourishment levels have changed over time would miss the important fact that some regions are suffering from dramatic increases in food insecurity. A world total on its own masks regional variations and doesn’t adequately explain the dynamic global picture. Although the FAO provides world, regional and country-level data, most papers will cite only one of these statistics, and the choice matters. For example, while undernourishment in the developing countries of Africa fell from 28% to 26%, in the Economic Community of the Great Lakes Countries (ECGLC), which consists of Burundi, the Democratic Republic of the Congo, and Rwanda, undernourishment rose from 33% to 70%. Which group, region, or development level of countries one looks at has a dramatic impact on findings.

The FAO data falls prey to another fault: it does not report error terms. Given the high uncertainty inherent in population statistics, knowing the sampling error and whether changes over time are statistically significant would add credibility to the FAO’s data. Without that information, only the large changes (33% to 70%, for example), which reflect a broad change in political climate, are believable; the small, across-the-board changes are not. This is a real problem in statistics because outliers should not be the data quoted. Yet, due to the inherent uncertainty of the discipline, they are often the only numbers from which we can draw reliable conclusions. This is a classic double bind: either cite the large swings, or accept not knowing whether the broader conclusions you are drawing are actually true.
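To make the point about error terms concrete, here is a minimal sketch in Python of a standard two-proportion z-test applied to the two changes quoted above. The sample size of 1,000 per period is purely hypothetical; the FAO does not report one, and its estimates are not produced by a simple survey, so this only illustrates why a small change can be indistinguishable from noise while a large one is not.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two independent sample proportions."""
    # Pooled proportion under the null hypothesis of no change.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Standard error of the difference under that null hypothesis.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

N = 1000  # HYPOTHETICAL sample size per period; not reported in the FAO data

z_small = two_proportion_z(0.28, N, 0.26, N)  # 28% -> 26%
z_large = two_proportion_z(0.33, N, 0.70, N)  # 33% -> 70%

# |z| above roughly 1.96 corresponds to significance at the 5% level.
print(f"28% -> 26%: z = {z_small:.2f}")
print(f"33% -> 70%: z = {z_large:.2f}")
```

Under this assumed sample size, the two-point drop falls short of the usual 5% threshold while the 37-point rise clears it easily. With a much larger sample the small change could become significant too, which is exactly why the missing error information matters.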

The other issue to be aware of with the FAO data is the interval period. Several observations vary dramatically across all three time periods. These fluctuations point to a variable data pattern rather than a trend. Generalizing observations over the whole time period may present a false correlation, and trend lines drawn between the two end points probably don’t fit the data well. When the viewing window is restricted and the number of observations is minimal, random changes can easily be read as trends.
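A small sketch in Python shows how an end-point trend line can mislead when there are only three observations. The series below is invented for illustration and is not taken from the FAO table; `endpoint_line` is a hypothetical helper name.

```python
def endpoint_line(values):
    """Return the straight line through the first and last observations."""
    steps = len(values) - 1
    slope = (values[-1] - values[0]) / steps
    return lambda t: values[0] + slope * t

# HYPOTHETICAL undernourishment percentages for three periods (not FAO data).
series = [30.0, 45.0, 32.0]

line = endpoint_line(series)
net_change = series[-1] - series[0]   # what an end-point comparison reports
mid_residual = series[1] - line(1)    # how far the middle period sits off that line

print(f"Net change, first to last period: {net_change:+.1f} points")
print(f"Middle period vs. end-point line: {mid_residual:+.1f} points")
```

Here the end points suggest a gentle two-point rise, yet the middle observation sits fourteen points above the line. The "trend" describes none of the actual movement in the series.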

Part of our inspiration to do this stems from the U.S. government’s recent launch of www.data.gov, which we searched for data that might be useful for natural security issues. We found that data is not readily available for all problems. Almost all of the data available is restricted to public-domain, commissioned studies, and it is mainly domestic. U.S. numbers for water quality, storm trends, active mines and mineral plants, and the oil and gas assessment databases are all available. International and military data, however, are conspicuously absent, and the State Department and the Department of Defense have not yet uploaded many studies. The United Nations and the World Bank, on the other hand, look at cross-country studies and rarely reach the depth of single-country, multi-factor studies. The U.N. has many datasets on energy, greenhouse gases, human development and food, which provide evidence on a broader picture but can be of limited utility from a U.S. national security perspective. As more data sites come online, from U.S. government archives to sites by the United Nations and the World Bank, we’ll be thinking about how visual and numerical representations offer a different pathway to understanding our work. And of course, if you know of any good data that we should be considering, feel free to email us or post comments to the blog!