Bayesian Model-Based Geostatistics

Sources of uncertainty in mapping

Mapping a quantity (for example, the prevalence of G6PD deficiency) means determining its value at all locations in a region of interest. Georeferenced measurements of the quantity at certain locations, combined with scientific understanding of the biological processes involved, can reduce the uncertainty in maps.

The scientific understanding used in MAP's current malaria mapping is straightforward: environmental factors are known to affect malaria prevalence, malaria prevalence is known to vary in a relatively predictable fashion in space and time, and observed malaria prevalence depends heavily on people’s age.

In spatial epidemiology, it is common to find that very few measurements have been taken in large parts of the geographical region of interest. The available measurements may be error-prone or incompletely reported, and it is almost always only a subset of the local population that has been surveyed.

Bayesian model-based geostatistics as a framework for managing map uncertainty

When we want to infer the prevalence of malaria in 2- to 10-year-olds in 2010 (for example) from a series of georeferenced survey results, it is not possible to determine the prevalence map exactly because many locations have not been surveyed. In addition, surveys have been conducted at different times and in different age groups.

The goal of Bayesian model-based geostatistics is not to determine unknown maps exactly, but to determine probability distributions for them. A probability distribution for a map can be understood loosely as a very large set of candidate maps and a probability of correctness for each.

The probability that any of the very many individual (candidate) maps is the single correct map is small, so it is not useful to report the single most likely map. Taken together, however, the candidate maps make it possible to compute the probability of correctness of statements about the map, such as "Plasmodium falciparum endemicity at location x in 2010 was between 0.1 and 0.3".
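
As a rough illustration of how such a probability might be computed, the sketch below treats the candidate maps as equally weighted draws and counts the fraction in which the statement holds. The function names, the number of maps and the beta-distributed toy values are assumptions made for this example only; they are not MAP's model or data.

```python
import random


def simulate_candidate_maps(n_maps, n_locations, seed=0):
    """Hypothetical stand-in for a set of candidate maps.

    Each candidate map is a list of endemicity values in [0, 1], one per
    location. Real candidate maps would come from a fitted geostatistical
    model, not from independent beta draws as here.
    """
    rng = random.Random(seed)
    return [
        [rng.betavariate(2, 6) for _ in range(n_locations)]
        for _ in range(n_maps)
    ]


def statement_probability(candidate_maps, location, lower, upper):
    """Fraction of equally weighted candidate maps in which endemicity at
    `location` lies between `lower` and `upper` (inclusive)."""
    hits = sum(1 for m in candidate_maps if lower <= m[location] <= upper)
    return hits / len(candidate_maps)


maps = simulate_candidate_maps(n_maps=2000, n_locations=5)
p = statement_probability(maps, location=0, lower=0.1, upper=0.3)
print(f"P(0.1 <= endemicity at location 0 <= 0.3) is approximately {p:.2f}")
```

The same counting approach applies to any statement that can be checked against a single candidate map.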

Summary maps

Consider the statement “Plasmodium falciparum endemicity at location x in 2010 was less than value y”. By experimenting, we can find a value for y such that the probability of the statement being true is equal to 0.5. This value of y is the median of Plasmodium falciparum endemicity at location x.
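
The "experimenting" described above can be sketched as a bisection search over y. The five sample values below are invented for illustration and stand for the endemicity at location x in five hypothetical candidate maps. With a finite set of samples the probability changes in steps, so the search settles on the value of y at which it crosses 0.5.

```python
def prob_less_than(samples, y):
    """Fraction of candidate values at one location that are below y."""
    return sum(1 for v in samples if v < y) / len(samples)


def find_median_by_search(samples, iterations=40):
    """Bisection on y in [0, 1] until P(endemicity < y) crosses 0.5."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if prob_less_than(samples, mid) < 0.5:
            lo = mid  # statement is true too rarely: y must be larger
        else:
            hi = mid  # statement is true often enough: try a smaller y
    return (lo + hi) / 2


# Hypothetical endemicity values at location x in five candidate maps.
values_at_x = [0.12, 0.30, 0.18, 0.25, 0.40]
y = find_median_by_search(values_at_x)
print(round(y, 6))  # 0.25
```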

By computing the median at many locations and plotting these, we can create a median map.
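
A minimal sketch of this step, assuming the candidate maps are stored as equal-length lists of values (one value per location), takes the median across maps at each location. The three small maps are made-up numbers used only to show the mechanics.

```python
from statistics import median


def median_map(candidate_maps):
    """Pointwise median across candidate maps, one value per location."""
    # zip(*candidate_maps) groups the values belonging to the same location.
    return [median(values) for values in zip(*candidate_maps)]


# Three hypothetical candidate maps over three locations.
candidate_maps = [
    [0.10, 0.40, 0.05],
    [0.20, 0.35, 0.30],
    [0.15, 0.60, 0.10],
]
print(median_map(candidate_maps))  # [0.15, 0.4, 0.1]
```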

The median map is not a candidate map and it fails to capture important features of the candidate maps such as the non-smooth, patchy nature of spatial variation. Still, it is a useful visualization of one important feature of the full set of candidate maps that we have generated. The same is true when we construct a mean map.
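
The loss of patchiness can be seen in a toy setting. The sketch below draws hypothetical candidate maps along a one-dimensional transect, builds the median and mean maps, and compares a simple roughness measure (mean absolute difference between neighbouring locations). The independent beta draws and the roughness measure are assumptions chosen for brevity; real candidate maps are spatially correlated, so the contrast would be less extreme.

```python
import random
from statistics import mean, median


def roughness(values):
    """Mean absolute difference between neighbouring locations on a transect."""
    return mean(abs(b - a) for a, b in zip(values, values[1:]))


rng = random.Random(1)
# Hypothetical candidate maps: 500 draws along a transect of 50 locations.
candidate_maps = [
    [rng.betavariate(2, 6) for _ in range(50)] for _ in range(500)
]

# Summary maps: pointwise median and mean across the candidate maps.
median_map = [median(column) for column in zip(*candidate_maps)]
mean_map = [mean(column) for column in zip(*candidate_maps)]

typical_roughness = mean(roughness(m) for m in candidate_maps)
print("typical candidate map:", round(typical_roughness, 3))
print("median map:           ", round(roughness(median_map), 3))
print("mean map:             ", round(roughness(mean_map), 3))
print("median map is a candidate map:", median_map in candidate_maps)
```

In this toy example both summary maps come out much smoother than a typical candidate map, which is why they should be read as summaries rather than as plausible maps in their own right.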

Maps can also be made of more complicated summaries. By computing the probability that Plasmodium falciparum endemicity falls within each of several classes at each pixel, we can construct a category map. Or we can combine a set of candidate Plasmodium falciparum endemicity maps with a set of bednet coverage maps in a single summary of priority for donors, i.e. a donor prioritisation map.
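
One way a category map might be assembled is sketched below: at each pixel, the probability of each class is estimated as the fraction of candidate maps falling in that class, and the pixel is labelled with the most probable class. The class boundaries (0.05 and 0.40), the class names and the toy values are assumptions for illustration, and labelling by the most probable class is only one possible rule.

```python
from bisect import bisect_right

# Hypothetical class boundaries and names, assumed for this example.
CLASS_BOUNDS = [0.05, 0.40]
CLASS_NAMES = ["low", "intermediate", "high"]


def class_probabilities(values):
    """Fraction of candidate values at one pixel that fall in each class."""
    counts = [0] * len(CLASS_NAMES)
    for v in values:
        counts[bisect_right(CLASS_BOUNDS, v)] += 1
    return [c / len(values) for c in counts]


def category_map(candidate_maps):
    """Label each pixel with its most probable class and that probability."""
    result = []
    for values in zip(*candidate_maps):
        probs = class_probabilities(values)
        best = max(range(len(probs)), key=probs.__getitem__)
        result.append((CLASS_NAMES[best], probs[best]))
    return result


# Four hypothetical candidate maps over three pixels.
candidate_maps = [
    [0.02, 0.30, 0.55],
    [0.04, 0.45, 0.50],
    [0.10, 0.20, 0.35],
    [0.03, 0.25, 0.60],
]
print(category_map(candidate_maps))
# [('low', 0.75), ('intermediate', 0.75), ('high', 0.75)]
```

Keeping the probability alongside the label makes it possible to show how confident each pixel's classification is.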

We provide a comprehensive description of our modelling approach in:

Patil, A.P., Gething, P.W., Piel, F.B. and Hay, S.I. (2011). Bayesian geostatistics in health cartography: the perspective of malaria. Trends in Parasitology 27(6): 246-253.