Areal downscaling (spatial disaggregation)

Dear Stan community,

After finding out that the ICAR priors can be implemented in Stan (Morris et al.,2019), I was wondering whether it’s possible to use RStan for disaggregating areal data, namely counts.

For the sake of discussion, let \mathcal{D} be the study area with spatial support consisting of M non-overlapping regions, \left\{ \mathcal{R}_{m} \right\}_{m=1}^{M}, where y_{m} denotes the observed count of \mathcal{R}_{m}.

The ultimate goal is to estimate the \textit{latent} counts, say \widetilde{Y}_{n}, corresponding to N non-overlapping regions comprising a finer and misaligned spatial support, \left\{ \mathcal{S}_{n} \right\}_{n=1}^{N}.

Within a hierarchical framework, if \widetilde{Y}_{n} is assumed to have a Poisson distribution with intensity rate \lambda_{n}:
\widetilde{Y}_{n}\mid \lambda_{n} \sim Poisson (A_{n}\lambda_{n}),
where A_{n} is the area of \mathcal{S}_{n} and \log(\lambda_{n})=Z_{n1}\beta_{1}+\ldots+Z_{nP}\beta_{P}+U_{n}
with Z_{np} the measurement of covariate Z_{p} and U_{n} the spatial effect associated to \mathcal{S}_{n}, then
{Y}_{m}\mid\boldsymbol{\lambda} \sim Poisson(\sum_{n:\mathcal{S}_{n}\cap\mathcal{R}_{m}\neq\emptyset}A_{mn}\lambda_{n}),
where A_{mn} is the area of \mathcal{S}_{n}\cap\mathcal{R}_{m}.

Further, the prior for the set of spatial effects \mathbf{U} is an IGMRF (Intrinsic Gaussian Markov Random Field) with `precision matrix’ \kappa\mathbf{W}, where \mathbf{W} is the neighborhood structure of regions \mathcal{S}_{n}. A fully Bayesian hierarchical model is set by specifying a multivariate normal prior for the fixed effects \boldsymbol{\beta} and a Gamma hyperprior for the precision parameter \kappa.

Is it possible to fit this model via the stan function? I should mention that I’m new to Stan, so any suggestion about how to use Stan for estimating latent variables when the number of observations is smaller than the number of latent variables is appreciated in advance.

Best wishes,
Roman.

3 Likes

@cmcd @mitzimorris

1 Like

you want counts by subregion? that sounds like an MRP problem - http://www.stat.columbia.edu/~gelman/research/published/misterp.pdf

or am I not understanding the question?

2 Likes

Hi Roman,

Can you share the sources you’re looking at for this model? Banerjee, Carlin, and Gelfand’s book has a section on this type of problem (‘Nested block-level modeling’) (on first impression, I like your approach more). This is often referred to as a “spatial misalignment” problem, and I haven’t done that before. I think you should be able to build that in Stan, though it might be difficult to sample from. Do you have a more basic Stan model to start with?

Connor

4 Likes

Hi Mitzi,

thanks for your reply.

Yes, I want counts by subregions but they aren’t nested. Besides, I’d like to account for spatial dependence. Does the MPR approach allow it?

Hi Connor,

thanks for your reply.

My main source is Mugglin et al. (2000). If I’m not wrong, it’s the basis of the next section of Banerjee et al.'s book (Nonnested block-level modeling).

I think I could start implementing the same model but removing the spatial effects (Mugglin & Carlin, 1998) in order to learn how to use Stan for estimating latent variables. Do you mind to point me to resources for achieving it? However, my main objetive is to implement the first model.

Oh alright, that explains why the summation was confusing to me.

This code is from another post, with different purpose, but if a simple example of a latent variable will help, this may be a starting point. Linear regression with data-dependent error variances - #6 by cmcd

Does that help at all?

It sounds like the difficult part is building the data itself, and probably the indexing inside the model will be a little challenging.

data {
  int n;
  vector[n] z; // observations or estimates
  vector[n] se; // standard errors of the observations
  vector[n] x;  // covariate
}

parameters {
  vector[n] y;
  real alpha;
  real beta;
  real<lower=0> sigma;
}

model {
 // data model
  z ~ normal(y, se);   
 // process model
  y ~ normal(alpha + x * beta, sigma);   
 // parameter models 
    // put priors on your parameters
}
1 Like

Hi Connor,

thanks a lot for providing me some code.

It will definitely be useful for building my stan program. And you are right, constructing the inputs is challenging. Fortunately, I have already made some data wrangling.

Best,
Roman.

1 Like

In theory this is possible with full Poisson process models – when modeling any observed counts the latent field is integrated over the observed region to give a Poisson intensity appropriate to that particular observation. The intensities for multiple observations of overlapping, non-nested regions are then appropriately correlated because the same latent field is integrated over any of the intersections.

The problem is that Poisson process models really can’t be implemented in practice because we can’t analytically integrate random fields over arbitrary regions.

One way to interpret an ICAR model is as a discretized Poisson process where the parameter at each node captures the integral of some hypothetical latent field in some hypothetical region around that node (mathematical caveats apply), but the correspondence isn’t always exact. In particular the discretization makes it impossible to model any observations “between” the nodes.

That said, if you have enough nodes such that every observed region contains at least a few then you might be able to approximate the overlapping Poisson process observations with something like an ICAR model. For each observed region identify the corresponding nodes in that region and then sum the corresponding intensities,

\lambda_{\text{obs}} = \sum_{n = 1}^{N} \lambda_{n},

or when modeling log intensities,

\lambda_{\text{obs}} = \sum_{n = 1}^{N} \exp( \alpha_{n}).

The challenge is then finding the right discretization that’s fine enough to adequately model the overlap between observed regions but not so fine that local nodes are observed only ever in clusters and the node parameters in those clusters become highly degenerate.

1 Like

Hi Michael,

thanks a lot for enlightening me with your observations.

First, I weren’t precise when I said that subregions aren’t nested - apologies Mitzi. What I meant to say was that the latent regions aren’t completely nested in a specific observed region, that is, some latent regions overlap two or more observed regions. Actually, there is no overlapping between observed regions. The latent regions are also an spatial partition of the study area (no overlapping). Nevertheless, your comments are still valid (I think so).

Please correct me if I’m wrong and apologies in advance if I’m not being clear enough. You say that Y_{m}\mid\boldsymbol{\lambda} makes sense (first point), however, it isn’t available in closed form (second point). Further, if the spatial support is discretized, it’s impossible to infer at different supports, yet such a discretization is an approximation to the underlying process (third point). In that context, a ‘good’ discretization could provide a clue of the underlying process (forth point - the sum of the (log) intensities should be weighted by areas, isn’t it? ).

I’m trying to understand your last point - it really puzzles me. Does it change in the light of my explanation of non-nesting? Anyway, what does it mean degenerate in this context? It seems that I’m missing something.

Finally, here is the Stan program I’m using for implementing my model. I’d rather prefer to model the observations in the log scale in order to avoid numerical overflows but it seems complicate to compute the logsumexp of the rates as I need to use an apply function in the Stan language.

functions {
  real icar_normal_lpdf(vector phi, int N, int[] node1, int[] node2) {
    return -0.5 * dot_self(phi[node1] - phi[node2])
      + normal_lpdf(sum(phi) | 0, 0.001 * N);
  }
} 
data {
  int<lower=0> M;
  int<lower=0> y[M]; // observations
  int<lower=0> N;
  int<lower=0> N_edges;
  int<lower=1, upper=N> node1[N_edges];
  int<lower=1, upper=N> node2[N_edges];  // and node1[i] < node2[i]
  matrix[M,N] A;
  vector[N] Z; // covariates
}
parameters {
  real beta;
  real<lower=0> kappa;       // spatial precision
  vector[N] phi_raw;         // spatial effects
}
transformed parameters {
  vector[N] phi= kappa*phi_raw;
  vector[N] log_lambda = Z*beta + phi;
}
model {
 // data model
  y ~ poisson(A*exp(log_lambda));
 // priors
  beta ~ normal(10,1e+1);
  kappa ~ gamma(2,1e+2); 
  target += icar_normal_lpdf(phi_raw | N, node1, node2);
}
generated quantities {
  vector[N] mu_tilde = exp(log_lambda);
}

Hi Connor,

how did you include the code appropriately?

Cheers,
Roman.

Use three backticks ``` before and after the code

2 Likes

If you have an observational model y \sim \text{Poisson}(\lambda_{1} + \lambda_{2}) then observations inform only the sum \lambda_{1} + \lambda_{2}, not either component independently. In other words observations spanning multiple “latent areas” would inform only the sum of the corresponding intensity parameters.

Change to y ~ poisson_log(log_lambda + log(A));. Even better cache log_A = log(A) in the transformed data block.

Whether or not you weight by area depends on how you interpret the ICAR model.

An ICAR graph doesn’t uniquely define a partition of the space into disjoint areas but let’s say that it did, or more realistically we chose a partition ourselves. In this case we could associate each node in the graph with a particular area. There’s still an ambiguity into what each ICAR parameter means. For example one could interpret the ICAR parameter for a given node as the integrated Poisson process intensity across the surrounding area, in which case one doesn’t need to weight by the area of an observation. Alternatively one could interpret it as a constant differential Poisson process intensity that spans across the surrounding area, in which case areas would have to be included.

The latter interpretation allows you to model observations that only partially overlap with each “nodal area” by including the appropriate intersection areas, but the ICAR prior model also behaves a bit surprisingly as neighboring nodes are correlated the same no matter how much “area” is in between them.

In the former interpretation observations cannot partially cover any node. They are either confined to one node (the most common approach) or they entirely cover multiple nodes (in which case the Poisson intensity is the sum of the included node intensities). This assumption works best when the graph discretization is fine enough that partial overlaps are negligible. My limited understanding is that the former interpretation is more consistent with the mathematical derivation of the ICAR model, and hence would be the best approach with which to start if you can manage a fine enough grid.

1 Like

@betanalpha whether or not you weight by area depends on how you interpret the ICAR model.

What is your research question, Román @kokorap27 ? That should determine how the model is interpreted and whether it is appropriate to be including the log area (or some other term, like log population). Often it is essential to the inference being made. Similarly for the partition—it sounds like the partitioning of the space is predetermined, no?

@betanalpha observations spanning multiple “latent areas” would inform only the sum of the corresponding intensity parameters.

If you see a strong north-south trend in the level of a variable, then depending on the subject matter and the scales in question, it may be reasonable to infer that the trend continues at a smaller scale; specifically, we may place a higher probability on the proposition that a strong trend at the observed scale continues similarly, rather than is interrupted, at the unobserved scale. I think that is the background information that the ICAR prior model is encoding into these change of support problems, and it enables some inferences to be made. What you’re saying sounds true in itself but it is missing important aspects of spatial inferences like this—the observations are considered jointly, not one at a time, and that changes the nature of the problem.

4 Likes

Hi Michael,

thanks a lot for thoroughly explaining me some quirks of the ICAR model.

Unfortunately, I haven’t managed to implement my model, however, your comments bring some light on the theoretical justification of why this is happening, apart from a potential numerical overflow (log_lambda+log(A) isn’t well defined as A is a matrix - as I mentioned, I think I need to compute the logsumexp of the corresponding intensities).

If I’m not wrong, the model I’m trying to implement could be degenerated, and thus, the chains aren’t converging. A more informative prior for the spatial effects could sort this out.

More comments in regard to my model are included in my reply to Connor.

Hi Connor,

thanks a lot for taking a practical approach.

My research questions is how to disaggregate population counts, that is, is it possible to estimate the underlying population intensity (density)?

I’m using a land-cover layer as ancillary data (covariates), and thus, it implies to take into account areas of both observed and latent regions, as well as of land-cover classes. Then, only the latent support is not determined but I decided to use a regular lattice (1 km^2 resolution) for which population counts are also available - this allows me to evaluate predictive accuracy of my areal downscalers.

Your second thought sounds relevant. I might not fully understand it, but it defenitely points to the right direction. Hope the next figures align with your idea -the first map displays population density at municipality level while the second map correspond to the posterior mean surface when using an ICAR as prior for the spatial effects and assuming that observations are gaussian (which is unsustainable for population counts).

2 Likes

An ICAR prior model can capture this assumption, but it won’t always.

If the spatial trends are assumed to be uniform below the observation scale then the configuration of the ICAR prior model has be to carefully tuned so that the pairwise correlations are strong between all nodes within that scale. Any shorter-range correlation will be consistent with any observation, and hence have to be included in the posterior distribution unless explicitly suppressed by the ICAR prior, which is exactly the nature of the degeneracy that I mentioned above.

Yes suppressing short-range correlations is possible with ICAR prior models but the required configuration is subtle and definitely not shown in most of the common demonstrations.