Bym model

hi all , I want to run the bym model using open bugs. in the BYM model, I want to remove a province from among the neighbors, what code should I use? and how is the work report of bym? can you guide me?

Hi, @shayeste. These are the Stan forums. If you need help with BUGS, I’d suggest the BUGS forums. What you’re looking for is the GeoBUGS extension for the ICAR models like BYM.

Assuming by “BYM” you mean Besag-York-Mollié, you might be interested in the Stan case study on ICAR models that shows how to go from the BYM model to the BYM2 model with penalized complexity priors to mix heterogeneous and spatial effects.

1 Like

hi dear @Bob_Carpenter .
thanks for your help . what code should i use if i want to get the neighborhood in bym model in R?
I am so confused . i can’t get it…
i would be grateful if you could guide me…

Sorry, @shayeste but I’m not sure what you’re asking. Are the areal regions in your BYM model neighborhoods? If so, you can just figure out which ones are adjacent and then code the adjacency as a matrix as explained in the linked case study. It then goes on to show how to code this all up efficiently in Stan.

If you have specific questions on that, we can try to help. Can you fit the hierarchical model without a spatial component in Stan? That is, the heterogeneous portion of BYM? If not, I’d suggest starting there. Fitting the full BYM models as your first examples in Stan is quite challenging as there are a lot of moving pieces. So I’d suggest starting even more gently, with a non-hierarchical model.

Hi @shayeste

Depending on your interests for this project, you might find the geostan R package helpful. You can fit BYM models, and other spatial models like proper CAR models. You can use spdep (poly2nb; nb2mat) or geostan (shape2mat) functions to create a spatial weights matrix. With a weights matrix, you can drop a neighbor by converting the appropriate index from 1 to 0.

The spdep package also has some functions that can help you visualize the connections in that matrix (using spdep’s nb format) to see if there are any missing or otherwise questionable links.

That is, find out the row number for the place of interest (i); then get the row number of the place that is not supposed to be categorized as its neighbor (j). Then convert the corresponding value in the weights matrix C to zero:
C[i, j] <- 0
If your spatial data (geometry) is stored as simple features or SpatialPolygons, and say its called spdf, then you can create C with
C <- geostan::shape2mat(spdf, "B").

See Examples section of the documentation: Intrinsic autoregressive models — stan_icar • geostan
Bayesian Spatial Analysis • geostan

If you prefer to build a custom Stan model, geostan has some convenience functions for that also (e.g., prep_icar_data and prep_car_data) See: GitHub - ConnorDonegan/Stan-IAR: Spatial intrinsic autoregressive models in Stan

3 Likes

Nice. @mitzimorris should see this.

dear @Bob_Carpenter and dear @cmcd
thanks for your time .
my question is like this:
i run bym model , this is my thesis topic.
i want to check the prevalence of a disease in the whole country , but i don’t have the data related to a province . as a result , i have to get the related neighborhoods , so that it does not include that province.
but i don’t know how to do it , that is , how to get all the neighborhoods except that province?

So you have disease incidence data by region (or small areas), and you are missing the disease data for one of those region—is that right?

1 Like

@cmcd yeah , exactly

I think this is what you’re looking for. The estimates will borrow from neighboring areas, and will also be informed by the total amount of inferred variability in risk. Maybe others can weigh in on this too.

You could add covariates, as in any other model.

If the graph structure you’re using is disconnected, or has islands, then the code will get more complicated.

functions {

/** ICAR prior
  * following Morris et al. (2019)
  * intrinsic autoregressive prior 
  * @return lpdf of ICAR prior minus any constant terms
  */
  real icar_normal_lpdf(vector phi, int N, int[] node1, int[] node2) {
    return -0.5 * dot_self(phi[node1] - phi[node2]) +
      normal_lpdf(sum(phi) | 0, 0.001 * N);
  }
}

data {
  int n;    // no. areas
  int<lower=1> n_edges; // icar stuff
  int<lower=1, upper=n> node1[n_edges];
  int<lower=1, upper=n> node2[n_edges];
  int y[n];  // disease counts 
  vector[n] log_pop; // log of population at risk
  int n_obs; // number of areas with disease data
  int obs_idx[n_obs]; // index for observed values of y
}

parameters {
  real alpha; // intercept
  vector[n] phi_tilde;  // icar on standard-normal scale
  real<lower=0> spatial_scale; // icar scale
  vector[n] theta_tilde; // heterogeneous partial pooling term 
  real<lower=0> theta_scale; // scale for theta_tilde
}

transformed parameters {
  vector[n] log_rate = alpha + phi_tilde * spatial_scale + theta_tilde * theta_scale;
  vector[n] eta = log_pop + log_rate;
}

model {
  phi_tilde ~ icar_normal(n, node1, node2); // icar prior
  theta_tilde ~ std_normal();
  spatial_scale ~ std_normal(); 
  theta_scale ~ std_normal();
  alpha ~ normal(-5, 5); // this is the prior for the mean log rate
  y[obs_idx] ~ poisson_log(eta[obs_idx]); // likelihood for observations
}

Here is an example:

library(rstan)
library(geostan)
data(georgia)

# compile Stan model
mod <- stan_model("icar.stan")

# create spatial connectivity matrix
C <- shape2mat(georgia, "B")

# create data for icar-normal_lpdf
icar_dl <- prep_icar_data(C)

# add disease data
icar_dl$n <- nrow(georgia)
icar_dl$y <- georgia$deaths.female

## including log of population at risk
icar_dl$log_pop <- log(georgia$pop.at.risk.female)
    
# which areas do have outcome data?
icar_dl$obs_idx <- which(!is.na(icar_dl$y))
icar_dl$n_obs <- length(icar_dl$obs_idx)

# replace NA values with a placeholder value (these will not be used)
icar_dl$y[-icar_dl$obs_idx] <- 9999

# draw samples from the model
samples <- sampling(mod,
                    data = icar_dl,
                    iter = 2500,
                    cores = 4)

# summarise posterior distributions
eta <- as.matrix(samples, pars = "log_rate")
eta <- exp( eta )
eta.mu <- apply(eta, 2, mean)
eta.ci <- apply(eta, 2, quantile, probs = c(0.025, 0.975))

# compare with crude rates
plot(georgia$rate.female * 10e3, eta.mu * 10e3)
abline(0, 1)
 
# summary of posterior distribution for places with missing disease data
# cases per 10,000 at risk
cbind( mean = eta.mu[-icar_dl$obs_idx] * 10e3, t( eta.ci[ , -icar_dl$obs_idx] * 10e3 ) )

3 Likes

@cmcd thanks alot for your kindness.

2 Likes

This is no problem at all in a Bayesian model. It’s perfectly fine to have zero observations in a region. For example, if it’s binomial data, it’s just 0 successes out of 0 observations. Or you can try to get clever and drop the terms from the likelihood, but that shouldn’t be necessary for efficiency.