Bym model

shayeste · August 21, 2022, 2:55pm

hi all , I want to run the bym model using open bugs. in the BYM model, I want to remove a province from among the neighbors, what code should I use? and how is the work report of bym? can you guide me?

Bob_Carpenter · August 22, 2022, 6:30pm

Hi, @shayeste. These are the Stan forums. If you need help with BUGS, I’d suggest the BUGS forums. What you’re looking for is the GeoBUGS extension for the ICAR models like BYM.

Assuming by “BYM” you mean Besag-York-Mollié, you might be interested in the Stan case study on ICAR models that shows how to go from the BYM model to the BYM2 model with penalized complexity priors to mix heterogeneous and spatial effects.

shayeste · August 23, 2022, 9:38am

hi dear @Bob_Carpenter .
thanks for your help . what code should i use if i want to get the neighborhood in bym model in R?
I am so confused . i can’t get it…
i would be grateful if you could guide me…

Bob_Carpenter · August 23, 2022, 8:54pm

Sorry, @shayeste but I’m not sure what you’re asking. Are the areal regions in your BYM model neighborhoods? If so, you can just figure out which ones are adjacent and then code the adjacency as a matrix as explained in the linked case study. It then goes on to show how to code this all up efficiently in Stan.

If you have specific questions on that, we can try to help. Can you fit the hierarchical model without a spatial component in Stan? That is, the heterogeneous portion of BYM? If not, I’d suggest starting there. Fitting the full BYM models as your first examples in Stan is quite challenging as there are a lot of moving pieces. So I’d suggest starting even more gently, with a non-hierarchical model.

cmcd · August 23, 2022, 9:15pm

Hi @shayeste

Depending on your interests for this project, you might find the geostan R package helpful. You can fit BYM models, and other spatial models like proper CAR models. You can use spdep (poly2nb; nb2mat) or geostan (shape2mat) functions to create a spatial weights matrix. With a weights matrix, you can drop a neighbor by converting the appropriate index from 1 to 0.

The spdep package also has some functions that can help you visualize the connections in that matrix (using spdep’s nb format) to see if there are any missing or otherwise questionable links.

That is, find out the row number for the place of interest (i); then get the row number of the place that is not supposed to be categorized as its neighbor (j). Then convert the corresponding value in the weights matrix C to zero:
C[i, j] <- 0
If your spatial data (geometry) is stored as simple features or SpatialPolygons, and say its called spdf, then you can create C with
C <- geostan::shape2mat(spdf, "B").

See Examples section of the documentation: Intrinsic autoregressive models — stan_icar • geostan
Bayesian Spatial Analysis • geostan

If you prefer to build a custom Stan model, geostan has some convenience functions for that also (e.g., prep_icar_data and prep_car_data) See: GitHub - ConnorDonegan/Stan-IAR: Spatial intrinsic autoregressive models in Stan

Bob_Carpenter · August 23, 2022, 9:30pm

Nice. @mitzimorris should see this.

shayeste · August 23, 2022, 9:45pm

dear @Bob_Carpenter and dear @cmcd
thanks for your time .
my question is like this:
i run bym model , this is my thesis topic.
i want to check the prevalence of a disease in the whole country , but i don’t have the data related to a province . as a result , i have to get the related neighborhoods , so that it does not include that province.
but i don’t know how to do it , that is , how to get all the neighborhoods except that province?

cmcd · August 23, 2022, 9:54pm

So you have disease incidence data by region (or small areas), and you are missing the disease data for one of those region—is that right?

shayeste · August 23, 2022, 10:42pm

@cmcd yeah , exactly

cmcd · August 24, 2022, 12:53am

I think this is what you’re looking for. The estimates will borrow from neighboring areas, and will also be informed by the total amount of inferred variability in risk. Maybe others can weigh in on this too.

You could add covariates, as in any other model.

If the graph structure you’re using is disconnected, or has islands, then the code will get more complicated.

functions {

/** ICAR prior
  * following Morris et al. (2019)
  * intrinsic autoregressive prior 
  * @return lpdf of ICAR prior minus any constant terms
  */
  real icar_normal_lpdf(vector phi, int N, int[] node1, int[] node2) {
    return -0.5 * dot_self(phi[node1] - phi[node2]) +
      normal_lpdf(sum(phi) | 0, 0.001 * N);
  }
}

data {
  int n;    // no. areas
  int<lower=1> n_edges; // icar stuff
  int<lower=1, upper=n> node1[n_edges];
  int<lower=1, upper=n> node2[n_edges];
  int y[n];  // disease counts 
  vector[n] log_pop; // log of population at risk
  int n_obs; // number of areas with disease data
  int obs_idx[n_obs]; // index for observed values of y
}

parameters {
  real alpha; // intercept
  vector[n] phi_tilde;  // icar on standard-normal scale
  real<lower=0> spatial_scale; // icar scale
  vector[n] theta_tilde; // heterogeneous partial pooling term 
  real<lower=0> theta_scale; // scale for theta_tilde
}

transformed parameters {
  vector[n] log_rate = alpha + phi_tilde * spatial_scale + theta_tilde * theta_scale;
  vector[n] eta = log_pop + log_rate;
}

model {
  phi_tilde ~ icar_normal(n, node1, node2); // icar prior
  theta_tilde ~ std_normal();
  spatial_scale ~ std_normal(); 
  theta_scale ~ std_normal();
  alpha ~ normal(-5, 5); // this is the prior for the mean log rate
  y[obs_idx] ~ poisson_log(eta[obs_idx]); // likelihood for observations
}

Here is an example:

library(rstan)
library(geostan)
data(georgia)

# compile Stan model
mod <- stan_model("icar.stan")

# create spatial connectivity matrix
C <- shape2mat(georgia, "B")

# create data for icar-normal_lpdf
icar_dl <- prep_icar_data(C)

# add disease data
icar_dl$n <- nrow(georgia)
icar_dl$y <- georgia$deaths.female

## including log of population at risk
icar_dl$log_pop <- log(georgia$pop.at.risk.female)
    
# which areas do have outcome data?
icar_dl$obs_idx <- which(!is.na(icar_dl$y))
icar_dl$n_obs <- length(icar_dl$obs_idx)

# replace NA values with a placeholder value (these will not be used)
icar_dl$y[-icar_dl$obs_idx] <- 9999

# draw samples from the model
samples <- sampling(mod,
                    data = icar_dl,
                    iter = 2500,
                    cores = 4)

# summarise posterior distributions
eta <- as.matrix(samples, pars = "log_rate")
eta <- exp( eta )
eta.mu <- apply(eta, 2, mean)
eta.ci <- apply(eta, 2, quantile, probs = c(0.025, 0.975))

# compare with crude rates
plot(georgia$rate.female * 10e3, eta.mu * 10e3)
abline(0, 1)
 
# summary of posterior distribution for places with missing disease data
# cases per 10,000 at risk
cbind( mean = eta.mu[-icar_dl$obs_idx] * 10e3, t( eta.ci[ , -icar_dl$obs_idx] * 10e3 ) )

shayeste · August 24, 2022, 9:08pm

@cmcd thanks alot for your kindness.

Bob_Carpenter · August 26, 2022, 7:52pm

This is no problem at all in a Bayesian model. It’s perfectly fine to have zero observations in a region. For example, if it’s binomial data, it’s just 0 successes out of 0 observations. Or you can try to get clever and drop the terms from the likelihood, but that shouldn’t be necessary for efficiency.

Topic		Replies	Views
BYM model : translating from WinBUGS to STAN Modeling	3	1433	October 30, 2017
BYM2 spatial model - Making predictions to new area/geography from fit model Modeling specification	4	1115	March 18, 2019
Paper and R package for spatial models Publicity ecology , public-health , spatial , rstantools	0	882	July 2, 2020
Updates to case study on spatial models for areal data - Poisson CAR/IAR BYM and BYM2 Modeling	0	886	January 23, 2018
Paper on CAR models and spatial analysis pakage Publicity	2	645	October 6, 2021

Bym model

Related topics