hi all , I want to run the bym model using open bugs. in the BYM model, I want to remove a province from among the neighbors, what code should I use? and how is the work report of bym? can you guide me?

Hi, @shayeste. These are the Stan forums. If you need help with BUGS, Iâ€™d suggest the BUGS forums. What youâ€™re looking for is the GeoBUGS extension for the ICAR models like BYM.

Assuming by â€śBYMâ€ť you mean Besag-York-MolliĂ©, you might be interested in the Stan case study on ICAR models that shows how to go from the BYM model to the BYM2 model with penalized complexity priors to mix heterogeneous and spatial effects.

hi dear @Bob_Carpenter .

thanks for your help . what code should i use if i want to get the neighborhood in bym model in R?

I am so confused . i canâ€™t get itâ€¦

i would be grateful if you could guide meâ€¦

Sorry, @shayeste but Iâ€™m not sure what youâ€™re asking. Are the areal regions in your BYM model neighborhoods? If so, you can just figure out which ones are adjacent and then code the adjacency as a matrix as explained in the linked case study. It then goes on to show how to code this all up efficiently in Stan.

If you have specific questions on that, we can try to help. Can you fit the hierarchical model without a spatial component in Stan? That is, the heterogeneous portion of BYM? If not, Iâ€™d suggest starting there. Fitting the full BYM models as your first examples in Stan is quite challenging as there are a lot of moving pieces. So Iâ€™d suggest starting even more gently, with a non-hierarchical model.

Hi @shayeste

Depending on your interests for this project, you might find the geostan R package helpful. You can fit BYM models, and other spatial models like proper CAR models. You can use spdep (`poly2nb`

; `nb2mat`

) or geostan (`shape2mat`

) functions to create a spatial weights matrix. With a weights matrix, you can drop a neighbor by converting the appropriate index from 1 to 0.

The spdep package also has some functions that can help you visualize the connections in that matrix (using spdepâ€™s `nb`

format) to see if there are any missing or otherwise questionable links.

That is, find out the row number for the place of interest (i); then get the row number of the place that is not supposed to be categorized as its neighbor (j). Then convert the corresponding value in the weights matrix `C`

to zero:

`C[i, j] <- 0`

If your spatial data (geometry) is stored as simple features or `SpatialPolygons`

, and say its called `spdf`

, then you can create `C`

with

`C <- geostan::shape2mat(spdf, "B")`

.

See `Examples`

section of the documentation: Intrinsic autoregressive models â€” stan_icar â€˘ geostan

Bayesian Spatial Analysis â€˘ geostan

If you prefer to build a custom Stan model, geostan has some convenience functions for that also (e.g., `prep_icar_data`

and `prep_car_data`

) See: GitHub - ConnorDonegan/Stan-IAR: Spatial intrinsic autoregressive models in Stan

dear @Bob_Carpenter and dear @cmcd

thanks for your time .

my question is like this:

i run bym model , this is my thesis topic.

i want to check the prevalence of a disease in the whole country , but i donâ€™t have the data related to a province . as a result , i have to get the related neighborhoods , so that it does not include that province.

but i donâ€™t know how to do it , that is , how to get all the neighborhoods except that province?

So you have disease incidence data by region (or small areas), and you are missing the disease data for one of those regionâ€”is that right?

I think this is what youâ€™re looking for. The estimates will borrow from neighboring areas, and will also be informed by the total amount of inferred variability in risk. Maybe others can weigh in on this too.

You could add covariates, as in any other model.

If the graph structure youâ€™re using is disconnected, or has islands, then the code will get more complicated.

```
functions {
/** ICAR prior
* following Morris et al. (2019)
* intrinsic autoregressive prior
* @return lpdf of ICAR prior minus any constant terms
*/
real icar_normal_lpdf(vector phi, int N, int[] node1, int[] node2) {
return -0.5 * dot_self(phi[node1] - phi[node2]) +
normal_lpdf(sum(phi) | 0, 0.001 * N);
}
}
data {
int n; // no. areas
int<lower=1> n_edges; // icar stuff
int<lower=1, upper=n> node1[n_edges];
int<lower=1, upper=n> node2[n_edges];
int y[n]; // disease counts
vector[n] log_pop; // log of population at risk
int n_obs; // number of areas with disease data
int obs_idx[n_obs]; // index for observed values of y
}
parameters {
real alpha; // intercept
vector[n] phi_tilde; // icar on standard-normal scale
real<lower=0> spatial_scale; // icar scale
vector[n] theta_tilde; // heterogeneous partial pooling term
real<lower=0> theta_scale; // scale for theta_tilde
}
transformed parameters {
vector[n] log_rate = alpha + phi_tilde * spatial_scale + theta_tilde * theta_scale;
vector[n] eta = log_pop + log_rate;
}
model {
phi_tilde ~ icar_normal(n, node1, node2); // icar prior
theta_tilde ~ std_normal();
spatial_scale ~ std_normal();
theta_scale ~ std_normal();
alpha ~ normal(-5, 5); // this is the prior for the mean log rate
y[obs_idx] ~ poisson_log(eta[obs_idx]); // likelihood for observations
}
```

Here is an example:

```
library(rstan)
library(geostan)
data(georgia)
# compile Stan model
mod <- stan_model("icar.stan")
# create spatial connectivity matrix
C <- shape2mat(georgia, "B")
# create data for icar-normal_lpdf
icar_dl <- prep_icar_data(C)
# add disease data
icar_dl$n <- nrow(georgia)
icar_dl$y <- georgia$deaths.female
## including log of population at risk
icar_dl$log_pop <- log(georgia$pop.at.risk.female)
# which areas do have outcome data?
icar_dl$obs_idx <- which(!is.na(icar_dl$y))
icar_dl$n_obs <- length(icar_dl$obs_idx)
# replace NA values with a placeholder value (these will not be used)
icar_dl$y[-icar_dl$obs_idx] <- 9999
# draw samples from the model
samples <- sampling(mod,
data = icar_dl,
iter = 2500,
cores = 4)
# summarise posterior distributions
eta <- as.matrix(samples, pars = "log_rate")
eta <- exp( eta )
eta.mu <- apply(eta, 2, mean)
eta.ci <- apply(eta, 2, quantile, probs = c(0.025, 0.975))
# compare with crude rates
plot(georgia$rate.female * 10e3, eta.mu * 10e3)
abline(0, 1)
# summary of posterior distribution for places with missing disease data
# cases per 10,000 at risk
cbind( mean = eta.mu[-icar_dl$obs_idx] * 10e3, t( eta.ci[ , -icar_dl$obs_idx] * 10e3 ) )
```

This is no problem at all in a Bayesian model. Itâ€™s perfectly fine to have zero observations in a region. For example, if itâ€™s binomial data, itâ€™s just 0 successes out of 0 observations. Or you can try to get clever and drop the terms from the likelihood, but that shouldnâ€™t be necessary for efficiency.