data {
// outcome information
int<lower = 0> n_out; // number of outcome observations
int<lower = 0> y[n_out]; // outcome observations
vector[n_out] log_off; // offset in count regression; pass as log-transformed
// exposure information
int<lower = 0> n_expose; // number of exposure locations
vector[n_expose] weight_expose; // exposure weight at each exposure location
int<lower = 0> n_target; // number of target locations to calculate exposure for
int<lower = 0> n_zip; // number of zipcodes; in unstratified case should equal n_out
int<lower = 1, upper = n_zip> zip_target[n_target]; // target zip to aggregate to
int<lower = 0> zip_target_obs[n_zip]; // number of targets to aggregate in each zipcode
vector[n_target] weight_target; // population weight at each target location for aggregation
int<lower = 0> n_w; // number of non-zero elements of distance matrix
int<lower = 0> n_u; // number row-start indicators in sparse distmat
vector[n_w] w; // non-zero elements of distmat
int<lower = 0> v[n_w]; // column indices of non-zero distmat elements
int<lower = 0> u[n_u]; // indicator of where in w a given row's non-zero values start
}
parameters {
real b0; // intercept
real b1; // coefficient on exposure metric
real log_ar; // spatial range parameter (log-scale for sampling)
}
transformed parameters{
real<lower = 0> ar = exp(log_ar); // spatial range parameter
}
model {
vector[n_w] w_new = (-3 * w) / ar; // scale distmat by spatial range (just positives vector)
// sum of distance-decay exposures at each target loc
vector[n_target] x_target = csr_matrix_times_vector(n_target, n_expose,
w_new, v, u,
weight_expose);
vector[n_target] x_target_w = x_target .* weight_target; // weight by population fraction
// aggregate pop-weighted blocks by zip code
vector[n_zip] x;
for (i in 1:n_zip) {
int J = zip_target_obs[i]; // number of blocks in zipcode i
vector[J] x_temp = x_target_w[zip_target[i]];
x[i] = sum(x_temp);
}
vector[n_zip] log_x = log10(x);
vector[n_zip] x_std = (log_x - mean(log_x)) / sd(log_x);
// poisson likelihood
y ~ poisson_log(log_off + b0 + beta * x_std);
// priors
b0 ~ normal(0, 5);
b1 ~ normal(0, 2.5);
log_ar ~ normal(8, 1.5) // can't center at 0, needs to be positive,
// also shouldn't be above log(30000) = 10.3
// because that's the maximum range we looked for farms
}
I have disease counts for US 5-digit zip codes but need to calculate an exposure variable from point locations. I would like to calculate exposures at finer spatial resolution (US Census Block centroids) and take the population-weighted mean to obtain a value of the exposure variable for each zip code. The plan is to use the population-weighted mean zip code exposure as a predictor in a Poisson model of the disease outcome. I want to allow the exposure to decay exponentially with distance, with the rate controlled by the spatial range parameter \alpha_{r}, which needs to be estimated. As such, I need to calculate the exposure for each block centroid and aggregate those values to zip codes with each model iteration. I’m sure there are numerous issues to consider with this approach, but at the moment I’m stuck on the fairly basic issue of summarizing a vector by groups. Essentially I am trying to do the stan equivalent of a dplyr::group_by() %>% dplyr::summarise()
. I have attempted to do this using index variables and a for loop:
vector[n_target] x_target_w = x_target .* weight_target; // weight by population fraction
// aggregate pop-weighted blocks by zip code
vector[n_zip] x;
for (i in 1:n_zip) {
int J = zip_target_obs[i]; // number of blocks in zipcode i
vector[J] x_temp = x_target_w[zip_target[i]];
x[i] = sum(x_temp);
}
But I am getting parsing errors about variable definition and dimension mismatches rstan::stan_model
:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Variable definition base type mismatch, variable declared as base type vector variable definition has base type realVariable definition dimensions mismatch, definition specifies 1, declaration specifies 0 error in 'modelba6e55e7cb78_spatial_range' at line 56, column 48
-------------------------------------------------
54: for (i in 1:n_zip) {
55: int J = zip_target_obs[i]; // number of blocks in zipcode i
56: vector[J] x_temp = x_target_w[zip_target[i]];
^
57: x[i] = sum(x_temp);
-------------------------------------------------
I suspect I am doing something silly and there is a simple solution, but I have not yet been able to work it out myself. Thanks!