I have a model as described over here that is a sort of measurement error model when the predictor is a proportion. A simple sim and code is below. Basically true_proportion
is a vector of parameters, that are learned from a beta_proportion
model where the outcome is an observed_proportion
and kappa is observed_count + 1
(the smaller kappa is, the more uncertainty in true_proportion
. true_proportion
is then used elsewhere in another model (in this case a logistic regression) as the predictor for my outcome of interest. However, I would like to have a varying intercept for group in the beta_proportion
part of the model, because in my actual application, I have repeated observed_proportion
for each group, and these should be correlated and inform the true_proportion
parameters, especially as observed_count
varies greatly among observations within groups.
Here is the code for the simulated data set up and simple model.
library(rstan)
#define true proportions and generate observed proportions from the true, using beta distribution scaled by the observed_count
n <- 100
observed_count <- rnbinom(n, mu=50, size=1) + 10
true_proportion <- runif(n, 0.0000001, 0.9999999)
mu <- true_proportion
shape1 <- mu * (observed_count + 1)
shape2 <- (1 - mu) * (observed_count + 1)
observed_proportion <- rbeta(n, shape1, shape2)
observed_proportion <- ifelse(observed_proportion==0, (observed_proportion + 0.0001), observed_proportion)
observed_proportion <- ifelse(observed_proportion==1, (observed_proportion - 0.0001), observed_proportion)
#generate outcome variable with the true proportion as the predictor
a_b <- 0.01
b_b <- 0.7
y_p <- a_b + true_proportion*b_b
y <- rbinom(n, 1, y_p)
#data
d1 <- list(N = n,
y = y,
observed_proportion = observed_proportion,
observed_count = observed_count)
#stan model
stan_code <- "
data {
int<lower=0> N;
vector[N] observed_count;
vector[N] observed_proportion;
int <lower=0, upper=1> y [N];
}
parameters {
vector<lower=0, upper=1>[N] true_proportion;
real alpha;
real beta;
}
model {
observed_proportion ~ beta_proportion(true_proportion, observed_count + 1);
y ~ bernoulli_logit(alpha + beta * true_proportion);
alpha ~ normal(0, 5);
beta ~ normal(0, 5);
}
"
#fit model
fit1 <- stan(model_code = stan_code, data = d1,
chains = 4, iter = 2000, warmup = 1000,
thin = 1, cores = 4)
print(fit1)
It seems to work well and samples well.
Now to add the varying intercept. My initial thought for adding the varying intercept was the following:
#add group to the data
grp <- rep(1:10, each=10)
d2 <- list(N = n,
y = y,
observed_proportion = observed_proportion,
observed_count = observed_count,
grp = grp,
N_grp = max(grp))
#stan model
stan_code2 <- "
data {
int<lower=0> N;
int<lower=1> N_grp;
vector[N] observed_count;
vector[N] observed_proportion;
int <lower=0, upper=1> y [N];
int <lower=1, upper=N_grp> grp[N];
}
parameters {
vector<lower=0, upper=1>[N] true_proportion;
vector<lower=0, upper=1>[N_grp] u;
real<lower=0> tau_u;
real alpha;
real beta;
}
model {
tau_u ~ normal(0, 0.2);
u ~ normal(0, tau_u);
observed_proportion ~ beta_proportion(true_proportion + u[grp], observed_count + 1);
y ~ bernoulli_logit(alpha + beta * (true_proportion + u[grp]));
alpha ~ normal(0, 5);
beta ~ normal(0, 5);
}
"
#fit model
fit2 <- stan(model_code = stan_code2, data = d2,
chains = 4, iter = 2000, warmup = 1000,
thin = 1, cores = 4)
print(fit2)
However, this doesn’t sample. For all the chains I get “does not contain samples”. I am wondering if this is because while true_proportion
has the proper constraints, true_proportion + u[grp]
does not. I tried to work around this, but my workarounds have all failed.
I am sure this is pretty simple, but I am fairly new to writing Stan code myself. Can anyone help me add a varying intercept to this model? Why won’t it sample as coded? Thanks!