Dear Stanners,
I’m trying to specify and test a simple beta model prior to modelling my real data. The (simulated) data describes how transition to psychosis depends on age (younger adults are more likely to transition, however there is also higher variance at this age). The challenge is the data doesn’t contain transition rates (\psi), only outcomes (t = 0, 1). So my model looks like:
t_i \sim Bernoulli(\psi_i)
\psi_i \sim Beta(p_i, q_i)
Where location and precision parameters of Beta are a function of age:
\mu_i = \beta_1age
\phi_i = \beta_2age
The shape parameters with weakly informative priors are:
p_i = \mu_i \phi_i + 2
q_i = \phi_i - \mu_i \phi_i + 2
I think, but I’m not sure, that this might be a latent variable problem. Or it may be more akin a measurement error problem…? Part of my difficulty is that I’m not even sure how to describe my problem so I don’t know the search terms or manual pages to focus on.
Anyway, the simulated data I’m trying to model is generated by the R code below:
library("rstan")
inv.logit <- function(x) {
1/(1 + exp(-x))
}
set.seed(1)
N <- 100
age <- round(rnorm(N, 30, 10))
b1 <- -0.5
b2 <- 0.05
mu <- inv.logit(b1*age)
phi <- exp(b2*age)
p <- mu * phi + 2
q <- phi - mu * phi + 2
psi <- rbeta(N, p, q)
t <- rbinom(N, size = 1, prob = psi)
And the Stan model I wrote to get the posterior distribution of psi and age effects is here:
data {
int<lower=0> N; // number of Subjects
int t[N]; // [0, 1] transition outcome
real age[N]; // predictor of psi
}
parameters {
vector[2] coef; // psi regression coefficients
vector<lower=0, upper=1>[N] psi;
}
transformed parameters {
vector<lower=0, upper=1>[N] mu; // location parameter for Beta
vector<lower=0>[N] phi; // precision parameter for Beta
vector<lower=1>[N] p; // shape parameters for Beta
vector<lower=1>[N] q;
for (i in 1:N) {
mu[i] = inv_logit(coef[1]*age[i]);
phi[i] = exp(coef[2]*age[i]);
p[i] = mu[i] * phi[i] + 2;
q[i] = phi[i] - mu[i] * phi[i] + 2;
}
}
model {
coef ~ normal(0, 0.5); // priors
psi ~ beta(p, q); // latent variable
t ~ bernoulli(psi); // model
}
However when I try to estimate the model with the code below, I get the subsequent error code:
data_list <- list(
N = N,
age = age,
t = t)
m <- stan(model_code=beta_model, data=data_list)
SAMPLING FOR MODEL '5fabd79078f641126f00eee57af2939c' NOW (CHAIN 1).
Gradient evaluation took 7.9e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.79 seconds.
Adjust your expectations accordingly!
Iteration: 1 / 2000 [ 0%] (Warmup)
[1] "Error in sampler$call_sampler(args_list[[i]]) : "
[2] " c++ exception (unknown reason)"
error occurred during calling the sampler; sampling not done
I’ve tried widening and narrowing the priors on the coefs and shape parameters, adding more coefs (e.g., intercepts), but nothing I’ve done gets past this error. I think it’s something to do with how I’m specifying psi, since psi is an intermediate parameter between coef
and t
. I’m also wondering whether it is numerical underflow (overflow?), since I’m not using logs where I probably should be. Any help or guidance you can provide would be much appreciated to this motivated but mathematically challenged Stan learner.
Rich