Hi everyone,
I’m having difficulty achieving convergence on my parameters of interest when I use group-level priors (but not when I hard code them).
The core of the model is a spatial model, where the outcome is a count variable which is modelled as a function of an individual-level intercept \alpha_i, a target-level intercept \gamma_j, and the distance between the two (latent) parameters of interest, \theta_{ig} and \zeta_j:
\text{Pr}(Y_{ijg} = y) \sim \text{NegBin}(\alpha_i + \gamma_j - ||\theta_{ig} - \zeta_j||^2, \omega_i).
The prior on \zeta_j is simply \text{N}(0, 1). However, because individuals i belong to groups g, I put separate group-specific priors on them: \theta_{ig} \sim \text{N}(\mu_g, \sigma_g). However, the \theta_{ig}, \mu_{g}, and \sigma_{g} don’t converge. The estimates get somewhat close to each other across multiple chains, but nevertheless end up in clearly distinct locations. On multiple runs, some chains might center \mu_{g} at, say, 0.77, and others at, say, 0.65, as if there are two modes. When I drop the group-level priors, and just make all \theta_{ig} \sim \text{N}(0, 1), the theta and zeta converge as expected. Any help would be greatly appreciated! The Stan code is the following:
data {
int verbose;
int N;
int M;
int G;
int group[N];
int Y[N, M];
}
parameters {
// User-level intercept
vector[N] alpha;
real alpha_mu;
real<lower = 0> alpha_sigma;
// Target-level intercept
vector[M] gamma;
real<lower = 0> gamma_sigma;
// User-level variance
vector<lower = 0>[N] omega;
real<lower = 0> omega_a;
real<lower = 0> omega_b;
// Latent spatial position of user
vector[G] theta_mu;
vector<lower = 0>[G] theta_sigma;
vector[N] theta;
// Latent spatial position of target
vector[M] zeta;
}
transformed parameters {
}
model {
// Linear predictor
real C[N, M];
// Priors
alpha_mu ~ normal(0, 2.5);
alpha_sigma ~ normal(0, 2.5);
alpha ~ normal(alpha_mu, alpha_sigma);
gamma_sigma ~ normal(0, 2.5);
gamma ~ normal(0, gamma_sigma);
omega ~ inv_gamma(omega_a, omega_b);
theta_mu ~ normal(0, 1)
theta_sigma ~ normal(0, 1)
theta ~ normal(theta_mu[group], theta_sigma[group]);
zeta ~ normal(0, 1);
// Model
for(i in 1:N)
for(j in 1:M)
C[i, j] = alpha[i] + gamma[j] - square(theta[i] - zeta[j]);
for(i in 1:N) Y[i, ] ~ neg_binomial_2_log(C[i, ], omega[i]);
}
generated quantities {
}