Hi everyone,
I am fitting a reasonably standard Fay-Herriot style Beta regression model with rstan in R. As some of you may know the mean-precision parameterization of the Beta distribution gives
where \theta_i and \psi_i are known direct proportion estimators (bounded between 0 and 1) and their respective sampling variances (bounded between 0 and 0.25). Also note the following relationship
The model fits well in Stan, but unfortunately, I cannot find a way to reduce the number of divergent iterations. About 25% of my iterations diverge even though the Rhat’s are very good (i.e. < 1.001). Upon inspection of the pairs() plots, we see that the parameters specified in the transformed parameters block are (by design) exact transformations of other parameters. I am worried that these transformations are causing Stan to diverge, because of the perfect correlation between parameters.
A second problem is the initial values. Unless specifying init = 0, the Stan model cannot find a suitable starting value and does not even start sampling. The error it gives it related to unallowed values of \alpha_i.
I have pasted the Stan model below and here (forStanForum.zip - Google Drive) you can find a .zip with the stan code and two .rds files with the data and R stan object for those wishing to explore the divergent iterations in more detail.
functions {
// function to transform psi to unconstrained space
vector L(vector psi, int n){
vector[n] temp = psi ./ (0.25 - psi);
return log(temp);
}
// inverse of L()
vector invL(vector eta, int n){
vector[n] temp1 = 0.25 * exp(eta);
vector[n] temp2 = 1 + exp(eta);
return temp1 ./ temp2;
}
}
data {
int<lower=0> m; // number of sampled areas
int<lower=0> M; // total number of areas
int<lower=0> nu_stable_psi; // number of sampled areas with stable psi
vector[M] logNi; // log population size in each area
int<lower=0> q_c; // number of parameters in BETA MODEL
matrix[M, q_c] x_c; // covariates for BETA MODEL
vector<lower=0,upper=1>[m] theta_o; // observed theta
vector<lower=0, upper=0.25>[nu_stable_psi] psi_o; // observed psi
}
transformed data{
vector[nu_stable_psi] Lpsi_o = L(psi_o, nu_stable_psi); // unconstrained observed psi's
}
parameters {
vector[q_c] B_c; // fixed parameters
vector[M] v; // random effect
real<lower=0> sigma_v;
real z_0; // fixed parameters for PSI MODEL
real<upper=0> z_1; // fixed parameters for PSI MODEL
real<lower=0> sigma_psi; // error for PSI MODEL
vector[M-nu_stable_psi] Lpsi_obar; // unconstrained variances for the areas with unstable variance estimates
}
transformed parameters{
// DECLARACTIONS
// length M
vector[M] eta_c; // linear pred for BETA MODEL
vector<lower=0, upper=1>[M] mu; // mean for BETA MODEL
vector[M] mu_psi; // mean of PSI MODEL
vector<lower=0,upper=1>[M] theta; // full vector of theta
vector<lower=0, upper=0.25>[M] psi; // full vector of psi
vector<lower=0>[M] phi; // full vector of phi
vector<lower=0>[M] alpha; // Beta shape parameter 1
// length: various
vector<lower=0,upper=1>[M-m] theta_obar; // missing theta's
// DEFINITIONS
// length M
// linear predictor for BETA MODEL
eta_c = x_c*B_c + v*sigma_v;
mu = inv_logit(eta_c);
// length: various
// impute missing theta's
for(j in 1:M-m){
theta_obar[j] = mu[j+m];
}
// PSI MODEL
mu_psi = z_0 + z_1 * logNi;
// create FULL VECTORS
theta = append_row(theta_o, theta_obar);
psi = invL(append_row(Lpsi_o, Lpsi_obar), M);
phi = ((mu .* (1 - mu)) ./ (psi)) - 1;
alpha = mu .* phi;
}
model{
// PRIORS
// for BETA MODEL
for(j in 1:q_c){
B_c[j] ~ normal(0,3);
}
v ~ std_normal();
sigma_v ~ cauchy(0,2);
// for PHI MODEL
z_0 ~ normal(0,3);
z_1 ~ normal(0,3);
sigma_psi ~ cauchy(0,2);
// PHI MODEL
for(k in 1:nu_stable_psi){
target += normal_lpdf(Lpsi_o[k] | mu_psi[k], sigma_psi);
}
// MODEL
// draw from prior for non-sampled psi's
for(k in nu_stable_psi+1:M){
target += normal_lpdf(Lpsi_obar[k-nu_stable_psi] | mu_psi[k], sigma_psi);
}
// Likelihood for complete theta's and phi's
target += beta_lpdf(theta | alpha, phi - alpha);
}
generated quantities{
real log_lik_beta[nu_stable_psi];
vector[M] theta_rep;
// log likelihood values for BETA MODEL
for(f in 1:nu_stable_psi){
log_lik_beta[f] = beta_lpdf(theta[f] | alpha[f], phi[f] - alpha[f]);
}
// draw from predictive distributions
for(j in 1:M){
theta_rep[j] = beta_rng(alpha[j], phi[j] - alpha[j]);
}
}
I have tried declaring the vectors \alpha and \phi in the model block, but this didn’t help. Initially I was using the beta_proportion_lpdf
function, but opted for beta_lpdf
as I believe this setup makes it clearer what is really happening for the Beta model.
I have a sneeky suspicion that inference from this model should be valid and that Stan is only complaining because of the perfectly correlated parameters that are calculated in the transformed parameters block. I would be interested in others thoughts and particular how I might reparametrize the model to reduce the divergence.
BETA2_stan.stan (3.2 KB)
Thank you in advance for any assistance.
EDIT: Here is an example of the correlation between mu and phi. This is to be expected (seeing as we told Stan to do this).