I’m trying to estimate a two-part model similar to the one in Zhang et al. (JASA ’06). We want to estimate random effects for geographic location in each equation of the model, drawing the REs from a bivariate normal distribution that allows them to be correlated across equations. Zhang et al. use a qualitative approximation of the reference prior from Yang and Berger’s ’04 Annals paper. I’m building on code that was working fine (with help from the Stan team, of course!), specifying the reference prior for the var-cov matrix of the REs in the transformed parameters block, and using the recommended Cholesky decomposition. The var-cov matrix involves taking the absolute value and the min/max of two parameters, which I haven’t done in Stan before, and I’m not sure the code that attempts this is correct. I’ve tried coding this several different ways, but I keep getting complaints about random variables that are constrained to be positive taking values below zero:
Chain 1 Rejecting initial value:
Chain 1 Error evaluating the log probability at the initial value.
Chain 1 Exception: lognormal_lpdf: Random variable[177] is -0.0618754, but must be nonnegative! (in ‘/var/folders/47/ldkn_90n4pb8y9_5fx21bfl00000gn/T/RtmpYiKyRb/model-aebd76ae0c41.stan’, line 99, column 2 to column 81)
Chain 1 Exception: lognormal_lpdf: Random variable[177] is -0.0618754, but must be nonnegative! (in ‘/var/folders/47/ldkn_90n4pb8y9_5fx21bfl00000gn/T/RtmpYiKyRb/model-aebd76ae0c41.stan’, line 99, column 2 to column 81)
In the latest version of the code, the complaint is about sigma, the standard deviation of the lognormal part of the likelihood for non-zero outcomes. This seems strange to me, since sigma has nothing to do with my modifications to the (previously working) code for the reference prior. In other attempts at coding the var-cov matrix for the prior, I saw the same “out of bounds” complaints for parameters relevant to that prior. I’ve tried both rstan and cmdstanr and see the same problem. The code is below; since the data are proprietary and complicated, I haven’t provided them. Thanks for any help!
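As a sanity check on the math itself (outside Stan, with made-up parameter values), I verified in numpy that the (d_a[1], d_a[2], g_a) construction I use in the transformed parameters block always yields a valid covariance matrix — the d_a values turn out to be its eigenvalues, so the determinant is d_a[1]*d_a[2] > 0:

```python
import numpy as np

def ref_prior_cov(d1, d2, g):
    # Mirror of the transformed-parameters construction in the
    # Stan code below (my own translation; values here are made up).
    delta = abs(d1 - d2)
    v11 = max(d1, d2) - delta * g**2
    v22 = min(d1, d2) + delta * g**2
    v12 = -delta * g * np.sqrt(1.0 - g**2)
    return np.array([[v11, v12], [v12, v22]])

V = ref_prior_cov(0.4, 1.3, -0.6)
eig = np.sort(np.linalg.eigvalsh(V))
# trace(V) = d1 + d2 and det(V) = d1 * d2, so d1 and d2 are the
# eigenvalues of V, and V is positive definite whenever
# d1, d2 > 0 and -1 < g < 1.
print(eig)               # ~ [0.4, 1.3]
print(np.linalg.det(V))  # ~ 0.52
```

So the covariance construction itself shouldn’t be able to produce an invalid matrix, which makes the rejection messages all the more puzzling.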
// this file includes circuit fixed effects, office random effects,
// and follows the approach of Zhang et al. JASA '06
data {
int<lower=0> K0; // number of dummy predictors for l = 1
int<lower=0> KC0; // number of continuous predictors for l = 1
int<lower=0> KN0; // number of dummy predictors for l = 2
int<lower=0> KCN0; // number of continuous predictors for l = 2
int<lower=1> N[2]; // number of observations where l = 1 and l = 2 respectively
int<lower=0, upper=1> l[sum(N)]; // 0-1 outcome
vector[N[2]] y; // continuous outcome | l = 2
matrix[N[1],K0] X0; // predictor matrix, dummies | l = 1
matrix[N[2],KN0] X1; // predictor matrix, dummies | l = 2
matrix[N[1],KC0] X0c; // predictor matrix, continuous | l = 1
matrix[N[2],KCN0] X1c; // predictor matrix, continuous | l = 2
int<lower=0> O; // number of offices
int<lower=1,upper=O> oid0[N[1]]; // office id | l = 1
int<lower=1,upper=O> oid1[N[2]]; // office id | l = 2
}
parameters {
vector[2] a_raw[O]; // raw office random effects
real alpha0; //intercept
real alpha1; //intercept
vector[K0] beta; // coeffs for binary, dummies
vector[KN0] gam; // coeffs for lognormal, dummies
vector[KC0] betac; // coeffs for binary, continuous vars
vector[KCN0] gamc; // coeffs for lognormal,continuous vars
real<lower=1e-9> sigma; // standard deviation
vector<lower=0>[2] d_a;
real<lower=-1,upper=1> g_a; // gamma from JASA paper; don't need to specify prior since default is uniform
}
transformed parameters {
vector[O] a1; // office random effect for binary
vector[O] a2; // office random effect for lognormal
real<lower=0> delta_a;
matrix[2,2] V_a; // var-cov matrix for office REs
matrix[2,2] L_a; // Cholesky factor
delta_a = fabs(d_a[1] - d_a[2]);
V_a[1,1] = max(d_a) - (delta_a * g_a * g_a);
V_a[2,2] = min(d_a) + (delta_a * g_a * g_a);
V_a[1,2] = (-1) * delta_a * g_a * sqrt(1 - g_a * g_a);
V_a[2,1] = V_a[1,2];
// Cholesky decomp for office random effects
L_a[1,1] = sqrt(V_a[1,1]);
L_a[1,2] = 0;
L_a[2,1] = V_a[1,2] / L_a[1,1];
L_a[2,2] = sqrt(V_a[2,2] - square(L_a[2,1])); // standard 2x2 Cholesky: subtract the squared off-diagonal factor
for (o in 1:O) {
vector[2] a;
a = L_a * a_raw[o];
a1[o] = a[1];
a2[o] = a[2];
}
}
model {
// priors
for (o in 1:O) a_raw[o] ~ normal(0,1);
alpha0 ~ normal(0,1e-6);
alpha1 ~ normal(0,1e-6);
d_a[1] ~ gamma(.5,100);
d_a[2] ~ gamma(.5,100);
sigma ~ inv_gamma(1,1000);
beta ~ normal(0,1e-6); // probit, dummies (already a vector, no to_vector() needed)
betac ~ normal(0,1e-6); // probit, continuous
gam ~ normal(0,1e-6); // ln eq, dummies
gamc ~ normal(0,1e-6); // ln eq, continuous
// likelihood
target += normal_lccdf(alpha0 + X0 * beta + X0c * betac + a1[oid0] | 0, 1); // probit for l = 1
target += normal_lcdf(alpha0 + X1 * beta + X1c * betac + a1[oid1] | 0, 1); // probit for l = 2
target += lognormal_lpdf(y | alpha1 + X1 * gam + X1c * gamc + a2[oid1], sigma); // y | l = 2
}
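For what it’s worth, here’s a quick numpy check (again with made-up parameter values) that the textbook 2x2 Cholesky entries — L[1,1] = sqrt(V[1,1]), L[2,1] = V[1,2]/L[1,1], L[2,2] = sqrt(V[2,2] - L[2,1]^2) — reproduce the var-cov matrix built from the (d_a, g_a) construction in the transformed parameters block:

```python
import numpy as np

# Same var-cov construction as in the transformed parameters block,
# with made-up values standing in for d_a and g_a:
d1, d2, g = 0.4, 1.3, -0.6
delta = abs(d1 - d2)
v11 = max(d1, d2) - delta * g**2
v22 = min(d1, d2) + delta * g**2
v12 = -delta * g * np.sqrt(1.0 - g**2)
V = np.array([[v11, v12], [v12, v22]])

# Textbook 2x2 Cholesky factor, entry by entry:
l11 = np.sqrt(v11)
l21 = v12 / l11
l22 = np.sqrt(v22 - l21**2)
L = np.array([[l11, 0.0], [l21, l22]])

print(np.allclose(L @ L.T, V))                # -> True
print(np.allclose(L, np.linalg.cholesky(V)))  # -> True
```

If L_a * L_a' doesn’t reproduce V_a like this, the sampled random effects won’t have the intended covariance even when the prior itself is fine — which is why I’m suspicious of my hand-rolled decomposition.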