Problems estimating two part model with random effects correlated across parts

I’m trying to estimate a two part model similar to that in Zhang et al. JASA '06. We want to estimate random effects for geographic location in each equation in the model, drawing the REs from a bivariate normal distribution that allows them to be correlated across equations. Zhang et al. use a qualitative approximation of the reference prior from Yang and Berger’s '04 Annals paper. I build off code that was working fine (with help from the Stan team, of course!), specifying the reference prior for the var-cov matrix for the REs in the transformed parameters block, also using the Cholesky decomposition that is recommended. The var-cov matrix involves taking the absolute value and the min/max of two parameters, which I haven’t done in Stan and I’m not sure the code that attempts this is correct. I’ve tried coding this different ways but I keep getting complaints about random variables that are constrained to be positive having values that are below zero:

Chain 1 Rejecting initial value:
Chain 1 Error evaluating the log probability at the initial value.
Chain 1 Exception: lognormal_lpdf: Random variable[177] is -0.0618754, but must be nonnegative! (in ‘/var/folders/47/ldkn_90n4pb8y9_5fx21bfl00000gn/T/RtmpYiKyRb/model-aebd76ae0c41.stan’, line 99, column 2 to column 81)
Chain 1 Exception: lognormal_lpdf: Random variable[177] is -0.0618754, but must be nonnegative! (in ‘/var/folders/47/ldkn_90n4pb8y9_5fx21bfl00000gn/T/RtmpYiKyRb/model-aebd76ae0c41.stan’, line 99, column 2 to column 81)

In the latest version of the code, the complaint seems to be about sigma, the standard deviation of the lognormal part of the likelihood for non-zero outcomes. This seems strange to me, since sigma doesn’t have anything to do with the modifications to the (previously working) code to do the reference prior. In other attempts to code the var-cov matrix for the prior, I was seeing the same “out of bounds” complaints for parameters relevant for that prior. I’ve tried using both rstan and cmdstanr and I’m seeing the same problems. The code is below. Since data is proprietary and complicated I haven’t provided it. Thanks for any help!

// this file includes circuit fixed effects, office random effects,
// and follows the approach of Zhang et al. JASA '06

data {
    int<lower=0> K0;      // number of dummy predictors for l = 1 
    int<lower=0> KC0;      // number of continuous predictors for l = 1
    int<lower=0> KN0;      // number of dummy predictors for l = 2
    int<lower=0> KCN0;      // number of continuous predictors for l = 2
    int<lower=1> N[2];   // number of observations where l = 1 and l = 2 respectively
    int<lower=0, upper=1> l[sum(N)]; // 0-1 outcome
    vector[N[2]] y;                  // continous outcome | l = 2
    matrix[N[1],K0] X0;   //  predictor matrix, dummies | l = 1
    matrix[N[2],KN0] X1;   //  predictor matrix, dummies | l = 2
    matrix[N[1],KC0] X0c;   //  predictor matrix, continuous | l = 1
    matrix[N[2],KCN0] X1c;   //  predictor matrix, continuous | l = 2
    int<lower=0> O;      // number of offices
    int<lower=1,upper=O> oid0[N[1]]; // wheel id | l = 1
    int<lower=1,upper=O> oid1[N[2]]; // wheel id | l = 2

parameters {
    vector[2] a_raw[O]; // raw office random effects
    real alpha0; //intercept
    real alpha1; //intercept
    vector[K0] beta;  // coeffs for binary, dummies 
    vector[KN0] gam; // coeffs for lognormal, dummies
    vector[KC0] betac;  // coeffs for binary, continuous vars
    vector[KCN0] gamc; // coeffs for lognormal,continuous vars
    real<lower=.000000001> sigma; // standard deviation 
    vector<lower=0>[2] d_a;
    real<lower=-1,upper=1> g_a;	// gamma from JASA paper; don't need to specify prior since default is uniform
transformed parameters {

   vector[O] a1;         // office random effect for binary
   vector[O] a2;         // office random effect for lognormal
   real<lower=0> delta_a;
   matrix[2,2] V_a;     // var-cov matrix for office REs
   matrix[2,2] L_a;     // Cholesky factor
   matrix[2,2] L_b;     // Cholesky factor

     delta_a = fabs(d_a[1] - d_a[2]);

     V_a[1,1] = max(d_a) - (delta_a * g_a * g_a);
     V_a[2,2] = min(d_a) + (delta_a * g_a * g_a);

     V_a[1,2] = (-1) * delta_a * g_a * sqrt(1 - g_a * g_a);

     V_a[2,1] = V_a[1,2];

// Cholesky decomp for office random effects

      L_a[1,1] = sqrt(V_a[1,1]);
      L_a[1,2] = 0;
      L_a[2,1] = V_a[1,2]/sqrt(V_a[1,1]);
      L_a[2,2] = sqrt(1 - square(V_a[1,2]/(V_a[1,1]*V_a[2,2]))) * V_a[2,2];
      for (o in 1:O) {
        vector[2] a;
        a = L_a * a_raw[o]; 
        a1[o] = a[1];
        a2[o] = a[2];

model {

// priors

    for (o in 1:O) a_raw[o] ~ normal(0,1);

    alpha0 ~ normal(0,1e-6);
    alpha1 ~ normal(0,1e-6);

    d_a[1] ~ gamma(.5,100);
    d_a[2] ~ gamma(.5,100);

    sigma ~ inv_gamma(1,1000);

    to_vector(beta) ~ normal(0,1e-6); // probit, dummies
    to_vector(betac) ~ normal(0,1e-6); // probit, continuous

    to_vector(gam) ~ normal(0,1e-6); // ln eq, dummies
    to_vector(gamc) ~ normal(0,1e-6); // ln eq, continuous

// likelihood

  target += normal_lccdf(alpha0 + X0 * beta + X0c * betac + a1[oid0] | 0, 1); // probit for l = 1
  target += normal_lcdf(alpha0 + X1 * beta + X1c * betac + a1[oid1] | 0, 1); // probit for l = 2
  target += lognormal_lpdf(y | alpha1 + X1 * gam + X1c * gamc + a2[oid1], sigma); // y | l = 2


You can see that the error is arising on line 99 evaluating lognormal_lpdf. There’s only one lognormal and it’s saying that 177 of the y vector is negative:

  vector[N[2]] y;                  // continous outcome | l = 2
  target += lognormal_lpdf(y | alpha1 + X1 * gam + X1c * gamc + a2[oid1], sigma); // y | l = 2

In general, when a variable is required to be positive, for instance to be in support for lognormal, then you should declare with a lower=0 constraint. Then you would’ve caught that error on data load.

I would also recommend getting rid of the lower bound here:

    real<lower=.000000001> sigma; // standard deviation 

That will simply suppress underflow errors due to the unconstrained value (log(sigma)) getting too small. The reason you want to see those is that they indicate the sampler wants to set sigma very close to zero. A better approach to keep things away from zero is to include a prior. For sigma, you’re taking this prior

    sigma ~ inv_gamma(1,1000);

which does not even have a well defined mean or variance. This can be particularly problematic if you’re building a hierarchical model. Are you sure you’re using the right parameterization of inverse gamma (shape/scale) and this is what you intend for the prior?

1 Like

Thanks very much for taking a look at the code and catching my silly error on invalid y values for the lognormal.

I have taken out the lower bound as you’ve suggested.

Thanks for pointing out the potential problems with gamma prior. This is what Zhang et. al say they used and we are trying to replicate their analysis. Parameterizing the model in terms of the precision and taking a gamma(1,.001) may avoid the problem. That appears to be what Zhang et. al actually implemented in WinBUGS.