Problems estimating two part model with random effects correlated across parts

I’m trying to estimate a two-part model similar to the one in Zhang et al. (JASA '06). We want to estimate random effects for geographic location in each equation of the model, drawing the REs from a bivariate normal distribution that allows them to be correlated across equations. Zhang et al. use a qualitative approximation of the reference prior from Yang and Berger’s '04 Annals paper. I’m building off code that was working fine (with help from the Stan team, of course!), specifying the reference prior for the var-cov matrix of the REs in the transformed parameters block and using the recommended Cholesky decomposition. Constructing the var-cov matrix involves taking the absolute value and the min/max of two parameters, which I haven’t done in Stan before, and I’m not sure the code that attempts this is correct. I’ve tried coding this in different ways, but I keep getting complaints about random variables that are constrained to be positive taking values below zero:

Chain 1 Rejecting initial value:
Chain 1 Error evaluating the log probability at the initial value.
Chain 1 Exception: lognormal_lpdf: Random variable[177] is -0.0618754, but must be nonnegative! (in ‘/var/folders/47/ldkn_90n4pb8y9_5fx21bfl00000gn/T/RtmpYiKyRb/model-aebd76ae0c41.stan’, line 99, column 2 to column 81)
Chain 1 Exception: lognormal_lpdf: Random variable[177] is -0.0618754, but must be nonnegative! (in ‘/var/folders/47/ldkn_90n4pb8y9_5fx21bfl00000gn/T/RtmpYiKyRb/model-aebd76ae0c41.stan’, line 99, column 2 to column 81)

In the latest version of the code, the complaint seems to be about sigma, the standard deviation of the lognormal part of the likelihood for non-zero outcomes. This seems strange to me, since sigma has nothing to do with the modifications made to the (previously working) code for the reference prior. In other attempts at coding the var-cov matrix for the prior, I saw the same “out of bounds” complaints for parameters relevant to that prior. I’ve tried both rstan and cmdstanr and I’m seeing the same problems. The code is below; since the data are proprietary and complicated, I haven’t provided them. Thanks for any help!

// this file includes circuit fixed effects, office random effects,
// and follows the approach of Zhang et al. JASA '06

data {
    int<lower=0> K0;      // number of dummy predictors for l = 1 
    int<lower=0> KC0;      // number of continuous predictors for l = 1
    int<lower=0> KN0;      // number of dummy predictors for l = 2
    int<lower=0> KCN0;      // number of continuous predictors for l = 2
    int<lower=1> N[2];   // number of observations where l = 1 and l = 2 respectively
    int<lower=0, upper=1> l[sum(N)]; // 0-1 outcome
    vector[N[2]] y;                  // continuous outcome | l = 2
    
    matrix[N[1],K0] X0;   //  predictor matrix, dummies | l = 1
    matrix[N[2],KN0] X1;   //  predictor matrix, dummies | l = 2
    
    matrix[N[1],KC0] X0c;   //  predictor matrix, continuous | l = 1
    matrix[N[2],KCN0] X1c;   //  predictor matrix, continuous | l = 2
    
    int<lower=0> O;      // number of offices
    int<lower=1,upper=O> oid0[N[1]]; // wheel id | l = 1
    int<lower=1,upper=O> oid1[N[2]]; // wheel id | l = 2
}

parameters {
    vector[2] a_raw[O]; // raw office random effects
    real alpha0; //intercept
    real alpha1; //intercept
    vector[K0] beta;  // coeffs for binary, dummies 
    vector[KN0] gam; // coeffs for lognormal, dummies
    vector[KC0] betac;  // coeffs for binary, continuous vars
    vector[KCN0] gamc; // coeffs for lognormal,continuous vars
    real<lower=.000000001> sigma; // standard deviation 
    vector<lower=0>[2] d_a;
    real<lower=-1,upper=1> g_a;	// gamma from JASA paper; don't need to specify prior since default is uniform
}
    
transformed parameters {

   vector[O] a1;         // office random effect for binary
   vector[O] a2;         // office random effect for lognormal
   real<lower=0> delta_a;
   matrix[2,2] V_a;     // var-cov matrix for office REs
   matrix[2,2] L_a;     // Cholesky factor

     delta_a = fabs(d_a[1] - d_a[2]);

     V_a[1,1] = max(d_a) - (delta_a * g_a * g_a);
     
     V_a[2,2] = min(d_a) + (delta_a * g_a * g_a);

     V_a[1,2] = (-1) * delta_a * g_a * sqrt(1 - g_a * g_a);

     V_a[2,1] = V_a[1,2];

// Cholesky decomp for office random effects

      L_a[1,1] = sqrt(V_a[1,1]);
      L_a[1,2] = 0;
      L_a[2,1] = V_a[1,2]/sqrt(V_a[1,1]);
      L_a[2,2] = sqrt(V_a[2,2] - square(L_a[2,1]));
      for (o in 1:O) {
        vector[2] a;
        a = L_a * a_raw[o]; 
        a1[o] = a[1];
        a2[o] = a[2];
	}
}

model {

// priors

    for (o in 1:O) a_raw[o] ~ normal(0,1);

    alpha0 ~ normal(0,1e-6);
    alpha1 ~ normal(0,1e-6);

    d_a[1] ~ gamma(.5,100);
    d_a[2] ~ gamma(.5,100);

    sigma ~ inv_gamma(1,1000);

    to_vector(beta) ~ normal(0,1e-6); // probit, dummies
    to_vector(betac) ~ normal(0,1e-6); // probit, continuous

    to_vector(gam) ~ normal(0,1e-6); // ln eq, dummies
    to_vector(gamc) ~ normal(0,1e-6); // ln eq, continuous

// likelihood

  target += normal_lccdf(alpha0 + X0 * beta + X0c * betac + a1[oid0] | 0, 1); // probit for l = 1
  target += normal_lcdf(alpha0 + X1 * beta + X1c * betac + a1[oid1] | 0, 1); // probit for l = 2
  target += lognormal_lpdf(y | alpha1 + X1 * gam + X1c * gamc + a2[oid1], sigma); // y | l = 2

}

You can see that the error arises on line 99, evaluating lognormal_lpdf. There’s only one lognormal in the model, and the message says element 177 of the y vector is negative:

  vector[N[2]] y;                  // continuous outcome | l = 2
  ...
  target += lognormal_lpdf(y | alpha1 + X1 * gam + X1c * gamc + a2[oid1], sigma); // y | l = 2

In general, when a variable is required to be positive, for instance to be in the support of the lognormal, you should declare it with a lower=0 constraint. Then the error would have been caught when the data were loaded.
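For example, with the declarations from the model above, that would be:

```stan
data {
  // ... other declarations as above ...
  vector<lower=0>[N[2]] y;   // continuous outcome | l = 2;
                             // values below zero are now rejected
                             // when the data are read in
}
```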

I would also recommend getting rid of the lower bound here:

    real<lower=.000000001> sigma; // standard deviation 

That lower bound simply suppresses underflow errors that arise when the unconstrained value (log(sigma)) gets too small. The reason you want to see those errors is that they indicate the sampler wants to set sigma very close to zero. A better way to keep sigma away from zero is to give it a prior. For sigma, you’re currently using this prior

    sigma ~ inv_gamma(1,1000);

which does not even have a well-defined mean or variance (the mean of an inverse gamma requires shape > 1, and the variance requires shape > 2). This can be particularly problematic if you’re building a hierarchical model. Are you sure you’re using the right parameterization of the inverse gamma (Stan’s inv_gamma takes shape and scale), and is this really what you intend for the prior?
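As one alternative (a sketch of a common choice, not the prior from the paper), a weakly informative prior keeps sigma away from zero softly, without a hard bound:

```stan
// assumes sigma is declared as  real<lower=0> sigma;  in parameters
sigma ~ lognormal(0, 1);   // or: sigma ~ normal(0, 1);  which is a
                           // half-normal given the lower=0 constraint
```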


Thanks very much for taking a look at the code and catching my silly error on invalid y values for the lognormal.

I have taken out the lower bound as you’ve suggested.

Thanks for pointing out the potential problems with the gamma prior. This is what Zhang et al. say they used, and we are trying to replicate their analysis. Parameterizing the model in terms of the precision and taking a gamma(1, .001) prior may avoid the problem; that appears to be what Zhang et al. actually implemented in WinBUGS.
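For what it’s worth, here is a sketch of that precision parameterization in Stan (untested against the full model; note that Stan’s gamma uses the shape/rate convention, which matches WinBUGS):

```stan
parameters {
  real<lower=0> tau;                    // precision of the lognormal part
}
transformed parameters {
  real<lower=0> sigma = inv_sqrt(tau);  // sd = 1 / sqrt(precision)
}
model {
  tau ~ gamma(1, 0.001);                // shape/rate, as in WinBUGS gamma(1, .001)
}
```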