Initialization fails after reparameterization

Background to give you an idea of what I’m trying to do and why:
I’m attempting to reparameterize a working Stan model to improve mixing. This was prompted by warnings that the Bayesian Fraction of Missing Information was low, as well as by low effective sample size for some parameters. Following the suggestion to plot pairs, it is clear that one parameter, denoted taoSm, is strongly correlated with energy__.

Here is a traceplot of said parameter:

I have definitely encountered worse mixing, but it could be improved; running 100,000 iterations took ~15 h and gave an effective sample size of ~1200 across four chains. Again, it could be worse, but I have around 400 datasets to analyze, and, while not all analyses are as ill-behaved, reducing computation time would be valuable.
The parameter taoSm is the standard deviation of the random effects, and, for weak data, there is substantial uncertainty in the individual effects, which I am guessing is causing the problem. The sampler has to explore very narrow distributions as well as very wide ones, and, without understanding all the details of HMC, I think this causes problems similar to those in a Metropolis-Hastings Gaussian random walk: it’s impossible to find a step size that gives good mixing in both the narrow and the wide parts of the posterior. Here is a traceplot of one of the random effects, showing that the chains go through phases of mixing close to zero as well as phases of mixing from a wider distribution:

One obvious option would be to simplify the model and remove these random effects, but, for predictive purposes, I prefer to keep them in the model, even though they admittedly will be sensitive to the prior put on taoSm.

So, I’m thinking a reparameterization of the model might be worth exploring. Rather than the original implementation, where random effect ik is modelled as logKm[ik] ~ normal(0, taoSm), I’m attempting a non-centered parameterization in which logKm_raw[ik] ~ normal(0, 1) and logKm becomes a transformed parameter defined as logKm[ik] = logKm_raw[ik] * taoSm. That is, the random effects are assumed to be standard normally distributed, and taoSm becomes a scale parameter. Maybe I’m thinking about this wrong, but I believe it should improve mixing.
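As a quick sanity check on the algebra (plain Python, not part of the Stan model; the value of taoSm here is arbitrary), scaling standard-normal draws by a fixed taoSm should be distributionally equivalent to drawing directly from normal(0, taoSm):

```python
import numpy as np

rng = np.random.default_rng(0)
taoSm = 0.4  # arbitrary illustrative value of the scale parameter

# Centered: draw the random effects directly from normal(0, taoSm)
centered = rng.normal(0.0, taoSm, size=1_000_000)

# Non-centered: draw standard normals, then scale by taoSm
logKm_raw = rng.normal(0.0, 1.0, size=1_000_000)
non_centered = logKm_raw * taoSm

# Both samples should have (approximately) the same standard deviation
print(centered.std(), non_centered.std())  # both close to 0.4
```

The point of doing the scaling inside the model, rather than sampling logKm directly, is that the sampler then works on logKm_raw, whose scale does not depend on taoSm.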

Error issues

However, I keep getting error messages about rejected initial values. Whether I supply a list of initial values or let Stan generate defaults, I get these messages:

Chain 1: Rejecting initial value:

Chain 1: Error evaluating the log probability at the initial value.

Chain 1: Exception: validate transformed params: m[i_0__] is nan, but must be greater than or equal to 0 (in ‘model1df88f33f0c_Hunt_stan_code_Loptional_alphatrend_std_trans2’ at line 50)

Chain 1:

Chain 1: Initialization between (-2, 2) failed after 100 attempts.

Chain 1: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.

[1] “Error in sampler$call_sampler(args_list[[i]]) : Initialization failed.”

[1] “error occurred during calling the sampler; sampling not done”

[1] “done with stan”

[1] “ran stan”

From what I can tell from the first error message, the problem occurs in the calculation of another transformed parameter, denoted m[ik], which is computed in part from the now-transformed parameter logKm[ik]. For some reason I can’t figure out, m evaluates to nan, which suggests that logKm[ik] (the only parameter that has changed in my reparameterized version of the model) has been evaluated as nan. This puzzles me, because logKm[ik] is merely the product of the real variable logKm_raw[ik] and the real<lower=0> variable taoSm.

Any ideas about what’s going on? I’m assuming I’m making a trivial error somewhere, but I’ve been staring at it for two days without resolving it. Any pointers would be appreciated. In the code below, I have annotated which parts were added or changed in the reparameterized version and hence could cause the issue.

data {
  int<lower=1> NrOfLans;
  int<lower=1> NrOfKretsar;
  int<lower=1> Reports;
  int<lower=0> K[Reports];
  real<lower=0> A[Reports];
  real logrelA[Reports];
  int<lower=1> KretsNrSeqID[Reports];
  int<lower=1> LansSeqID[NrOfLans];
  int<lower=1> LanSeqID[Reports];
  int<lower=1> KretsarSeqID[NrOfKretsar];
  int<lower=1> KretsListLanSeqID[NrOfKretsar];
  int<lower=0, upper=1> includealpha;//Boolean declaration
  int<lower=0, upper=1> includeLeffectonalpha;//
  int<lower=0, upper=1> includealpha1;//
  int<lower=0, upper=1> includephi;//
  int<lower=0, upper=1> includeLeffectonphi;//
  
  
}


parameters {
  
  real logLm[NrOfLans];
  real logKm_raw[NrOfKretsar];//changed for reparametrisation
  real logLalpha[includealpha && includeLeffectonalpha ? NrOfLans : 0];
  
  real logLphi[includephi && includeLeffectonphi ? NrOfLans : 0];
  real logSm;
  real logSalpha[includealpha ? 1 : 0];
  
  real logSphi[includephi ? 1 : 0];
  
  real<lower=0> Tm;
  real<lower=0> Talpha[includealpha && includeLeffectonalpha ? 1 : 0];
  
  real<lower=0> Tphi[includephi && includeLeffectonphi ? 1 : 0];
  
  real<lower=0> taoSm;
  
  
  real alpha1[includealpha1 ? 1 : 0];
}

transformed parameters {
  
  
  real<lower=0> alpha[includealpha ? NrOfKretsar : 0];
  real phi[includephi ? NrOfKretsar : 0];
  real<lower=0> m[NrOfKretsar];
  real logKm[NrOfKretsar];  //changed for reparametrisation

  for (ik in 1:NrOfKretsar){
    
    m[ik] = exp( logSm + logLm[KretsListLanSeqID[ik]] + logKm[ik] );
    
    
    if (includealpha ){
      if (includealpha1) {
        
        if (includeLeffectonalpha ) {
          
          alpha[ik] = exp(logSalpha[1] + logLalpha[KretsListLanSeqID[ik]] + alpha1[1]*(logSm + logLm[KretsListLanSeqID[ik]] + logKm[ik]));
        } else {
          
          alpha[ik] = exp(logSalpha[1] + alpha1[1]*(logSm + logLm[KretsListLanSeqID[ik]] + logKm[ik]));
          
        }
        
        
      }else{
        
        if (includeLeffectonalpha ) {
          
          alpha[ik] = exp(logSalpha[1] + logLalpha[KretsListLanSeqID[ik]]);
        } else {
          
          alpha[ik] = exp(logSalpha[1]);
          
        }
        
        
        
      }
    }
    if (includephi){
      
      if (includeLeffectonphi){
        
        phi[ik] = logSphi[1] + logLphi[KretsListLanSeqID[ik]];
        
      } else {
        
        phi[ik] = logSphi[1];
        
      }
      
      
      
    } 
    
  }
  for (ik in 1:NrOfKretsar){//added for reparametrisation
    logKm[ik]=logKm_raw[ik] * taoSm;
  }
  
  
}
model{
  
  for (r in 1:Reports){
    
    // Number of killed animals in report r is negative binomial distributed,
    // but defined from the mean (m[KretsNrSeqID[r]] * A[r]) and shape (c[Lan[r]]) of the gamma distribution of the equivalent gamma-poisson mixture
    if (includealpha ){
      if (includephi){
        
        K[r] ~ neg_binomial_2(m[KretsNrSeqID[r]] * A[r] * exp(logrelA[r] * phi[KretsNrSeqID[r]]) ,alpha[KretsNrSeqID[r]]); 
        
      }else{
        K[r] ~ neg_binomial_2(m[KretsNrSeqID[r]] * A[r] ,alpha[KretsNrSeqID[r]]);
      }
      
    } else {
      if (includephi){
        K[r] ~ poisson(m[KretsNrSeqID[r]] * A[r] * exp(logrelA[r] * phi[KretsNrSeqID[r]])); 
      }else{
        K[r] ~ poisson(m[KretsNrSeqID[r]] * A[r]); 
        
      }
      
    }
    
    
  }
  
  for(ik in 1:NrOfKretsar){
    
    
    
    logKm_raw[ik] ~ normal( 0 , 1 ); //changed for reparametrisation
    
  }
  for (il in 1:NrOfLans){
    
    
    logLm[il] ~ normal( 0 , Tm );// Tm is the std of between county variability for logm
    
    if (includealpha && includeLeffectonalpha){
      
      logLalpha[il] ~ normal( 0 , Talpha[1] );// Talpha is the std of between county variability for logalpha
      
    }
    if (includephi && includeLeffectonphi){
      
      
      
      logLphi[il] ~ normal( 0 , Tphi[1]);// Tphi is the std of between county variability for logphi
    }
    
    
  }
  
  logSm ~ normal(0,100 );
  if (includealpha){
    logSalpha[1] ~ normal(0,2.3496 );//based on 95% coef var between 0.1 and 10
    if (includeLeffectonalpha) {
      
      Talpha[1] ~ cauchy(0,1);
    }
    
    
    
  }
  
  if (includealpha1){
    
    alpha1[1] ~ normal(0,100);
    
  }
  
  if (includephi){
    logSphi[1] ~ normal(0,0.2984503);// based on 95% between -0.585 and 0.585, which corresponds to a maximum 300% increase in hunting rate with twice the area
    if (includeLeffectonphi){
      
      Tphi[1] ~ gamma(1.9178,0.2171); // based on 95% between 1 and 25
      
      
    }
    
    
    
  }
  
  Tm ~ lognormal(-0.7274912,0.4341882);
  
  taoSm ~ lognormal(-1.630514,0.7066259);
  
}

generated quantities {
  vector[Reports] log_lik;
  for (r in 1:Reports){
    
    
    
    
    if (includealpha){
      if (includephi){
        log_lik[r] = neg_binomial_2_lpmf(K[r] | m[KretsNrSeqID[r]] * A[r] * exp(logrelA[r] * phi[KretsNrSeqID[r]]), alpha[KretsNrSeqID[r]]);
      }else{
        log_lik[r] = neg_binomial_2_lpmf(K[r] | m[KretsNrSeqID[r]] * A[r], alpha[KretsNrSeqID[r]]);
        
      }
    } else {
      if (includephi){
        
        log_lik[r] = poisson_lpmf(K[r] | m[KretsNrSeqID[r]] * A[r] * exp(logrelA[r] * phi[KretsNrSeqID[r]]));
      }else{
        
        log_lik[r] = poisson_lpmf(K[r] | m[KretsNrSeqID[r]] * A[r]);
      }
      
    }
    
    
  }
  
  
}


Unlike in BUGS or JAGS, the order in which statements are written matters. You need to define logKm before using it. Swap the two loops in transformed parameters.
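Concretely, the assignment loop has to come before any statement that reads logKm; a sketch of the reordered transformed parameters block (other computations omitted):

```stan
transformed parameters {
  real logKm[NrOfKretsar];
  real<lower=0> m[NrOfKretsar];

  // assign logKm first...
  for (ik in 1:NrOfKretsar)
    logKm[ik] = logKm_raw[ik] * taoSm;

  // ...then it is safe to read it
  for (ik in 1:NrOfKretsar)
    m[ik] = exp(logSm + logLm[KretsListLanSeqID[ik]] + logKm[ik]);
}
```

In your current version, m[ik] is computed from logKm[ik] while logKm[ik] is still uninitialized (nan), which is exactly what the error message reports.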

By the way, these priors

  logSm ~ normal(0, 100);
  Talpha[1] ~ cauchy(0, 1);
  alpha1[1] ~ normal(0, 100);

look awfully wide for something on the log scale. Even a simple normal(0, 10) is already non-informative there.
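For a sense of scale: a draw two standard deviations out under normal(0, 100) corresponds, after exponentiation, to a multiplicative factor of exp(200):

```python
import math

# a 2-sd draw on the log scale under normal(0, 100)...
two_sd_draw = 2 * 100
# ...exponentiates to an astronomically large multiplier
print(math.exp(two_sd_draw))  # roughly 7e86
```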


Thanks nhuurre! A trivial error indeed. I thought I had tried every possible order of defining parameters, but apparently not that one. Now that it works, the reparameterization seems very successful in terms of efficiency: the number of iterations required to get an effective sample size of >1000 for all parameters was reduced by roughly an order of magnitude.

And thanks for the input on the priors. So far I haven’t paid much attention to them, because the posteriors are insensitive to these choices, but you’re right: they need further consideration.