Initialization failed with truncated normal distribution

lisa1 · January 29, 2021, 10:23am

Dear all,

I would like to fit a model with a truncated normal distribution to my data. The data come from 50 subjects who make two ratings on each trial, each on a scale from 0 to 100 (inclusive). I would like to truncate the normal distribution for these ratings with a lower bound of 0 and an upper bound of 100. However, when I include the truncation, some chains finish, but others fail with the error: “Initialization between (-2, 2) failed after 100 attempts.”

data {
  int ntr;                  // number of trials
  int ntr_phase;            // maximum trial number per phase
  int nrow;                 // number of rows of whole data set (equal to ntr * nsub)
  int nsub;                 // total number of subjects 
  int subID[nrow];          // subject ID
  real U;                   // = 100, upper bound for ratings
  real L;                   // = 0, lower bound for ratings
  // ratings
  real<lower=L,upper=U> RateSelf[nrow];     // go from 0 to 100
  real<lower=L,upper=U> RateOther[nrow];    // go from 0 to 100
  int active_trial[nrow];   // trial type, 0 or 1
  real control_level[nrow]; // goes from 0.2 to 0.8
  real feedback[nrow];      // goes from 0 to 100
  int phase_trial[nrow]; 
}

parameters {
  real<lower=0> rating_noise[nsub]; 
  real<lower=0,upper=1> learning_rate[nsub];
}

model {
  real belief_self[nrow]; 
  real belief_other[nrow];
  real belief_total[nrow];
  real PE_total[nrow];
  real PE_self[nrow];
  real PE_other[nrow];
  
  // Priors for parameters
  rating_noise ~ normal(20,5); 
  learning_rate ~ normal(0.5,0.1);
  
  // Learning model: how we update beliefs based on outcomes
  for (itr in 1:nrow){
    if (phase_trial[itr]==1){
      belief_self[itr]  = RateSelf[itr];
      belief_other[itr] = RateOther[itr];
    } 
    // Based on feedback, update the belief for the next trial
    if (active_trial[itr]!=1){
      belief_total[itr] = control_level[itr] * belief_self[itr] + (1-control_level[itr])*belief_other[itr];
    } else{
      belief_total[itr] = control_level[itr] * 0 + (1-control_level[itr])*belief_other[itr];
    }
    PE_total[itr]     = feedback[itr] - belief_total[itr];
    
    // split up total PE
    PE_self[itr]      = PE_total[itr] * control_level[itr];
    PE_other[itr]     = PE_total[itr] - PE_self[itr];
    
    // Update beliefs for next trial
    if (phase_trial[itr] < ntr_phase){
      belief_self[itr+1]  = belief_self[itr]  + learning_rate[subID[itr]]*PE_self[itr];
      belief_other[itr+1] = belief_other[itr] + learning_rate[subID[itr]]*PE_other[itr];
    }
  }
  
  // Decision model: mapping beliefs to ratings
  for (itr in 1:nrow){
    RateSelf[itr] ~ normal(belief_self[itr],rating_noise[subID[itr]]) T[L,U];
    RateOther[itr] ~ normal(belief_other[itr],rating_noise[subID[itr]]) T[L,U];
  }
}

I tried the following things:

- Fitting one subject at a time works (e.g. for each of subjects 1-5 separately).
- Fitting subjects 1-5 together does not work.
- Truncating with only an upper bound works, and so does truncating with only a lower bound.

What has not helped:

- Forcing the data to be strictly > 0 and < 100.
- Not fitting the first trial, where rating = belief.
- Forcing the beliefs (i.e. the mean of the truncated distribution) to stay within the bounds in the model.

I called Stan with this command:

fit_data = rstan::stan(file='model_truncated.stan',data=data_real,chains=3,iter=1000, cores=1)

Could anyone point out what went wrong in the model?

Help is much appreciated!
Best, Lisa

martinmodrak · February 5, 2021, 4:30pm

Sorry for not getting to your question earlier - it is relevant and well written.

The most important thing to do is probably to figure out where exactly the initialization fails. To do this, you can run a single chain, which should give you a lot more output.

I cannot run the model myself, as I don’t have the data, but most often these types of problems arise because your parameters allow model configurations that either a) produce invalid values of some function arguments (e.g. an infinite standard deviation) or b) have 0 probability (i.e. -inf log probability). Initialization can then land in such a region, and sampling cannot proceed.

You can also add print statements to your model to show some intermediate quantities to better pinpoint the culprit.

One possible problem is that belief_self[itr] initializes to values far from L and U, which could cause numerical problems when evaluating normal_lcdf / normal_lccdf for the truncation bounds (see section 7.4, “Sampling statements”, of the Stan Reference Manual if you want to know how these two functions get involved). Other than that, I don’t see anything immediately problematic about the model, so I can’t help more without access to the data and/or detailed output from the single chain.
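To make the role of these two functions concrete, here is a minimal Python sketch (not Stan's code; the helper names are hypothetical) of roughly what `y ~ normal(mu, sigma) T[L, U]` contributes to the log density: the normal log density minus the log of the probability mass between the bounds. The CDF is computed naively with `math.erfc`, so the sketch shows the same failure mode: when the mean is many standard deviations outside [L, U], the mass underflows to 0 and the contribution is no longer finite.

```python
import math

def normal_lpdf(y, mu, sigma):
    """Log density of normal(mu, sigma) at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def normal_cdf(y, mu, sigma):
    """Normal CDF via erfc (naive: underflows to 0 far in the left tail)."""
    return 0.5 * math.erfc(-(y - mu) / (sigma * math.sqrt(2.0)))

def truncated_normal_lpdf(y, mu, sigma, lower, upper):
    """Log density of normal(mu, sigma) truncated to [lower, upper]."""
    if not lower <= y <= upper:
        return -math.inf
    mass = normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma)
    # If the mass underflows to 0, log_mass is -inf and the result becomes +inf.
    log_mass = math.log(mass) if mass > 0.0 else -math.inf
    return normal_lpdf(y, mu, sigma) - log_mass

# Well-behaved: mean inside the bounds, moderate noise
print(truncated_normal_lpdf(40.0, 50.0, 20.0, 0.0, 100.0))
# Pathological: mean far outside [0, 100] relative to the noise
print(truncated_normal_lpdf(40.0, 500.0, 1.0, 0.0, 100.0))
```

The first call is finite; the second is not, which is the kind of value that makes Stan reject an initialization.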

As a minor suggestion: if the ratings are limited to integers, a beta-binomial might also be a sensible model that could avoid some of the numerical problems with truncated distributions (but you know your data better, so the final call is yours).
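For integer ratings, a beta-binomial over 0..100 needs no truncation term at all, because its support is exactly {0, ..., 100}. Below is a minimal Python sketch of its log PMF using `math.lgamma`; in Stan, the built-in `beta_binomial_lpmf(n | N, alpha, beta)` plays this role. The mean/precision parameterization (`mu` in (0, 1), `phi` > 0) is an assumption for illustration, not something proposed in the thread, and the precision value is hypothetical.

```python
import math

def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_lpmf(k, n, alpha, beta):
    """log P(K = k) for K ~ BetaBinomial(n, alpha, beta), k in 0..n."""
    if not 0 <= k <= n:
        return -math.inf
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)

def mean_precision_to_shapes(mu, phi):
    """Hypothetical reparameterization: expected rating mu * n, larger phi = less noise."""
    return mu * phi, (1.0 - mu) * phi

# Example: a belief of 63.84 on the 0..100 scale, with an assumed precision of 10
alpha, beta = mean_precision_to_shapes(63.84 / 100.0, 10.0)
print(beta_binomial_lpmf(70, 100, alpha, beta))
```

Because the probabilities sum to 1 over 0..100 by construction, there is no normalizing constant that can underflow the way the truncated normal's can.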

lisa1 · February 8, 2021, 2:37pm

Hi Martin,

Thanks a lot for getting in touch about this and all the helpful suggestions!
I ran the model with one chain with this command:

fit_data = rstan::stan(file='model_truncated.stan',data=data_real,chains=1,iter=1, cores=1)

This then gave me the following message:

SAMPLING FOR MODEL 'RL_model_v1_normal_truncated' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1:   Gradient evaluated at the initial value is not finite.
Chain 1:   Stan can't start sampling from this initial value.
[...]
Chain 1: 
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done

So would this point towards the first possibility you suggested, that the model configurations produce invalid values for some parameters?

I ran the model again and printed the belief_self[itr] and belief_other[itr] for the first trials and indeed, these two variables sometimes initialize far from the truncation boundaries (e.g. around 50 for itr = 45). Sometimes they also assume the values of the boundaries (e.g. belief_self = 100 for itr = 34):

[...]
Chain 1: Rejecting initial value:
Chain 1:   Gradient evaluated at the initial value is not finite.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: itr: 1
belief_self: 63.84
belief_other: 45.17
itr: 12
belief_self: 86.95
belief_other: 67.62
itr: 23
belief_self: 25.07
belief_other: 70.89
itr: 34
belief_self: 100
belief_other: 74.1
itr: 45
belief_self: 50.04
belief_other: 50.22
itr: 56
belief_self: 98.9
belief_other: 50.77
itr: 67
belief_self: 60.07
belief_other: 31.62
itr: 78
belief_self: 50.49
belief_other: 29.07
itr: 89
belief_self: 19.2
belief_other: 60.07
itr: 100
belief_self: 60.98
belief_other: 40.59
itr: 111
belief_self: 45.69
belief_other: 37.84
itr: 122
belief_self: 71.96
belief_other: 58.24
itr: 133
belief_self: 50.78
belief_other: 49.22
itr: 144
belief_self: 25.59
belief_other: 50
itr: 155
belief_self: 58.75
belief_other: 49.74

Chain 1: Rejecting initial value:
Chain 1:   Gradient evaluated at the initial value is not finite.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: 
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done

But then I still don’t really understand why this seems to cause a problem only when I fit the subjects together, and not when I fit them separately.

And thanks also for your suggestion of the beta binomial! I fitted the beta distribution before, but haven’t thought of trying the discrete version yet. I need to think about this a bit more. It could make sense to use this one, but still I would be really interested to understand the problem with the truncated normal distribution.

martinmodrak · February 11, 2021, 2:58pm

Yes, that’s almost certainly the case. It is also obvious that I don’t fully understand the model: it seems like belief_self is constructed to never exceed 100 (or is it?), which I hadn’t realized :-) That by itself shouldn’t immediately be a problem…
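Whether belief_self can leave [0, 100] can be checked directly. Below is a Python transcription of one step of the learning model from the Stan code above (same update equations; the function name and the numbers are hypothetical, chosen from the stated data ranges). With a self rating of 100, an other rating of 0, control_level = 0.5, feedback = 100 and learning_rate = 0.5, the updated belief_self is above 100, and the mirror-image case goes below 0, so the mean of the truncated normal can indeed fall outside [L, U].

```python
def update_beliefs(belief_self, belief_other, control_level, feedback,
                   learning_rate, active_trial):
    """One trial of the learning model, transcribed from the Stan model block."""
    if active_trial != 1:
        belief_total = control_level * belief_self + (1 - control_level) * belief_other
    else:
        # on active trials the self component is multiplied by 0 in the Stan code
        belief_total = (1 - control_level) * belief_other
    pe_total = feedback - belief_total
    pe_self = pe_total * control_level  # part of the prediction error assigned to self
    pe_other = pe_total - pe_self       # remainder assigned to the other
    return (belief_self + learning_rate * pe_self,
            belief_other + learning_rate * pe_other)


# Values within the stated data ranges (chosen for illustration)
print(update_beliefs(100.0, 0.0, 0.5, 100.0, 0.5, 0))  # (112.5, 12.5)
print(update_beliefs(0.0, 100.0, 0.5, 0.0, 0.5, 0))    # (-12.5, 87.5)
```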

Another thing that could cause problems is that rating_noise may contain very small values… Sorry for not being more specific; it is really hard to tell just from the model what is happening.

One thing that should help would be to print the actual contributions to the log-likelihood, e.g. (shown just for RateSelf; not tested):

// inside the decision-model loop, for (itr in 1:nrow)
print("Self LPDF[", itr, "]: ", normal_lpdf(RateSelf[itr] | belief_self[itr], rating_noise[subID[itr]]));
print("Self LCDF low[", itr, "]: ", normal_lcdf(L | belief_self[itr], rating_noise[subID[itr]]));
print("Self LCDF high[", itr, "]: ", normal_lcdf(U | belief_self[itr], rating_noise[subID[itr]]));

Presumably, either one of those is infinite, or the LCDFs at the low and high bounds become almost equal (causing a “division by zero”). Most often, the elements would be infinite because the value you evaluate at is more than roughly 22 standard deviations away from the mean; e.g. something like normal_lcdf(-100 | 0, 1) is likely to return -Inf.
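The underflow can be reproduced outside Stan. This minimal Python sketch (not Stan's implementation; the function names are hypothetical) computes log Φ(z) once naively as log(0.5·erfc(−z/√2)) and once with the standard asymptotic tail expansion log Φ(z) ≈ −z²/2 − log(−z) − ½·log(2π) + log(1 − 1/z² + 3/z⁴), used here for z ≤ −20. The naive version returns -inf at 100 standard deviations below the mean, even though the log CDF itself is an ordinary finite number (about −5005).

```python
import math

def naive_normal_lcdf(y, mu, sigma):
    """log Phi((y - mu) / sigma) via erfc; erfc underflows to 0 far in the left tail."""
    z = (y - mu) / sigma
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return math.log(cdf) if cdf > 0.0 else -math.inf

def tail_normal_lcdf(y, mu, sigma):
    """Same quantity, switching to an asymptotic expansion for z <= -20."""
    z = (y - mu) / sigma
    if z > -20.0:
        return naive_normal_lcdf(y, mu, sigma)
    # log Phi(z) ~ -z^2/2 - log(-z) - log(2 pi)/2 + log(1 - 1/z^2 + 3/z^4)
    return (-0.5 * z * z - math.log(-z) - 0.5 * math.log(2.0 * math.pi)
            + math.log1p(-1.0 / z**2 + 3.0 / z**4))

for sigma in (5.0, 2.0, 1.0, 0.5):
    # normal_lcdf(0 | 50, sigma): the lower bound L = 0 seen from a belief of 50
    print(sigma, naive_normal_lcdf(0.0, 50.0, sigma), tail_normal_lcdf(0.0, 50.0, sigma))
```

Working on the log scale throughout, instead of taking the log of a CDF that has already underflowed, is what keeps the second column finite.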

Now obviously, one could just provide some sensible initial values and likely avoid the investigation altogether, but I usually find investigating such issues worthwhile as they can signal problems that could show up later in modelling.

Best of luck with your model!

lisa1 · March 15, 2021, 3:19pm

Hi Martin,

Sorry for my excessively slow reply! Thanks a lot for your reply and the helpful explanations.
Regarding your suggestion to print the contributions to the log likelihood: I indeed find that the LCDFs for the low and high bounds often take the values -inf and 0, respectively:

[...]
Self LPDF[220]: -12.0912
Self LCDF low[220]: -inf
Self LCDF high[220]: 0

Chain 1: Rejecting initial value:
Chain 1:   Gradient evaluated at the initial value is not finite.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: 
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done

Sorry, I’m relatively new to Stan so I’m not really sure what to do about this: Is there an obvious solution for how to fix this? I would like to get a better understanding of how to debug this and why this error is happening…

In case this is helpful, I also uploaded the data here: data_for_stan.RData - Google Drive
This was the stan model I tried to fit this data on:

data {
  int ntr;                  // number of trials
  int ntr_phase;            // maximum trial number per phase
  int nrow;                 // number of rows of whole data set (equal to ntr * nsub)
  int nsub;                 // total number of subjects 
  int subID[nrow];          // subject ID
  real U;                   // = 100, upper bound for ratings
  real L;                   // = 0, lower bound for ratings
  // ratings
  real<lower=L,upper=U> rate_self[nrow];     // go from 0 to 100
  real<lower=L,upper=U> rate_other[nrow];    // go from 0 to 100
  int active_trial[nrow];   // trial type, 0 or 1
  real control_level[nrow]; // goes from 0.2 to 0.8
  real feedback[nrow];      // goes from 0 to 100
  int phase_trial[nrow]; 
}

parameters {
  real<lower=0> rating_noise[nsub]; 
  real<lower=0,upper=1> learning_rate[nsub];
}

model {
  real belief_self[nrow]; 
  real belief_other[nrow];
  real belief_total[nrow];
  real PE_total[nrow];
  real PE_self[nrow];
  real PE_other[nrow];
  
  // Priors for parameters
  rating_noise ~ normal(20,5); 
  learning_rate ~ normal(0.5,0.1);
  
  // Learning model: how we update beliefs based on outcomes
  for (itr in 1:nrow){
    if (phase_trial[itr]==1){
      belief_self[itr]  = rate_self[itr];
      belief_other[itr] = rate_other[itr];
    } 
    // Based on feedback, update the belief for the next trial
    if (active_trial[itr]!=1){
      belief_total[itr] = control_level[itr] * belief_self[itr] + (1-control_level[itr])*belief_other[itr];
    } else{
      belief_total[itr] = control_level[itr] * 0 + (1-control_level[itr])*belief_other[itr];
    }
    PE_total[itr]     = feedback[itr] - belief_total[itr];
    
    // split up total PE
    PE_self[itr]      = PE_total[itr] * control_level[itr];
    PE_other[itr]     = PE_total[itr] - PE_self[itr];
    
    // Update beliefs for next trial
    if (phase_trial[itr] < ntr_phase){
      belief_self[itr+1]  = belief_self[itr]  + learning_rate[subID[itr]]*PE_self[itr];
      belief_other[itr+1] = belief_other[itr] + learning_rate[subID[itr]]*PE_other[itr];
    }
  }
  
  // Decision model: mapping beliefs to ratings
  for (itr in 1:nrow){
    rate_self[itr] ~ normal(belief_self[itr],rating_noise[subID[itr]]) T[L,U];
    rate_other[itr] ~ normal(belief_other[itr],rating_noise[subID[itr]]) T[L,U];
  }
}

Best, Lisa

martinmodrak · March 16, 2021, 1:13pm

No worries about the timing. I also sometimes work on weird schedules. I am happy to have helped.

So I think I got to the bottom of this. Since Stan (by default) initializes parameters uniformly between -2 and 2 on the unconstrained scale, and rating_noise is mapped to positive values via exp, rating_noise is initialized between exp(-2) ~= 0.135 and exp(2) ~= 7.389. Now, in Stan 2.21 (which is the version the latest rstan on CRAN uses), there was a numerical instability, so normal_lcdf(0 | 50, sigma) evaluates to -inf for sigma < 0.5 or so. This is a bug, as e.g. normal_lcdf(0 | 50, 0.5) ~= -5005, which is far from -inf.

This behaviour got fixed in the meantime, but getting new rstan on CRAN has been problematic (for mostly stupid reasons).
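The initialization range described above can be checked in a few lines of Python: a `<lower=0>` parameter is represented as log(sigma) on the unconstrained scale, so uniform initial values u in (-2, 2) become sigma = exp(u). The sketch also prints how many standard deviations the lower bound L = 0 lies below a belief of 50 at both ends of that range. The helper name is hypothetical; passing `scale=20` gives the range for 20 * exp(u), i.e. the multiplier option discussed below.

```python
import math

def init_range_positive(scale=1.0, radius=2.0):
    """Initial-value range of sigma = scale * exp(u) for u ~ Uniform(-radius, radius)."""
    return scale * math.exp(-radius), scale * math.exp(radius)

lo, hi = init_range_positive()
print(f"rating_noise initialized in ({lo:.3f}, {hi:.3f})")
for sigma in (lo, hi):
    z = (0.0 - 50.0) / sigma  # standardized distance of L = 0 from a belief of 50
    print(f"sigma = {sigma:.3f}: L is {abs(z):.1f} sd below the mean")

lo20, hi20 = init_range_positive(scale=20.0)
print(f"with a multiplier of 20: ({lo20:.2f}, {hi20:.1f})")
```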

Code to reproduce via cmdstanr (fails to initialize with Stan 2.21, initializes OK with Stan 2.25):

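library(cmdstanr)

# One observation (40) from a normal truncated to [0, 100], written out by hand;
# print() shows both log-CDF terms and their log-difference at each evaluation.
m <- cmdstan_model(write_stan_file("
parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  print(mu,\",\", sigma,\",\", normal_lcdf(100 | mu, sigma),\",\", normal_lcdf(0 | mu, sigma), \", \", 
  log_diff_exp(normal_lcdf(100 | mu, sigma), normal_lcdf(0 | mu, sigma)));
  // truncated density: subtract the log of the mass between the bounds
  target += normal_lpdf(40 | mu, sigma) - log_diff_exp(normal_lcdf(100 | mu, sigma), normal_lcdf(0 | mu, sigma));
}
"))

# Start from mu = 50, sigma = 0.5, where normal_lcdf(0 | 50, 0.5) ~= -5005 (finite)
res <- m$sample(iter_warmup = 1, iter_sampling = 1, chains = 1, init = function() {
 list(mu = 50, sigma = 0.5)
})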

This is also why running a smaller model could have been OK: you had fewer elements in rating_noise, so the probability that all of them initialize high enough to avoid the pathology was reasonable, and 100 attempts were enough to hit such an initialization at least once. With more elements, the probability that all of them are OK drops very quickly, and 100 attempts are no longer enough.
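The “fewer elements” argument can be made concrete with a back-of-the-envelope calculation. Assume (a simplification, not something Stan checks explicitly) that an initialization succeeds only if every element of rating_noise exceeds some threshold `sigma_min`, and that each element is initialized independently as exp(u) with u uniform on (-2, 2). Then one attempt succeeds with probability p**nsub, and at least one of 100 attempts succeeds with probability 1 - (1 - p**nsub)**100. The threshold values below are hypothetical; the real one depends on the data and on the Stan version.

```python
import math

def p_element_ok(sigma_min, radius=2.0):
    """P(exp(u) > sigma_min) for u ~ Uniform(-radius, radius)."""
    frac = (radius - math.log(sigma_min)) / (2.0 * radius)
    return min(1.0, max(0.0, frac))

def p_init_succeeds(nsub, sigma_min, attempts=100):
    """P(at least one of `attempts` inits has all nsub elements above sigma_min)."""
    p_attempt = p_element_ok(sigma_min) ** nsub
    return 1.0 - (1.0 - p_attempt) ** attempts

for sigma_min in (0.5, 4.0):  # hypothetical thresholds
    for nsub in (1, 5, 50):
        print(f"sigma_min={sigma_min}, nsub={nsub}: "
              f"P(init succeeds) = {p_init_succeeds(nsub, sigma_min):.3g}")
```

Because the per-attempt probability is raised to the power nsub, it shrinks geometrically with the number of subjects, which is why 100 attempts can be plenty for one subject and hopeless for 50.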

Using Stan 2.25, I can successfully initialize and run the model. So what are your options?

- Use the latest Stan, either by installing the new rstan version from the Stan R package repository (install.packages("rstan", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))) or by using cmdstanr.
- Use a multiplier, i.e. declare the actual parameter on the unit scale and transform it to larger values, e.g.:

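parameters {
  vector<lower=0>[nsub] rating_noise_raw;
  // (learning_rate declared as before)
}

transformed parameters {
  // vectors, unlike arrays, can be multiplied by a scalar
  vector<lower=0>[nsub] rating_noise = rating_noise_raw * 20;
}

model {
  // normal(1, 0.25) here would match the original normal(20, 5) prior exactly
  rating_noise_raw ~ normal(1, 1);
  // ... rest of the model as before, using rating_noise[subID[itr]]
}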

- Initialize the rating_noise parameter to values within your prior expectations.
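For the last option, one way to build per-chain initial values is sketched below in Python. It draws rating_noise around the prior mean of 20 and learning_rate around 0.5 (both taken from the priors in the model), clipped to stay inside the declared bounds. The function name and the jitter widths are hypothetical. The result is a list with one dictionary per chain, mirroring the structure rstan's `init` argument accepts in R (a list of per-chain lists).

```python
import random

def make_inits(nsub, chains, seed=1):
    """Hypothetical helper: per-chain initial values near the prior means."""
    rng = random.Random(seed)
    inits = []
    for _ in range(chains):
        inits.append({
            # prior: rating_noise ~ normal(20, 5); keep inits well above 0
            "rating_noise": [max(1.0, rng.gauss(20.0, 2.0)) for _ in range(nsub)],
            # prior: learning_rate ~ normal(0.5, 0.1); stay strictly inside (0, 1)
            "learning_rate": [min(0.95, max(0.05, rng.gauss(0.5, 0.05)))
                              for _ in range(nsub)],
        })
    return inits

inits = make_inits(nsub=50, chains=3)
print(len(inits), min(inits[0]["rating_noise"]), max(inits[0]["learning_rate"]))
```

Jittering each chain slightly, rather than using identical values, keeps the chains' starting points distinct so convergence diagnostics remain meaningful.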

Best of luck with the model!

lisa1 · March 16, 2021, 6:42pm

Amazing, this worked! I installed the new rstan version from the Stan R package repo, like you suggested, and now I can run the model without the initialisation failure.
Thanks so much for tracking this down and your again very clear explanations!
