Initialization error

Hi all, I was trying to run my code and I encountered the following error related to initialization. I’ve pasted the stan code as well as the r code I ran. Any ideas would be greatly appreciated!

Chain 1: Rejecting initial value:
Chain 1: Log probability evaluates to log(0), i.e. negative infinity.
Chain 1: Stan can’t start sampling from this initial value.
Chain 1:
Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] “Error in sampler$call_sampler(args_list[[i]]) : Initialization failed.”
[2] “In addition: Warning messages:”
[3] “1: In system2(CXX, args = ARGS) : error in running command”
[4] “2: In file.remove(c(unprocessed, processed)) :”
[5] " cannot remove file ‘/var/folders/5x/rxc8hsw96rvblp_118zbchwh0000gn/T//RtmpIJxN8x/file3d942d4c2ee.stan’, reason ‘No such file or directory’"
error occurred during calling the sampler; sampling not done
sh: clang++ -mmacosx-version-min=10.13: command not found
sh: clang++ -mmacosx-version-min=10.13: command not found
sh: clang++ -mmacosx-version-min=10.13: command not found
sh: clang++ -mmacosx-version-min=10.13: command not found
sh: clang++ -mmacosx-version-min=10.13: command not found

data {

int<lower=0> n_student;
int<lower=0> n_item;
int<lower=0> n_pair;
int<lower=0> res[n_student, n_pair];
int<lower=0> n_dimension;
int<lower=0> p[n_pair, 2];
int<lower=0> d[n_item];

}

parameters { #declares the parameters whose posterior distribution is sought

matrix<lower=0, upper=1> [n_student, n_dimension] theta;
vector<lower=0.25, upper=3> [n_item] a;
vector<lower=-3, upper=3> [n_item] b;
vector<lower=-3, upper=1> [n_item] t;

}

transformed parameters {
cov_matrix[n_dimension] SigmaAB= diag_matrix(rep_vector(1, n_dimension));
}

model { #declare priors for the parameters and define the function

real spr[n_student, n_item];

for (j in 1:n_item){
a[j] ~ lognormal(0,0.25);
b[j] ~ normal(0,4);
t[j] ~ normal(0,4);
}
//theta ~ normal(0,1);
//a ~ beta(1.5,1.5);
//b ~ beta(2,2);
//t ~ beta(2,2);
for (i in 1:n_student){
theta[i][1:n_dimension] ~ multi_normal(rep_vector(0, n_dimension), SigmaAB);
}

for (i in 1:n_student) {
for (j in 1:n_item) {
real thdim[n_student, n_item];
real num0[n_student, n_item];
real num1[n_student, n_item];
real denominator[n_student, n_item];

  thdim[i,j] = theta[i,d[j]]; //d=dimension associated with each statement i
  num0[i,j] = 1 + exp(a[j]*(3*(thdim[i,j]-b[j])));
  num1[i,j] = exp(a[j]*((thdim[i,j]-b[j])-t[j])) + exp(a[j]*((2*(thdim[i,j]-b[j]))-t[j]));
  denominator[i,j] = num0[i,j] + num1[i,j];
  spr[i,j] = num1[i,j]/denominator[i,j];
}

}

for (i in 1:n_student) {
for(j in 1:n_pair){
int s1;
int t1;
real pst[n_student, n_pair];
real pts[n_student, n_pair];
real response[n_student, n_pair];

  s1 = p[j,1];
  t1 = p[j,2];

  pst[i,j] = spr[i,s1]*(1-spr[i,t1]); //10 #p=pair specification map for the given pairs
  pts[i,j] = spr[i,t1]*(1-spr[i,s1]); //01
  
  response[i,j] = pst[i,j]/(pst[i,j]+pts[i,j]); //NxJ matrix of MUPP P(s>t) probabilities for each pair
 
  res[i,j] ~ bernoulli(response[i,j]);
  
}

}
}

init_fun <- function() {
list(a=rep(1,S), b=rep(0,S), t=rep(-1,S), theta=matrix(0,nrow=I,ncol=D))
}
fit <- stan(file = ‘2.stan’, data = data, iter = 1, chains = 1, init=init_fun)

Could you post your code in a preformatted code block? Either surround it with three backticks or there’s a button between upload file and quote.

The thing to do with these errors (they aren’t so informative) is figure out what is causing them.

It’s probably something in the model block, so go through and comment out sections (binary search style) until you find the code that is the problem.

Once you have that then we can work backwards.

If it ends up everything is parameterized okay and the error is still there, then we’ll try lowering the range of random initial values that Stan takes.

Have you tried seeing if it will run without specifying initial values (i.e., just letting Stan generate them for you)?

1 Like

Hi, thanks for your suggestion. I just deleted the initial function and it ran, but I got the following warning, not error, is that problematic? Also, do you know how can I get the output of the stan function (all the estimates)? Thanks!

Warning messages:
1: In system2(CXX, args = ARGS) : error in running command
2: In file.remove(c(unprocessed, processed)) :
cannot remove file ‘/var/folders/5x/rxc8hsw96rvblp_118zbchwh0000gn/T//RtmpIJxN8x/file3d947880aae0.stan’, reason ‘No such file or directory’
3: There were 1 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
4: Examine the pairs() plot to diagnose sampling problems

sorry for the format. I will format it correctly next time! I just tried not specifying initial values myself and let the Stan generate them and my codes run successfully without error but still have some warnings, not sure if they are problematic or I can ignore them.

> Warning messages:
> 1: In system2(CXX, args = ARGS) : error in running command
> 2: In file.remove(c(unprocessed, processed)) :
>   cannot remove file '/var/folders/5x/rxc8hsw96rvblp_118zbchwh0000gn/T//RtmpIJxN8x/file3d947880aae0.stan', reason 'No such file or directory'
> 3: There were 1 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
> http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
> 4: Examine the pairs() plot to diagnose sampling problems

Glad that runs!

I’m not entirely sure what’s causing them, but I think they can be ignored if your model is running ok.

@bgoodri Is there a way to get rid of these warnings? They definitely make it seem like there are problems when there aren’t (as far as I know).

If it ran successfully, does it mean that I can’t specify the initial values myself? Will it affect my parameter estimates?

In general you can specify initial values, but in this case when you specified them it resulted in a sampler that couldn’t move anywhere. The key is in the first part of the message:

Under the hood the model block is just accumulating log probability (log prior + log likelihood), and in this case the combination of initial values chosen, the parameter constraints declared, and the code specifying the model in the model block resulted in a log probability of \log{0} = -\infty and so the sampler can’t move anywhere from there.

Here are some tips on your Stan program (some related to this, others not so much):

Are these constraints actually known or just reasonable bounds? If the latter then it’s better not to code that using constraints and instead express that information in the prior. Normal distributions, for example, give you soft constraints because of the exponential decay in the tails. So unless these constraints are logically necessary (e.g., lower=0 for a standard deviation) or known physical constants or something like that, I wouldn’t use them.

Also, you’re giving theta a multivariate normal prior below but constraining it to the (0,1) interval. Is that intentional?

This isn’t actually a transformation of a parameter, you’re just declaring an identity matrix right? If so, it’s more efficient to put this in transformed data (calculated once) instead of transformed parameters (calculated a lot every leapfrog step within every iteration of HMC)

Maybe I’m missing something but if SigmaAB is the identity matrix and the mean is the 0-vector then aren’t you just giving all the elements of theta iid standard normal priors right? If so then you can get rid of SigmaAB entirely and replace that entire loop with:

to_vector(theta) ~ normal(0, 1);

This will be faster if you vectorize:

a ~ lognormal(0,0.25);
b ~ normal(0,4);
t ~ normal(0,4);

Each time through the loop you’re only calculating one entry of these objects but you’re redeclaring the entire arrays every time. I would replace it with

for (i in 1:n_student) {
  for (j in 1:n_item) {
    real thdim;
    real num0;
    real num1;
    real denominator;

    thdim = theta[i,d[j]]; //d=dimension associated with each statement i
    num0 = 1 + exp(a[j]*(3*(thdim-b[j])));
    num1 = exp(a[j]*((thdim-b[j])-t[j])) + exp(a[j]*((2*(thdim-b[j]))-t[j]));
    denominator = num0 + num1;
    // you only need to save this one, the others can be overwritten each time
    spr[i,j] = num1/denominator;
}
}

since you can just overwrite them each time through the loop and only fill in spr.

Same thing here. You don’t need a new array pst and pts every time through the loop, you’re just using a single element (the i,j element). So this should be changed to something
like:

for (i in 1:n_student) {
 for(j in 1:n_pair){
  int s1;
  int t1;
  real pst; 
  real pts;
  real response;

  s1 = p[j,1];
  t1 = p[j,2];

  pst = spr[i,s1]*(1-spr[i,t1]); //10 #p=pair specification map for the given pairs
  pts = spr[i,t1]*(1-spr[i,s1]); //01
  
  response = pst/(pst+pts); //NxJ matrix of MUPP P(s>t) probabilities for each pair
 
  res[i,j] ~ bernoulli(response);
  
  }
}

But you can go a step further in this case and vectorize the call the bernoulli (vectorizing calls to probability distributions is much faster) by moving response outside of the loop and storing its elements and moving bernoulli below the loop. At the top of the model block put

model {
  real response[n_student, n_pair];
  ...

and then the loop I just rewrote above changes to

for (i in 1:n_student) {
 for(j in 1:n_pair){
  int s1;
  int t1;
  real pst; 
  real pts;

  s1 = p[j,1];
  t1 = p[j,2];

  pst = spr[i,s1]*(1-spr[i,t1]); //10 #p=pair specification map for the given pairs
  pts = spr[i,t1]*(1-spr[i,s1]); //01

// store response  
  response[i,j] = pst/(pst+pts); //NxJ matrix of MUPP P(s>t) probabilities for each pair  
  }
}
// move call to bernoulli outside the loop 
// need to flatten 2d arrays to 1d arrays so bernoulli doesn't complain
to_array_1d(res) ~ bernoulli(to_array_1d(response));

Hope that helps!

Thank you so much for the detailed explanation and suggestions!
This is really weird…at first I followed your suggestion to delete the initial value specification, the code run, and then I followed your latest suggestion and changed my code to make the algorithm faster, I got the following error. And then I still kept the other changes based on your suggestion except for the last step (I moved the response inside the loop like what I had originally), the code run. And then I added my initial value specification, it still runs…so I don’t know what’s wrong with vectorizing calls to the bernoulli, but at least it can run even with initial value specification, so thank you!

Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: bernoulli_lpmf: Probability parameter[2] is nan, but must be finite! (in ‘model6b7a7e4fc19_mupp2’ at line 118)

Glad you can run it with initial values now!

If possible could you share the full Stan program you ran when you got that error? I definitely think the bernoulli call can be vectorized in this model but I may have made a typo or maybe something got messed up when combining the different pieces of my suggestions together. I can probably figure it out if I see the full Stan program.

Thanks for sharing. I just edited your post to format the code with Stan syntax highlighting. For future reference, no need to start every line with > like you had it, you can indicate the start a Stan code block with three back-ticks and the word stan

```stan
data {
  int<lower=0> n_student;
  ...

and end it with three back-ticks (without the word Stan). If I do that it renders like this

data {
  int<lower=0> n_student;

I think this is the reason you got the error:

The last line inside the loop should be filling in response[i,j]. The bernoulli line needs to be moved outside the loops. Try moving it to after the closing braces:

   response[i,j] = pst/(pst+pts); 
   }
 }
 to_array_1d(res) ~ bernoulli(to_array_1d(response));

Does that work better?

It ran at first with no error only warning, but the second time I tried to run with no changes, I got the following error:

Error in Module(module, mustStart = TRUE) : 
  Failed to initialize module pointer: Error in FUN(X[[i]], ...): no such symbol _rcpp_module_boot_stan_fit4model77071f10f481_mupp2_mod in package /var/folders/5x/rxc8hsw96rvblp_118zbchwh0000gn/T//Rtmp0Fcc0N/file77076a2736e.so

Hmm, that’s a weird error from Rcpp. I could be wrong but just starting a fresh R session and refitting the model may fix that.

Thanks it works! Two more questions about Rstan: (1) is it possible to specify four-parameter beta prior distribution in stan? (2) is it possible to input initial value file instead of specifying them in the R code? Thanks!

2 Likes

I don’t think we have a 4-parameter beta distribution built-in to Stan but you can use any valid probability distribution by just coding the log-probability in the Stan program. Here’s a section in the doc with a few examples of coding probability distributions directly in Stan: 19.1 Examples | Stan User’s Guide. If you need help figuring out how to do it for the distribution you want I recommend starting a new topic here on the forums.

With RStan I don’t think so, but you can definitely do this with our new CmdStanR interface, which is an alternative way of interfacing with Stan from R.

1 Like

Thanks! Will do if I cannot figure it out!
I just finished running my code and it took more than 10 hours, which is hours slower than Metropolis hasting, is it common that HMC is slower than MH? I thought it is faster.
Another question about extracting the output, I have a output matrix [3000 observations, # of parameter estimated], do you know why there is 3000 observations? I set 2000 iterations with 3 chains. Thanks!

I’ll give you some quick answers here but if you want to go more in depth I suggest creating new topics here on the forum since these are different issues than the initialization error in this topic (we encourage separate posts for different topics so that it’s easier to find things on the forum).

I’d be surprised if Metropolis hastings was more efficient if the model is coded in the most efficient way (not sure if yours is, but that’s a good question for a different post on the forum). You can’t just compare the amount of time to run the same number of iterations, because the only way to really measure MCMC efficiency is effective sample size per unit of time (just to emphasize, that’s effective posterior sample size not total posterior sample size). It’s common for HMC to require orders of magnitude fewer iterations to obtain the same effective sample size.

By default the first half of each chain is discarded as warmup, so you have 1000 post-warmup iterations for each of 3 chains = 3000 posterior draws.

Thank you so much!