Initialization between (-2, 2) failed after 100 attempts

Hi I am new to Stan. I am trying to estimate a model and to start, I run ‘model0$optimize(data = data_m0,algorithm = ‘lbfgs’)’. However, it returns the following error message:

Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: lkj_corr_lpdf: Correlation matrix is not positive definite. (in ‘/tmp/RtmpU01otz/model-2e799e29da4e1b.stan’, line 43, column 2 to column 22)
Rejecting initial value:

Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model. "

How should I fix the intial values? Would really appreciate any help!! Thanks

model{
  matrix[n_par,n_par] Sigma;
  //vector[N] delta; //individual_habit derperacion rate
  
  Omega ~ lkj_corr(2);
  tau ~ cauchy(0,2.5);
  
  Sigma = quad_form_diag(Omega,tau);
  //equivalent to diag(tau)*Omega*diag(tau)
  
  mu ~ normal(0,5);
  
  for (i in 1:N){
    beta[,i] ~ multi_normal(mu,Sigma);
  }
  
  for (ob in 1:nobs){
    real habit_stock;
    vector[2] utils;
    int id = ids[ob];
    vector[n_par] b = beta[1:n_par,id];
    
    if (t_0[ob]==1){
      habit_stock = b[1];
    } 
    
    utils[1] = b[n_par_h+1]*habit_stock + b[n_par_h+2]*habit_stock^2 + b[n_par_h+3] + dot_product(b[(n_par_h+5):(n_par_h+4+k_use)],to_vector(x_use[ob])) + dot_product(b[(n_par_h+k_use+5):(n_par_h+k_use+4+k_context)],to_vector(x_context[ob])) + dot_product(b[(n_par_h+k_use+5+k_context):n_par],to_vector(x_indi[id,]));
    y_out[ob,1] ~ bernoulli_logit(utils[1]);
    

    utils[2] = b[(n_par_h+4)];
    y_out[ob,2] ~ bernoulli_logit(utils[2]+utils[1]);
    
    
    habit_stock = habit_stock*b[1]+dot_product(b[2:n_par_h],to_vector(x_use[ob]));
  }
  
  }

To address this question, we need to see the entire Stan file, including the data and especially the parameters blocks, as well as transformed data and transformed parameters if those exist.

hi here is the full code.

Regading the data, should I share the whole data/a small sample or a snapshot would be enough?

data{
  int<lower=1> nobs;
  int<lower=1> N;
  int<lower=1> n_out;
  array[nobs] int<lower=1> ids;
  
  int<lower=1> k_use;
  array[nobs,n_out] int<lower=0,upper=1> y_out;
  
  array[nobs,k_use] int<lower=0,upper=1> x_use;
  array[nobs] int<lower=1> t_0;
}

transformed data{
  int<lower=1> n_par_h = 1+n_out; //how does different activity contribute to habit stock = 3
  int<lower=1> n_par_r = n_out + 2 + k_use; //9;
  int<lower=1> n_par = n_par_h + n_par_r;
}

parameters{
  matrix[n_par,N] beta;
  
  corr_matrix[n_par] Omega;
  
  vector[n_par] mu;
  vector<lower=0>[n_par] tau;
}

model{
  matrix[n_par,n_par]Sigma;
  Omega ~ lkj_corr(2);
  tau ~ cauchy(0,2.5);
  
  Sigma = quad_form_diag(Omega,tau);
  
  mu ~ normal(0,5);
  
  for (i in 1:N){
    beta[,i] ~ multi_normal(mu,Sigma);
  }
  
  real habit_stock = beta[1,1]*dot_product(beta[2:(k_use+1),1],to_vector(x_use[1]));
  for (ob in 1:nobs){
    vector[2] utils;
    int id = ids[ob];
    vector[n_par] b = beta[1:n_par,id];
    
    
    utils[1] = b[n_par_h+1]*habit_stock + b[n_par_h+2]*habit_stock^2 + b[n_par_h+3] + dot_product(b[(n_par_h+5):(n_par)],to_vector(x_use[ob]));
    
    utils[2] = b[(n_par_h+4)] + utils[1];
    
    y_out[ob,1] ~ bernoulli_logit(utils[1]);
    y_out[ob,2] ~ bernoulli_logit(utils[2]);
    
    if(t_0[ob]==1){
      habit_stock = b[1]*dot_product(b[2:(k_use+1)],to_vector(x_use[1]));
    } else{
      habit_stock = habit_stock * b[1] + dot_product(b[2:(k_use+1)],to_vector(x_use[ob]));
    }
  }
  
  
}

I actually just meant sharing the data block of your code, but a quick glance at your full Stan model wasn’t enough for me to spot the problem. Maybe @spinkney’s discerning eye can do better.

If you’re able to provide your full data, or any dataset that reproduces the problem, that could help lubricate things.

Also, if you can provide the details of your Stan version that might help.

I actually just found another post discussing a similar problem. It seems that my dimension is too big to use the corr_matrix method. So I am tried the solution in their post, using the ‘lkj_corr_cholesky’ instead ‘lkj_corr’.
There is no longer the non positive definite problem poping up…
However, it shows a different error.


Initial log joint probability = -3.63009e+12
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
1 -3.63009e+12 0 1.49699e+13 0.001 0.001 12
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.

I am sharing the modified code below. I suspect I should also change the way I generate the Sigma if I am using the ‘multi_normal_cholesky’ for beta priors.

Could you help take another look please?

data{
  int<lower=1> nobs;
  int<lower=1> N;
  int<lower=1> n_out;
  array[nobs] int<lower=1> ids;
  
  int<lower=1> k_use;
  array[nobs,n_out] int<lower=0,upper=1> y_out;
  
  array[nobs,k_use] int<lower=0,upper=1> x_use;
  array[nobs] int<lower=1> t_0;
}

transformed data{
  int<lower=1> n_par_h = 1+n_out; //how does different activity contribute to habit stock = 3
  int<lower=1> n_par_r = n_out + 2 + k_use; //9;
  int<lower=1> n_par = n_par_h + n_par_r;
}

parameters{
  matrix[n_par,N] beta;
  
  cholesky_factor_corr[n_par] Omega;
  
  vector[n_par] mu;
  vector<lower=0>[n_par] tau;
}

model{
  matrix[n_par,n_par]Sigma;
  Omega ~ lkj_corr_cholesky(1);
  tau ~ cauchy(0,2.5);
  
  Sigma = quad_form_diag(Omega,tau);
  
  mu ~ normal(0,5);
  
  for (i in 1:N){
    beta[,i] ~ multi_normal_cholesky(mu,Sigma);
  }
  
  real habit_stock = beta[1,1]*dot_product(beta[2:(k_use+1),1],to_vector(x_use[1]));
  for (ob in 1:nobs){
    vector[2] utils;
    int id = ids[ob];
    vector[n_par] b = beta[1:n_par,id];
    
    
    utils[1] = b[n_par_h+1]*habit_stock + b[n_par_h+2]*habit_stock^2 + b[n_par_h+3] + dot_product(b[(n_par_h+5):(n_par)],to_vector(x_use[ob]));
    
    utils[2] = b[(n_par_h+4)] + utils[1];
    
    y_out[ob,1] ~ bernoulli_logit(utils[1]);
    y_out[ob,2] ~ bernoulli_logit(utils[2]);
    
    if(t_0[ob]==1){
      habit_stock = b[1]*dot_product(b[2:(k_use+1)],to_vector(x_use[1]));
    } else{
      habit_stock = habit_stock * b[1] + dot_product(b[2:(k_use+1)],to_vector(x_use[ob]));
    }
  }
  
  
}