Initialization between (-2, 2) failed after 100 attempts

dmp · May 22, 2023, 8:38am

Hi I am new to Stan. I am trying to estimate a model and to start, I run ‘model0$optimize(data = data_m0,algorithm = ‘lbfgs’)’. However, it returns the following error message:

Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: lkj_corr_lpdf: Correlation matrix is not positive definite. (in ‘/tmp/RtmpU01otz/model-2e799e29da4e1b.stan’, line 43, column 2 to column 22)
Rejecting initial value:
…
Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model. "

How should I fix the intial values? Would really appreciate any help!! Thanks

model{
  matrix[n_par,n_par] Sigma;
  //vector[N] delta; //individual_habit derperacion rate
  
  Omega ~ lkj_corr(2);
  tau ~ cauchy(0,2.5);
  
  Sigma = quad_form_diag(Omega,tau);
  //equivalent to diag(tau)*Omega*diag(tau)
  
  mu ~ normal(0,5);
  
  for (i in 1:N){
    beta[,i] ~ multi_normal(mu,Sigma);
  }
  
  for (ob in 1:nobs){
    real habit_stock;
    vector[2] utils;
    int id = ids[ob];
    vector[n_par] b = beta[1:n_par,id];
    
    if (t_0[ob]==1){
      habit_stock = b[1];
    } 
    
    utils[1] = b[n_par_h+1]*habit_stock + b[n_par_h+2]*habit_stock^2 + b[n_par_h+3] + dot_product(b[(n_par_h+5):(n_par_h+4+k_use)],to_vector(x_use[ob])) + dot_product(b[(n_par_h+k_use+5):(n_par_h+k_use+4+k_context)],to_vector(x_context[ob])) + dot_product(b[(n_par_h+k_use+5+k_context):n_par],to_vector(x_indi[id,]));
    y_out[ob,1] ~ bernoulli_logit(utils[1]);
    

    utils[2] = b[(n_par_h+4)];
    y_out[ob,2] ~ bernoulli_logit(utils[2]+utils[1]);
    
    
    habit_stock = habit_stock*b[1]+dot_product(b[2:n_par_h],to_vector(x_use[ob]));
  }
  
  }

jsocolar · May 22, 2023, 2:20pm

To address this question, we need to see the entire Stan file, including the data and especially the parameters blocks, as well as transformed data and transformed parameters if those exist.

dmp · May 22, 2023, 2:29pm

hi here is the full code.

Regading the data, should I share the whole data/a small sample or a snapshot would be enough?

data{
  int<lower=1> nobs;
  int<lower=1> N;
  int<lower=1> n_out;
  array[nobs] int<lower=1> ids;
  
  int<lower=1> k_use;
  array[nobs,n_out] int<lower=0,upper=1> y_out;
  
  array[nobs,k_use] int<lower=0,upper=1> x_use;
  array[nobs] int<lower=1> t_0;
}

transformed data{
  int<lower=1> n_par_h = 1+n_out; //how does different activity contribute to habit stock = 3
  int<lower=1> n_par_r = n_out + 2 + k_use; //9;
  int<lower=1> n_par = n_par_h + n_par_r;
}

parameters{
  matrix[n_par,N] beta;
  
  corr_matrix[n_par] Omega;
  
  vector[n_par] mu;
  vector<lower=0>[n_par] tau;
}

model{
  matrix[n_par,n_par]Sigma;
  Omega ~ lkj_corr(2);
  tau ~ cauchy(0,2.5);
  
  Sigma = quad_form_diag(Omega,tau);
  
  mu ~ normal(0,5);
  
  for (i in 1:N){
    beta[,i] ~ multi_normal(mu,Sigma);
  }
  
  real habit_stock = beta[1,1]*dot_product(beta[2:(k_use+1),1],to_vector(x_use[1]));
  for (ob in 1:nobs){
    vector[2] utils;
    int id = ids[ob];
    vector[n_par] b = beta[1:n_par,id];
    
    
    utils[1] = b[n_par_h+1]*habit_stock + b[n_par_h+2]*habit_stock^2 + b[n_par_h+3] + dot_product(b[(n_par_h+5):(n_par)],to_vector(x_use[ob]));
    
    utils[2] = b[(n_par_h+4)] + utils[1];
    
    y_out[ob,1] ~ bernoulli_logit(utils[1]);
    y_out[ob,2] ~ bernoulli_logit(utils[2]);
    
    if(t_0[ob]==1){
      habit_stock = b[1]*dot_product(b[2:(k_use+1)],to_vector(x_use[1]));
    } else{
      habit_stock = habit_stock * b[1] + dot_product(b[2:(k_use+1)],to_vector(x_use[ob]));
    }
  }
  
  
}

jsocolar · May 22, 2023, 2:46pm

I actually just meant sharing the data block of your code, but a quick glance at your full Stan model wasn’t enough for me to spot the problem. Maybe @spinkney’s discerning eye can do better.

If you’re able to provide your full data, or any dataset that reproduces the problem, that could help lubricate things.

Also, if you can provide the details of your Stan version that might help.

dmp · May 22, 2023, 3:34pm

I actually just found another post discussing a similar problem. It seems that my dimension is too big to use the corr_matrix method. So I am tried the solution in their post, using the ‘lkj_corr_cholesky’ instead ‘lkj_corr’.
There is no longer the non positive definite problem poping up…
However, it shows a different error.

Initial log joint probability = -3.63009e+12
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
1 -3.63009e+12 0 1.49699e+13 0.001 0.001 12
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.

I am sharing the modified code below. I suspect I should also change the way I generate the Sigma if I am using the ‘multi_normal_cholesky’ for beta priors.

Could you help take another look please?

data{
  int<lower=1> nobs;
  int<lower=1> N;
  int<lower=1> n_out;
  array[nobs] int<lower=1> ids;
  
  int<lower=1> k_use;
  array[nobs,n_out] int<lower=0,upper=1> y_out;
  
  array[nobs,k_use] int<lower=0,upper=1> x_use;
  array[nobs] int<lower=1> t_0;
}

transformed data{
  int<lower=1> n_par_h = 1+n_out; //how does different activity contribute to habit stock = 3
  int<lower=1> n_par_r = n_out + 2 + k_use; //9;
  int<lower=1> n_par = n_par_h + n_par_r;
}

parameters{
  matrix[n_par,N] beta;
  
  cholesky_factor_corr[n_par] Omega;
  
  vector[n_par] mu;
  vector<lower=0>[n_par] tau;
}

model{
  matrix[n_par,n_par]Sigma;
  Omega ~ lkj_corr_cholesky(1);
  tau ~ cauchy(0,2.5);
  
  Sigma = quad_form_diag(Omega,tau);
  
  mu ~ normal(0,5);
  
  for (i in 1:N){
    beta[,i] ~ multi_normal_cholesky(mu,Sigma);
  }
  
  real habit_stock = beta[1,1]*dot_product(beta[2:(k_use+1),1],to_vector(x_use[1]));
  for (ob in 1:nobs){
    vector[2] utils;
    int id = ids[ob];
    vector[n_par] b = beta[1:n_par,id];
    
    
    utils[1] = b[n_par_h+1]*habit_stock + b[n_par_h+2]*habit_stock^2 + b[n_par_h+3] + dot_product(b[(n_par_h+5):(n_par)],to_vector(x_use[ob]));
    
    utils[2] = b[(n_par_h+4)] + utils[1];
    
    y_out[ob,1] ~ bernoulli_logit(utils[1]);
    y_out[ob,2] ~ bernoulli_logit(utils[2]);
    
    if(t_0[ob]==1){
      habit_stock = b[1]*dot_product(b[2:(k_use+1)],to_vector(x_use[1]));
    } else{
      habit_stock = habit_stock * b[1] + dot_product(b[2:(k_use+1)],to_vector(x_use[ob]));
    }
  }
  
  
}

Topic		Replies	Views
Debug help Modeling	2	794	November 29, 2017
Error message: Initialization between (-2, 2) failed after 100 attempt Modeling rstan , techniques , fitting-issues	1	725	June 10, 2021
Initialization between (-2, 2) failed after 100 attempts in Bass model fitting Modeling	3	687	March 14, 2020
Initialization between (-2, 2) failed after 100 attempts Modeling fitting-issues	3	585	January 30, 2021
Initialization between (-2, 2) failed after 100 attempts Modeling	4	5883	April 24, 2018

Initialization between (-2, 2) failed after 100 attempts

Initial log joint probability = -3.63009e+12 Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes 1 -3.63009e+12 0 1.49699e+13 0.001 0.001 12 Error evaluating model log probability: Non-finite gradient. Error evaluating model log probability: Non-finite gradient.

Related topics

Initial log joint probability = -3.63009e+12
Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
1 -3.63009e+12 0 1.49699e+13 0.001 0.001 12
Error evaluating model log probability: Non-finite gradient.
Error evaluating model log probability: Non-finite gradient.