'Rejecting Initial Value' and 'Error Evaluating the Log Probability

amohan · July 15, 2021, 5:13pm

Hello All

I am new to Stan and so my apologies if I missed something while posting the problem, where any help will be much appreciated. I am getting the error below, while trying to run STAN

Based on a similar issue posted in a different thread from a few years ago, it mentioned that having a lower bound will help solve the issue. However, a lower bound is included in the code below.

Copy of Error

SAMPLING FOR MODEL ‘bppca’ NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: multi_normal_lpdf: LDLT_Factor of covariance parameter is not positive definite. last conditional variance is -21.8925. (in ‘modele0858283940_bppca’ at line 55)

****
Code Where Stan is Being Called
****
Nsamples = 1000
Nchains = 3

model_file = "stan_code/bppca.stan"
smod = stan_model(model_file)

D_max = 5
sa.list = list()
log_lik.list = list()
looic.list = list()
for(d in 1:D_max){
	print(d)
	pca_data <- list(Y = Y, N = N, P = P, D = d, Fam =  Fam, Site = Site, ns = ns, nf = nf)
	set.seed(314)
	sa.list[[d]] = sampling(smod, data= pca_data, iter=Nsamples, chains=Nchains,init="random")
	#log_lik.list[[d]] <- extract_log_lik(sa.list[[d]])
	log_lik.list[[d]] <- extract(sa.list[[d]],"log_lik_marg")[[1]]
	looic.list[[d]] = loo(log_lik.list[[d]])
	save(sa.list,log_lik.list,looic.list,file="results/bppca_results.RData")
	print("###############################")
}

Stan Model File

data{
  int<lower=1> N; 	// number of subjects in dataset
  int<lower=1> P; 	// dimension of data
  int<lower=1> D; 	// dimension of latent space
  int<lower=1> ns;	// number of sites
  int<lower=1> nf;	// number of families with more than one member
  int<lower=1> Site[N]; // site ids
  int<lower=0> Fam[N];	// family ids
  vector[P] Y[N]; 	// responses
}
transformed data{
  vector[P] eta;
  eta = rep_vector(0,P);
}
parameters{
  vector[D] Theta[N];
  matrix[P,D] Lambda;
  matrix[D,ns] A_site;
  matrix[D,nf] B_fam;
  matrix[P,ns] C_site;
  matrix[P,nf] D_fam;
  real<lower=0> sigma2_a;
  real<lower=0> sigma2_b;
  real<lower=0> sigma_c;
  real<lower=0> sigma_d;
  real<lower=0> sigma_eps;
}
transformed parameters{
  vector[D] mu[N];
  vector[P] nu[N];
  for(n in 1:N){
    if(Fam[n] == 0) mu[n] = col(A_site,Site[n]);
    if(Fam[n]>0) mu[n] = col(A_site,Site[n]) + col(B_fam,Fam[n]);
  }
  for(n in 1:N){
    if(Fam[n] == 0) nu[n] = col(C_site,Site[n]);
    if(Fam[n]>0) nu[n] = col(C_site,Site[n]) + col(D_fam,Fam[n]);
  }
}
model{
  sigma2_a ~ uniform(0,1);
  sigma2_b ~ uniform(0,1-sigma2_a);
  sigma_c ~ cauchy(0,1);
  sigma_d ~ cauchy(0,1);
  sigma_eps ~ cauchy(sigma_c^2+sigma_d^2,1);
  to_vector(A_site) ~ normal(0,sqrt(sigma2_a));
  to_vector(B_fam) ~ normal(0,sqrt(sigma2_b));
  to_vector(C_site) ~ normal(0,sigma_c);
  to_vector(D_fam) ~ normal(0,sigma_d);
  for(d in 1:D)
  Lambda[,d] ~ normal(0, 1);
  for(n in 1:N){
    if(Fam[n] == 0){
      Theta[n] ~ multi_normal(mu[n], diag_matrix(rep_vector(1-sigma2_a,D)));
      Y[n] ~ multi_normal(Lambda * Theta[n] + nu[n], diag_matrix(rep_vector(sigma_eps^2-sigma_c^2,P)));
    }
    if(Fam[n] > 0){
      Theta[n] ~ multi_normal(mu[n], diag_matrix(rep_vector(1-sigma2_a-sigma2_b,D)));
      Y[n] ~ multi_normal(Lambda * Theta[n] + nu[n], diag_matrix(rep_vector(sigma_eps^2-sigma_c^2-sigma_d^2,P)));
    }
  }
}
generated quantities{
  //matrix[P,P] Eig_vec_lambda;
  //vector[P] Eig_val_lambda;
  //matrix[P,P] W;
  cov_matrix[P] Q;
  vector[N] log_lik_marg;
  vector[N] log_lik;
  //Eig_vec_lambda = eigenvectors_sym(Lambda*Lambda');
  //Eig_val_lambda = eigenvalues_sym(Lambda*Lambda');
  Q = Lambda * Lambda' + diag_matrix(rep_vector(sigma_eps^2,P));
  //W = Lambda * Lambda';
  for (n in 1:N){
    log_lik_marg[n] = multi_normal_lpdf(Y[n] | eta, Q);
    if(Fam[n] == 0){
      log_lik[n] = multi_normal_lpdf(Y[n] | Lambda * Theta[n] + nu[n], diag_matrix(rep_vector(sigma_eps^2-sigma_c^2,P)));
    }
    if(Fam[n] > 0){
      log_lik[n] = multi_normal_lpdf(Y[n] | Lambda * Theta[n] + nu[n], diag_matrix(rep_vector(sigma_eps^2-sigma_c^2-sigma_d^2,P)));
    }
  }
}

PS: This is a publicly available code and I am trying to recreate the results of a paper

EDIT: @maxbiostat edited this post for syntax highlighting.

jsocolar · July 15, 2021, 5:28pm

Welcome to discourse!

It looks like you need to constrain sigma_eps to be larger than sigma_c + sigma_d, and this is the source of the error you’re seeing. However, it additionally appears that sigma_eps and sigma_c are not really identified in this model; only their difference appears in the likelihood, and those Cauchy priors are very weak.

Often, a useful trick for estimating different variance components is to parameterize the model using the total variance and a simplex vector that multiplicatively partitions the total variance into its constitutive components. Again, because sigma_eps and sigma_c don’t appear independently in the likelihood, the simplex will be very weakly identified, but crucially that lack of identification will be orthogonal to the total variance, which should enable efficient sampling.

amohan · July 15, 2021, 5:47pm

thank you for the warm welcome. how will the constraint you are suggesting be specified in the file?

jsocolar · July 15, 2021, 6:27pm

I don’t understand enough about your model to give definitive advice. Now that @maxbiostat got the code nicely formatted I do see that sigma_eps seems to be identified separately from sigma_c (I think?), so sorry about any confusion there.

Taking this code at face value, what I would do is reparameterize the model in terms of real <lower=0> sigma_1, which corresponds to sqrt(sigma_eps^2 - sigma_c^2 - sigma_d^2), real <lower=0> sigma_2 which corresponds to sigma_d, and real <lower=0> sigma_3 which corresponds to sigma_c. Then when you need sigma_eps^2 - sigma_c^2, that’s just sigma_1^2 + sigma_2^2, and when you need sigma_eps that’s just the root sum of squares of sigma_1, sigma_2, and sigma_3. You’ll have to work out what reasonable priors over these new parameters might be. But you would need to do this anyway, since your existing prior configuration places quite a bit of prior mass on values of sigma_eps^2 - sigma_c^2 - sigma_d^2 that are less than zero. And I’m pretty confident that it will be easier for you to reason about a good prior model under the reparameterization suggested here.

amohan · July 15, 2021, 7:51pm

thank you for the update. If I understood correctly, I would create a new parameter real<lower=0> sigma_1; and then respecify sigma_eps in terms of sigma_1, sigma_2 (parameter sigma_d in current code) and sigma_3 (parameter sigma_c in current code), and also create new priors for sigma_!? Given i am trying to recreate the results of a prior published paper which is where the code comes from, I was wondering if there is a way to not make any changes to the model even if it may be less efficient?

jsocolar · July 15, 2021, 7:58pm

The priors in the current code are incoherent. Was the previous paper implemented in Stan, or something else? To reproduce exactly, it will be important to understand exactly how the previous paper’s software dealt with the prior mass that sits over negative variances.

Edit: “incoherent” has a strong negative connotation that I did not mean to imply when I wrote this post. You’re not doing anything dumb or incoherent! It’s just that the prior model as specified in the code doesn’t respect constraints that variance parameters must respect, and it is incoherent in a narrow technical sense.

amohan · July 15, 2021, 8:04pm

The paper was implemented in R, with the R code then calling the stan model. If you do think the priors need to be changed based on your suggestion, any help in what I should set the priors to will be appreciated. The complete code for the R and the stan file of the paper that I am trying to reproduce is available here

jsocolar · July 15, 2021, 9:40pm

I was working on a response, but ultimately ended up consulting the original paper and its supplement. For what it’s worth, I simply do not understand how the Stan code in the Github repo corresponds to the model described in the supplement. I don’t have enough time to do a deep enough dive to figure out whether it corresponds or not, but it sure looks to me as though it does not. One thing that you do seem to know is that the code cannot be run as-is because it won’t initialize. And even if you passed allowable inits, it looks very probable to me that this would have all kinds of sampling problems because it doesn’t have any built-in mechanism to respect the positivity constraints that are required of the variances!

Sorry I can’t be of further help. Since it’s a neurocognition paper, maybe @mike-lawrence would have some interest in taking a deeper dive. Mike, if you have time, can you square this Stan model with this supplement from this paper?

amohan · July 15, 2021, 10:29pm

thanks @jsocolar. I appreciate you looking into this thus far and also taking the time to read the paper and the supplement. would you have another suggestion for how I could initialize it differently that i could try out?

mike-lawrence · July 15, 2021, 11:08pm

I wonder if it might make sense to create an issue at the GitHub repo to call on the expertise of the original authors?

amohan · July 16, 2021, 1:27pm

thanks @mike-lawrence. I did write to the original authors. I do think it must be me who is doing something incorrectly, given I am new to Stan hence also want to try to see if there are other ways to run the code without significant change.

amohan · July 19, 2021, 9:24pm

@jsocolar any further thoughts?

jsocolar · July 19, 2021, 9:26pm

My only thought at this point is that either I really don’t understand the model or the model definition file as provided from that paper is buggy. Given that you can’t run it, and given that I really can’t make head or tail of it, I sorta suspect the latter.

Edit: and there’s no sense in thinking too carefully about how to initialize a buggy model.

Topic		Replies	Views
Rejecting initial value: Chain 1: Log probability evaluates to log(0), i.e. negative infinity Modeling	3	7660	September 5, 2020
Error evaluating the log probability at the initial value Modeling	7	894	December 10, 2020
Initialization between (-2, 2) failed after 100 attempts Modeling specification	4	397	May 22, 2023
Rejection Initial value Modeling	5	2046	July 3, 2017
Error rejecting initial value but have looked at lower bounds of priors Modeling rstan , fitting-issues , specification	11	592	May 2, 2020

'Rejecting Initial Value' and 'Error Evaluating the Log Probability

Related Topics