'Rejecting Initial Value' and 'Error Evaluating the Log Probability

Hello All

I am new to Stan and so my apologies if I missed something while posting the problem, where any help will be much appreciated. I am getting the error below, while trying to run STAN

Based on a similar issue posted in a different thread from a few years ago, it mentioned that having a lower bound will help solve the issue. However, a lower bound is included in the code below.


Copy of Error


SAMPLING FOR MODEL ‘bppca’ NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: multi_normal_lpdf: LDLT_Factor of covariance parameter is not positive definite. last conditional variance is -21.8925. (in ‘modele0858283940_bppca’ at line 55)

****
Code Where Stan is Being Called
****
Nsamples = 1000
Nchains = 3

model_file = "stan_code/bppca.stan"
smod = stan_model(model_file)

D_max = 5
sa.list = list()
log_lik.list = list()
looic.list = list()
for(d in 1:D_max){
	print(d)
	pca_data <- list(Y = Y, N = N, P = P, D = d, Fam =  Fam, Site = Site, ns = ns, nf = nf)
	set.seed(314)
	sa.list[[d]] = sampling(smod, data= pca_data, iter=Nsamples, chains=Nchains,init="random")
	#log_lik.list[[d]] <- extract_log_lik(sa.list[[d]])
	log_lik.list[[d]] <- extract(sa.list[[d]],"log_lik_marg")[[1]]
	looic.list[[d]] = loo(log_lik.list[[d]])
	save(sa.list,log_lik.list,looic.list,file="results/bppca_results.RData")
	print("###############################")
}

Stan Model File

data{
  int<lower=1> N; 	// number of subjects in dataset
  int<lower=1> P; 	// dimension of data
  int<lower=1> D; 	// dimension of latent space
  int<lower=1> ns;	// number of sites
  int<lower=1> nf;	// number of families with more than one member
  int<lower=1> Site[N]; // site ids
  int<lower=0> Fam[N];	// family ids
  vector[P] Y[N]; 	// responses
}
transformed data{
  vector[P] eta;
  eta = rep_vector(0,P);
}
parameters{
  vector[D] Theta[N];
  matrix[P,D] Lambda;
  matrix[D,ns] A_site;
  matrix[D,nf] B_fam;
  matrix[P,ns] C_site;
  matrix[P,nf] D_fam;
  real<lower=0> sigma2_a;
  real<lower=0> sigma2_b;
  real<lower=0> sigma_c;
  real<lower=0> sigma_d;
  real<lower=0> sigma_eps;
}
transformed parameters{
  vector[D] mu[N];
  vector[P] nu[N];
  for(n in 1:N){
    if(Fam[n] == 0) mu[n] = col(A_site,Site[n]);
    if(Fam[n]>0) mu[n] = col(A_site,Site[n]) + col(B_fam,Fam[n]);
  }
  for(n in 1:N){
    if(Fam[n] == 0) nu[n] = col(C_site,Site[n]);
    if(Fam[n]>0) nu[n] = col(C_site,Site[n]) + col(D_fam,Fam[n]);
  }
}
model{
  sigma2_a ~ uniform(0,1);
  sigma2_b ~ uniform(0,1-sigma2_a);
  sigma_c ~ cauchy(0,1);
  sigma_d ~ cauchy(0,1);
  sigma_eps ~ cauchy(sigma_c^2+sigma_d^2,1);
  to_vector(A_site) ~ normal(0,sqrt(sigma2_a));
  to_vector(B_fam) ~ normal(0,sqrt(sigma2_b));
  to_vector(C_site) ~ normal(0,sigma_c);
  to_vector(D_fam) ~ normal(0,sigma_d);
  for(d in 1:D)
  Lambda[,d] ~ normal(0, 1);
  for(n in 1:N){
    if(Fam[n] == 0){
      Theta[n] ~ multi_normal(mu[n], diag_matrix(rep_vector(1-sigma2_a,D)));
      Y[n] ~ multi_normal(Lambda * Theta[n] + nu[n], diag_matrix(rep_vector(sigma_eps^2-sigma_c^2,P)));
    }
    if(Fam[n] > 0){
      Theta[n] ~ multi_normal(mu[n], diag_matrix(rep_vector(1-sigma2_a-sigma2_b,D)));
      Y[n] ~ multi_normal(Lambda * Theta[n] + nu[n], diag_matrix(rep_vector(sigma_eps^2-sigma_c^2-sigma_d^2,P)));
    }
  }
}
generated quantities{
  //matrix[P,P] Eig_vec_lambda;
  //vector[P] Eig_val_lambda;
  //matrix[P,P] W;
  cov_matrix[P] Q;
  vector[N] log_lik_marg;
  vector[N] log_lik;
  //Eig_vec_lambda = eigenvectors_sym(Lambda*Lambda');
  //Eig_val_lambda = eigenvalues_sym(Lambda*Lambda');
  Q = Lambda * Lambda' + diag_matrix(rep_vector(sigma_eps^2,P));
  //W = Lambda * Lambda';
  for (n in 1:N){
    log_lik_marg[n] = multi_normal_lpdf(Y[n] | eta, Q);
    if(Fam[n] == 0){
      log_lik[n] = multi_normal_lpdf(Y[n] | Lambda * Theta[n] + nu[n], diag_matrix(rep_vector(sigma_eps^2-sigma_c^2,P)));
    }
    if(Fam[n] > 0){
      log_lik[n] = multi_normal_lpdf(Y[n] | Lambda * Theta[n] + nu[n], diag_matrix(rep_vector(sigma_eps^2-sigma_c^2-sigma_d^2,P)));
    }
  }
}

PS: This is a publicly available code and I am trying to recreate the results of a paper

EDIT: @maxbiostat edited this post for syntax highlighting.

1 Like

Welcome to discourse!

It looks like you need to constrain sigma_eps to be larger than sigma_c + sigma_d, and this is the source of the error you’re seeing. However, it additionally appears that sigma_eps and sigma_c are not really identified in this model; only their difference appears in the likelihood, and those Cauchy priors are very weak.

Often, a useful trick for estimating different variance components is to parameterize the model using the total variance and a simplex vector that multiplicatively partitions the total variance into its constitutive components. Again, because sigma_eps and sigma_c don’t appear independently in the likelihood, the simplex will be very weakly identified, but crucially that lack of identification will be orthogonal to the total variance, which should enable efficient sampling.

2 Likes

thank you for the warm welcome. how will the constraint you are suggesting be specified in the file?

1 Like

I don’t understand enough about your model to give definitive advice. Now that @maxbiostat got the code nicely formatted I do see that sigma_eps seems to be identified separately from sigma_c (I think?), so sorry about any confusion there.

Taking this code at face value, what I would do is reparameterize the model in terms of real <lower=0> sigma_1, which corresponds to sqrt(sigma_eps^2 - sigma_c^2 - sigma_d^2), real <lower=0> sigma_2 which corresponds to sigma_d, and real <lower=0> sigma_3 which corresponds to sigma_c. Then when you need sigma_eps^2 - sigma_c^2, that’s just sigma_1^2 + sigma_2^2, and when you need sigma_eps that’s just the root sum of squares of sigma_1, sigma_2, and sigma_3. You’ll have to work out what reasonable priors over these new parameters might be. But you would need to do this anyway, since your existing prior configuration places quite a bit of prior mass on values of sigma_eps^2 - sigma_c^2 - sigma_d^2 that are less than zero. And I’m pretty confident that it will be easier for you to reason about a good prior model under the reparameterization suggested here.

1 Like

thank you for the update. If I understood correctly, I would create a new parameter real<lower=0> sigma_1; and then respecify sigma_eps in terms of sigma_1, sigma_2 (parameter sigma_d in current code) and sigma_3 (parameter sigma_c in current code), and also create new priors for sigma_!? Given i am trying to recreate the results of a prior published paper which is where the code comes from, I was wondering if there is a way to not make any changes to the model even if it may be less efficient?

The priors in the current code are incoherent. Was the previous paper implemented in Stan, or something else? To reproduce exactly, it will be important to understand exactly how the previous paper’s software dealt with the prior mass that sits over negative variances.

Edit: “incoherent” has a strong negative connotation that I did not mean to imply when I wrote this post. You’re not doing anything dumb or incoherent! It’s just that the prior model as specified in the code doesn’t respect constraints that variance parameters must respect, and it is incoherent in a narrow technical sense.

2 Likes

The paper was implemented in R, with the R code then calling the stan model. If you do think the priors need to be changed based on your suggestion, any help in what I should set the priors to will be appreciated. The complete code for the R and the stan file of the paper that I am trying to reproduce is available here

1 Like

I was working on a response, but ultimately ended up consulting the original paper and its supplement. For what it’s worth, I simply do not understand how the Stan code in the Github repo corresponds to the model described in the supplement. I don’t have enough time to do a deep enough dive to figure out whether it corresponds or not, but it sure looks to me as though it does not. One thing that you do seem to know is that the code cannot be run as-is because it won’t initialize. And even if you passed allowable inits, it looks very probable to me that this would have all kinds of sampling problems because it doesn’t have any built-in mechanism to respect the positivity constraints that are required of the variances!

Sorry I can’t be of further help. Since it’s a neurocognition paper, maybe @mike-lawrence would have some interest in taking a deeper dive. Mike, if you have time, can you square this Stan model with this supplement from this paper?

2 Likes

thanks @jsocolar. I appreciate you looking into this thus far and also taking the time to read the paper and the supplement. would you have another suggestion for how I could initialize it differently that i could try out?

I wonder if it might make sense to create an issue at the GitHub repo to call on the expertise of the original authors?

3 Likes

thanks @mike-lawrence. I did write to the original authors. I do think it must be me who is doing something incorrectly, given I am new to Stan hence also want to try to see if there are other ways to run the code without significant change.

@jsocolar any further thoughts?

My only thought at this point is that either I really don’t understand the model or the model definition file as provided from that paper is buggy. Given that you can’t run it, and given that I really can’t make head or tail of it, I sorta suspect the latter.

Edit: and there’s no sense in thinking too carefully about how to initialize a buggy model.