Error rejecting initial value, even though priors have lower bounds

I’m receiving the following error:

Chain 4: Rejecting initial value:
Chain 4:   Error evaluating the log probability at the initial value.
Chain 4: Exception: validate transformed params: psi[8742] is -nan, but must be greater than or equal to 0  (in 'model3a6550c0a439_thresholds' at line 30)

Chain 4: 
Chain 4: Initialization between (-2, 2) failed after 100 attempts. 
Chain 4:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done
[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."

I receive the error for all four chains. I found this post (https://discourse.mc-stan.org/t/error-evaluating-the-log-probability-at-the-initial-value/5254) and checked my priors: all have a lower bound of zero. It’s possible that I need to set a prior on fsA, but that would be a matrix prior, and I’m uncertain what it should be.

My R version is 3.6.0, and I’m running Stan through RStan 2.19.3 with StanHeaders 2.19.2.

This is my Stan code:

data{  // Everything that must be input by the user
  int<lower=4> n_total;                         // Total number of trials to analyze
  int<lower=2> n_subjects;                      // Number of unique observers
  int<lower=2> n_levels;                        // Number of levels of Factor 
  real intensity[n_total];                      // Intensity of the stimulus on each trial
  int<lower=0> subject[n_total];                // Observer on each trial
  int<lower=0> level[n_total];                  // Level of Factor on each trial
  int<lower=0,upper=1> correct[n_total];        // Whether the response was correct (1) on each trial
  real<lower=0,upper=1> chance_performance;     // Chance performance for experiment (e.g., 1/n for n-AFC)
}
transformed data{
  int df_levels_sA;       // Number of degrees of freedom in the interaction 
  int n_levels_sA;          // Number of levels in the interaction 
  
  df_levels_sA = (n_levels - 1) * (n_subjects - 1);
  n_levels_sA = n_levels * n_subjects;
}
parameters{  
  vector<lower=0,upper=1>[n_subjects] lapse;        // Observer's lapse rate
  real mum;
  real muw;
  vector[n_levels-1] fA;
  vector[n_subjects-1] sA;
  vector[n_subjects-1] sB;
  matrix[n_subjects-1,n_levels-1] fsA;
}
transformed parameters {
  real<lower=0,upper=1> mean_beta;      // Mean of beta prior for individual lapse rate
  real<lower=0> betaEta;                // Precision parameter of beta prior for individual lapse rate
  vector<lower=0,upper=1>[n_total] psi;
  real<lower=0> lapse_alpha;
  real<lower=0> lapse_beta;

  real m[n_subjects,n_levels];
  real<lower=0> w[n_subjects,n_levels];
  real threshold[n_total];
  real<lower=0> width[n_total];

  vector[n_levels] factor_alpha;
  vector[n_subjects] subject_alpha;
  vector[n_subjects] subject_beta;
  matrix[n_subjects,n_levels] interaction_alpha;

  for (l in 1:(n_levels - 1)) {
    factor_alpha[l] = fA[l];
  }
  factor_alpha[n_levels] = -1 * sum(fA) / (n_levels - 1);

  for (l in 1:(n_subjects - 1)) {
    subject_alpha[l] = sA[l];
    subject_beta[l] = sB[l];
  }
  subject_alpha[n_subjects] = -1 * sum(sA) / (n_subjects - 1);
  subject_beta[n_subjects] = -1 * sum(sB) / (n_subjects - 1);

  for (sj in 1:(n_subjects - 1)) {
    for (l in 1:(n_levels - 1)) {
      interaction_alpha[sj,l] = fsA[sj,l];
    }
    interaction_alpha[sj,n_levels] = -1 * sum(fsA[sj,]);
  }
  for (l in 1:(n_levels - 1)) {
    interaction_alpha[n_subjects,l] = -1 * sum(fsA[,l]);
  }

  for (sj in 1:n_subjects) {
    for (l in 1:n_levels) {
      m[sj,l] = mum + factor_alpha[l] + subject_alpha[sj] + interaction_alpha[sj,l];
      w[sj,l] = muw + subject_beta[sj];
    }
  }

  for (tr in 1:n_total) {
    threshold[tr] = m[subject[tr],level[tr]];
    width[tr] = w[subject[tr],level[tr]];
    psi[tr] = (1 - lapse[subject[tr]])
              * ((1 - chance_performance)
                 * inv_logit(4.4 / width[tr] * (intensity[tr] - threshold[tr]))
                 + chance_performance)
              + chance_performance * lapse[subject[tr]];
  }

  mean_beta = 0.01;
  betaEta = 100;
  lapse_alpha = mean_beta * betaEta;
  lapse_beta = (1 - mean_beta) * betaEta;
}
model {
  //mean_beta ~ beta(1,60);
  //betaEta ~ gamma(1,0.01);
  fA ~ normal(0,1);
  sA ~ normal(0,1);
  sB ~ normal(0,1);

  muw ~ gamma(2,.5);
  lapse ~ beta(lapse_alpha,lapse_beta);
  correct ~ bernoulli(psi);
}
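For intuition about the likelihood, the psi line can be evaluated outside Stan. Here is a minimal sketch in Python/NumPy (with made-up parameter values, not ones from this model fit) of the same psychometric function, confirming that it yields valid Bernoulli probabilities and is monotone in intensity whenever width is positive and lapse is in [0, 1]:

```python
import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

def psi(intensity, threshold, width, lapse, chance):
    # Same expression as the Stan psi[tr] line, vectorized over intensity.
    return ((1 - lapse)
            * ((1 - chance) * inv_logit(4.4 / width * (intensity - threshold))
               + chance)
            + chance * lapse)

x = np.linspace(-3, 3, 7)                # hypothetical stimulus intensities
p = psi(x, threshold=0.0, width=1.0, lapse=0.05, chance=0.5)

assert np.all((p > 0) & (p < 1))         # valid Bernoulli probabilities
assert np.all(np.diff(p) > 0)            # monotone increasing for width > 0
```

If width were zero or NaN for any trial, the division inside inv_logit would make the corresponding psi entry NaN, which is exactly what triggers the "psi[...] is -nan" rejection.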

This is the code I use from R:

dat <- ace.threshold.t2
dat <- subset(dat, !is.na(norm))
dat$condition <- factor(dat$condition)
dat$pid <- factor(dat$pid)

nTotal <- dim(dat)[1]
nCond <- length(unique(dat$condition))
nSubj <- length(unique(dat$pid))
intensity <- dat$norm
condition <- as.numeric(dat$condition)
pid <- as.numeric(dat$pid)
correct <- dat$correct_button == "correct"
chancePerformance <- 1/2
stanData <- list(n_total=nTotal, n_levels=nCond, n_subjects = nSubj, subject = pid, intensity=intensity, level=condition, correct=correct, chance_performance=chancePerformance)

fit.norm.diff <- stan(file="thresholds.stan", data=stanData, cores = 4)

Thank you very much!

James

Here is a link to a snippet of the data:

This usually means that you need to make the initial values less dispersed in order to avoid numerical problems. You can do that by passing init_r = 0.5 or so when you call stan.

Hmmm. I tried .5, then .1. I got the same error. Any other ideas?

I would follow this line in the loop with

if (is_nan(psi[tr])) {
  print("lapse = ", lapse[subject[tr]]);
  print("width = ", width[tr]);
  ...  
}

until you find what is going NaN and work backwards from that.

Sorry, I’m new to Stan, but the lapse/width output from running one chain seems to suggest that the same width and lapse values are being used for all individuals:

Am I reading this right?

I’m confused, because I believe the code calculates a separate lapse and width for every unique subject.

That looks suspicious but it could be that those are the only values that are resulting in NaN. You should also print out intensity[tr], threshold[tr], and anything else that goes into psi[tr].

Thanks for the super quick response. I just added threshold to the mix, and it’s coming back NaN, so it’s likely something to do with what’s going on here:

for (sj in 1:(n_subjects - 1)) {
  for (l in 1:(n_levels - 1)) {
    interaction_alpha[sj,l] = fsA[sj,l];
  }
  interaction_alpha[sj,n_levels] = -1 * sum(fsA[sj,]);
}
for (l in 1:(n_levels - 1)) {
  interaction_alpha[n_subjects,l] = -1 * sum(fsA[,l]);
}

for (sj in 1:n_subjects) {
  for (l in 1:n_levels) {
    m[sj,l] = mum + factor_alpha[l] + subject_alpha[sj] + interaction_alpha[sj,l];
    w[sj,l] = muw + subject_beta[sj];
  }
}
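Replaying the fill pattern with concrete numbers can make a gap visible. A quick NumPy sketch of the same loops (small hypothetical sizes; note Stan is 1-indexed and Python 0-indexed) shows that neither loop ever assigns the bottom-right cell of the interaction matrix, so that cell stays NaN, which would then propagate into m, threshold, and psi:

```python
import numpy as np

n_subjects, n_levels = 3, 4        # hypothetical sizes for illustration
rng = np.random.default_rng(0)
fsA = rng.normal(size=(n_subjects - 1, n_levels - 1))

# Start from NaN, mimicking an unassigned Stan matrix.
interaction_alpha = np.full((n_subjects, n_levels), np.nan)
for sj in range(n_subjects - 1):           # Stan: 1:(n_subjects - 1)
    for l in range(n_levels - 1):          # Stan: 1:(n_levels - 1)
        interaction_alpha[sj, l] = fsA[sj, l]
    interaction_alpha[sj, n_levels - 1] = -fsA[sj, :].sum()
for l in range(n_levels - 1):              # last row, first n_levels-1 columns only
    interaction_alpha[n_subjects - 1, l] = -fsA[:, l].sum()

# The corner cell [n_subjects, n_levels] is touched by neither loop:
print(np.isnan(interaction_alpha[-1, -1]))  # True
```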

Is there anything conspicuous that you can see that isn’t right? I’ll go over it again. What’s the best way to troubleshoot in Stan? In R, if I have various functions, I can generally break a function down with actual data to see what’s going on; it doesn’t seem so straightforward with Stan.

Alright, so I was able to get the code to work (after a bit of an overhaul). Then I decided to add several more parameters to try to estimate the slope (width) of the psychometric curve (before, it was only one slope per subject; now I’m trying for one slope per task per subject, the same as threshold).

I’m getting the same error, but tried changing init again to somewhere in the region of .5. I’ve also inserted the

if (is_nan(psi[tr])) {
  print("lapse = ", lapse[subject[tr]]);
  print("width = ", width[tr]);
  print("threshold = ", threshold[tr]);
  print("intensity = ", intensity[tr]);
}

but don’t receive anything back. Are there any additional parameters for which I’d need to add the lower bound <lower=0>? I’ve tried various combinations on fB and fsB to no avail.

Here’s the code. (If you take factor_beta[l] and interaction_beta[sj, l] out of the line w[sj,l] = muw + factor_beta[l] + subject_beta[sj] + interaction_beta[sj, l];, it will run; otherwise I get the same error, even with init set to zero.) Thanks much!

SAMPLING FOR MODEL 'thresholds' NOW (CHAIN 1).
Chain 1: Rejecting initial value:
Chain 1:   Error evaluating the log probability at the initial value.
Chain 1: Exception: validate transformed params: w[i_0__][i_1__] is -12, but must be greater than or equal to 0  (in 'model440c2360a164_thresholds' at line 42)

[1] "Error in sampler$call_sampler(args_list[[i]]) : Initialization failed."
error occurred during calling the sampler; sampling not done
data{  // Everything that must be input by the user
  int<lower=4> n_total;                         // Total number of trials to analyze
  int<lower=2> n_subjects;                      // Number of unique observers
  int<lower=2> n_levels;                        // Number of levels of Factor 
  real intensity[n_total];                      // Intensity of the stimulus on each trial
  int<lower=0> subject[n_total];                // Observer on each trial
  int<lower=0> level[n_total];                  // Level of Factor on each trial
  int<lower=0,upper=1> correct[n_total];        // Whether the response was correct (1) on each trial
  real<lower=0,upper=1> chance_performance;     // Chance performance for experiment (e.g., 1/n for n-AFC)
}
transformed data{
  int df_levels_sA;          // Number of degrees of freedom in the interaction 
  int n_levels_sA;          // Number of levels in the interaction 
  int df_levels_sB;          // Number of degrees of freedom in the interaction 
  int n_levels_sB; 
  
  df_levels_sA = (n_levels - 1) * (n_subjects - 1);
  n_levels_sA = n_levels * n_subjects;
  
  df_levels_sB = (n_levels - 1) * (n_subjects - 1);
  n_levels_sB = n_levels * n_subjects;
}
parameters{  
  vector<lower=0,upper=1>[n_subjects] lapse;        // Observer's lapse rate
  real mum;
  real muw;
  vector[n_levels-1] fA;
  vector[n_levels-1] fB;
  vector[n_subjects-1] sA;
  vector[n_subjects-1] sB;
  //matrix[n_subjects-1,n_levels-1] fsA;
  vector[(n_subjects - 1) * (n_levels-1)] fsA;
  vector[(n_subjects - 1) * (n_levels-1)] fsB;
}
transformed parameters {
  real<lower=0,upper=1> mean_beta;      // Mean of beta prior for individual lapse rate
  real<lower=0> betaEta;                // Precision parameter of beta prior for individual lapse rate
  vector<lower=0,upper=1>[n_total] psi;
  real<lower=0> lapse_alpha;
  real<lower=0> lapse_beta;
  real m[n_subjects,n_levels];
  real<lower=0> w[n_subjects,n_levels];
  real threshold[n_total];
  real<lower=0> width[n_total];
  vector[n_levels] factor_alpha = append_row(fA, -sum(fA));
  vector[n_levels] factor_beta = append_row(fB, -sum(fB));
  vector[n_subjects] subject_alpha = append_row(sA, -sum(sA));
  vector[n_subjects] subject_beta = append_row(sB, -sum(sB));
  matrix[n_subjects,n_levels] interaction_alpha;
  matrix[n_subjects,n_levels] interaction_beta;

  interaction_alpha = rep_matrix(0, n_subjects, n_levels);
  for (sj in 1:(n_subjects - 1)) {
    for (l in 1:(n_levels - 1)) {
      interaction_alpha[sj, l] = fsA[(sj - 1) * (n_levels - 1) + l];
    }
    interaction_alpha[sj, n_levels] = -1 * sum(interaction_alpha[sj,]);
  }
  for (l in 1:n_levels) {
    interaction_alpha[n_subjects,l] = -1 * sum(interaction_alpha[,l]);
  }

  interaction_beta = rep_matrix(0, n_subjects, n_levels);
  for (sj in 1:(n_subjects - 1)) {
    for (l in 1:(n_levels - 1)) {
      interaction_beta[sj, l] = fsB[(sj - 1) * (n_levels - 1) + l];
    }
    interaction_beta[sj, n_levels] = -1 * sum(interaction_beta[sj,]);
  }
  for (l in 1:n_levels) {
    interaction_beta[n_subjects,l] = -1 * sum(interaction_beta[,l]);
  }

  for (sj in 1:n_subjects) {
    for (l in 1:n_levels) {
      m[sj,l] = mum + factor_alpha[l] + subject_alpha[sj] + interaction_alpha[sj, l];
      w[sj,l] = muw + factor_beta[l] + subject_beta[sj] + interaction_beta[sj, l];
    }
  }

  for (tr in 1:n_total) {
    threshold[tr] = m[subject[tr],level[tr]];
    width[tr] = w[subject[tr],level[tr]];
    psi[tr] = chance_performance
              + (1 - lapse[subject[tr]] - chance_performance)
                * inv_logit((intensity[tr] - threshold[tr]) / width[tr]);
    if (is_nan(psi[tr])) {
      print("lapse = ", lapse[subject[tr]]);
      print("width = ", width[tr]);
      print("threshold = ", threshold[tr]);
      print("intensity = ", intensity[tr]);
    }
  }

  mean_beta = 0.01;
  betaEta = 100;
  lapse_alpha = mean_beta * betaEta;
  lapse_beta = (1 - mean_beta) * betaEta;
}
model {
  //mean_beta ~ beta(1,60);
  //betaEta ~ gamma(1,0.01);
  fsA ~ normal(0, inv(sqrt(1 - inv(n_levels_sA)))); // Produces a standard normal on interaction_alpha
  fA ~ normal(0, inv(sqrt(1 - inv(n_levels))));     // Produces a standard normal on factor_alpha
  sA ~ normal(0, inv(sqrt(1 - inv(n_subjects))));   // Produces a standard normal on subject_alpha
  sB ~ normal(0, inv(sqrt(1 - inv(n_subjects))));   // Produces a standard normal on subject_beta
  fB ~ normal(0, inv(sqrt(1 - inv(n_levels))));     // Produces a standard normal on factor_beta
  fsB ~ normal(0, inv(sqrt(1 - inv(n_levels_sB)))); // Produces a standard normal on interaction_beta

  muw ~ gamma(2,.5);
  lapse ~ beta(lapse_alpha,lapse_beta);
  correct ~ bernoulli(psi);
}
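The reworked interaction loops can be checked numerically. A small NumPy sketch (hypothetical sizes) replays the fill pattern, starting from a zero matrix just as rep_matrix(0, ...) does, and confirms that no cell is left unassigned and that every row and column sums to zero, which is the sum-to-zero constraint the construction is after:

```python
import numpy as np

n_subjects, n_levels = 3, 4        # hypothetical sizes for illustration
rng = np.random.default_rng(1)
fsA = rng.normal(size=(n_subjects - 1) * (n_levels - 1))

A = np.zeros((n_subjects, n_levels))           # rep_matrix(0, n_subjects, n_levels)
for sj in range(n_subjects - 1):
    for l in range(n_levels - 1):
        # Stan's fsA[(sj - 1) * (n_levels - 1) + l], shifted to 0-based indexing.
        A[sj, l] = fsA[sj * (n_levels - 1) + l]
    A[sj, n_levels - 1] = -A[sj, :].sum()      # row constraint
for l in range(n_levels):                      # note: all n_levels columns this time
    A[n_subjects - 1, l] = -A[:, l].sum()      # column constraint, corner included

assert not np.isnan(A).any()                   # every cell assigned
assert np.allclose(A.sum(axis=0), 0)           # columns sum to zero
assert np.allclose(A.sum(axis=1), 0)           # rows sum to zero
```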

w is declared with a lower bound of zero, but I don’t see why that should necessarily be the case based on the code.

Thanks! I removed the lower bound of zero from the w and width parameters, and it ran, so that’s good news. However, since width is the slope of the psychometric curve, it should be positive. Is there a way to enforce that (I thought that was the purpose of the lower bound)? Otherwise I’ll get NaNs when using qlogis() to locate the threshold at 70%.

I would probably do something like

w[sj, l] = exp(muw + factor_beta[l] + subject_beta[sj] + interaction_beta[sj, l]);

to ensure that it is positive.

If I have four separate timepoints and I wanted to model timepoint as a third factor, would I have to add separate interactions between subject and timepoint and between level and timepoint, with matrices for each? Or is there an easier, more efficient way of modeling the three-way interaction of subject, level, and timepoint?