Rejecting initial value: / Gradient evaluated at the initial value is not finite. / Stan can't start sampling from this initial value

Hi all,
I am running into an issue when including a random variable as a covariate in a Stan model. I generate ZZ from a bivariate normal distribution, apply a distance function (dist) to the ZZ matrix, and include the result as a covariate in a linear model. I consistently hit an initialization error. From testing, the problem appears to lie with the random variable, since the function works fine when applied to fixed variables.

Thank you for any assistance.



functions {
  matrix dist(matrix Z, int nrow) {
    matrix[nrow, nrow] dd;
    for (i in 1:nrow) {
      for (j in 1:nrow) {
        dd[i, j] = distance(Z[i, ], Z[j, ]); // Euclidean distance
      }
    }
    return dd;
  }
}


data {
  int<lower=1> n;   // number of observations
  vector[n] y;      // dependent variable
  int x_p;          // number of covariates
  matrix[n, x_p] x; // covariates
  vector[2] Zero;   // a vector of zeros (currently unused)
}


parameters {
  real<lower=0> sigmasq;
  vector[x_p + 1] beta;   // regression coefficients
  matrix[n, 2] ZZ;        // random variable (latent locations)
  vector<lower=0>[2] tau;
}

transformed parameters {
  matrix[n, n] W;
  vector[n] E;
  matrix[n, x_p + 1] DD;
  W = dist(ZZ, n);       // apply the dist() function to the random matrix ZZ
  E = W * x[, 2];
  DD = append_col(x, E); // design matrix: fixed covariates x plus the ZZ-derived column E
}

model {
  vector[n] mu;

  // priors
  sigmasq ~ student_t(4, 0, 5); // half-t via the <lower=0> constraint
  tau ~ student_t(4, 0, 5);     // half-t via the <lower=0> constraint
  for (r in 1:x_p) {
    beta[r] ~ normal(0, 10);    // regression parameters
  }
  for (i in 1:n) {
    ZZ[i] ~ normal(0, tau);
  }

  // linear model
  mu = DD * beta;
  y ~ normal(mu, sigmasq);
}

Sample data in R:

n <- 20
ylag <- rnorm(n, 0, 1)
y <- rnorm(n, 0, 1)
x <- cbind(rep(1, n), ylag)
x_p <- ncol(x)

fit <- stan(file = 'stancode.stan',
            data = list(n = n, x_p = x_p, y = y, x = x, Zero = rep(0, 2)),
            chains = 2, iter = 3000, warmup = 100)


Here's what the error output reads:

Chain 1: Rejecting initial value:
Chain 1: Gradient evaluated at the initial value is not finite.
Chain 1: Stan can’t start sampling from this initial value.
Chain 1:
Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] “Error : Initialization failed.”
[1] “error occurred during calling the sampler; sampling not done”

Looks like the problem is that the distance() function is finicky. Technically the function is not differentiable at zero, so the gradient computation fails on the diagonal, where Z[i,] and Z[j,] are the same row and the distance is exactly zero. As a workaround you can special-case the diagonal in your dist() function. (Also, the matrix is symmetric, so you don't need to compute the off-diagonal distances twice.)

functions {
  matrix dist(matrix Z, int nrow) {
    matrix[nrow, nrow] dd;
    for (i in 1:nrow) {
      for (j in 1:(i - 1)) {
        dd[i, j] = distance(Z[i, ], Z[j, ]); // Euclidean distance
        dd[j, i] = dd[i, j];                 // fill the symmetric entry
      }
      dd[i, i] = 0.0; // diagonal is zero by definition; never call distance() here
    }
    return dd;
  }
}
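
If it helps, one way to sanity-check the rewritten function outside of sampling is rstan::expose_stan_functions(), which compiles a functions block and exports dist() into the R session. This is only an illustrative sketch: it assumes the block above is saved on its own in a file, and the file name dist_fun.stan is made up for the example.

library(rstan)

# Compile the functions block (assumed, hypothetically, to be saved as "dist_fun.stan")
# and export the Stan dist() into R; note it masks base R's stats::dist().
expose_stan_functions("dist_fun.stan")

Z <- matrix(rnorm(10), nrow = 5, ncol = 2)  # 5 points in 2 dimensions
D <- dist(Z, nrow(Z))                       # calls the exposed Stan function

all(diag(D) == 0)           # diagonal is exactly zero
isTRUE(all.equal(D, t(D)))  # matrix is symmetric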

Thank you for the response, @nhuurre. When you suggest special-casing the diagonal, do you mean setting the diagonal of the distance matrix to something other than 0, even though the Euclidean distance there is 0? Also, something that is still unclear to me: the function completes its task fine when Z is supplied in the data block; the problem only appears when Z is a random variable declared in Stan's parameters block.

By special-casing I mean doing something other than calling the builtin distance() function on the diagonal. You can still set the value to zero, e.g.

if (i == j) {
  dd[i, j] = 0.0;
} else {
  dd[i, j] = distance(Z[i, ], Z[j, ]);
}

My previous post had example code as well.

Stan needs to compute derivatives with respect to parameters, but not with respect to data.
In short, a derivative tells you how much the output changes when you change the input, and Stan uses the derivative to guide its search for likely parameter values. Stan never needs to change the data, so no derivative of the data is required, and the non-differentiable point is never a problem when Z is data.
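
To make the non-differentiability concrete (this is just standard calculus, spelled out), the Euclidean distance and its gradient are

$$
d(u, v) = \sqrt{\sum_k (u_k - v_k)^2},
\qquad
\frac{\partial\, d(u, v)}{\partial u_k} = \frac{u_k - v_k}{d(u, v)},
$$

which is $0/0$ on the diagonal where $u = v$. Stan's autodiff therefore produces a non-finite value there, which is exactly what the "Gradient evaluated at the initial value is not finite" message reports.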

Thank you, @nhuurre, this solved the issue. It is still not entirely clear to me how the recoding helped, though: the logic of the original function seemed fine, yet it caused trouble in Stan when running the model.

Hello,

I have a similar issue, although my case is a little different.
I am coding a reinforcement learning model in RStan, trying to model the decision-making process in the human mind during a reward- and effort-based decision-making task.

Below is the R code:

# Create the RL1 folder path for results
RL1_path <- file.path(paths$RL, 'RL1')

# Create the directory if it doesn't exist
if (!dir.exists(RL1_path)) {
  dir.create(RL1_path, recursive = TRUE)
}

# Create an empty data frame to store the fitted parameters of all participants
fitParam_all <- data.frame()

# Convert columns to numeric
OptDiff_dff <- OptDiff_df

columns_to_convert <- c("RealEff.opt1", "RealEff.opt2", 
                        "RealEff.opt1.t1", "RealEff.opt2.t1", 
                        "RealEff.opt1.t2", "RealEff.opt2.t2", 
                        "RealEff.opt1.t3", "RealEff.opt2.t3", "SymbL", "SymbR")

OptDiff_dff[columns_to_convert] <- lapply(OptDiff_dff[columns_to_convert], 
                                          as.numeric)

# MY DATA HAS SOME NA VALUES, so I first filled the missing values with a sentinel value because Stan does not support NA values
OptDiff_dff[is.na(OptDiff_dff)] <- -999

# Loop over unique participants
for (participant_id in unique(OptDiff_dff$ParticipantID)) {

  # Loop over unique sessions
  for (session_name in unique(OptDiff_dff$Session)) {

    # Determine the session folder based on the 'session' variable
    session_folder <- ifelse(tolower(session_name) == 'insidescanner',
                             'SessionIn', 'SessionOut')

    # Create the session file path
    session_path <- file.path(RL1_path, session_folder)

    # Create the directory if it doesn't exist
    if (!dir.exists(session_path)) {
      dir.create(session_path, recursive = TRUE)
    }

    # Get data for the current participant and session
    participant_data <- OptDiff_dff %>%
      dplyr::filter(
        ParticipantID == participant_id & Session == session_name) %>%
      dplyr::select(-ends_with(".V1"))
        
    if(nrow(participant_data) > 0) {
      
      # Create and format the data frame to be able to run the Stan model
      Stan_df <- within(participant_data, {
        Trial <- as.integer(Trial)
        ChosOpt <- as.numeric(ChosOpt) - 1
      })
  
  # Convert the data frame to a list
  Stan_df <- as.list(Stan_df)
  
  # Ensure the names are correctly transformed
  names(Stan_df) <- gsub("\\.", "_", names(Stan_df))
  
  # Convert selected columns to vectors
  vector_columns <- c('EffProb_opt1', 'EffProb_opt2', 'EffMag_opt1',
                      'EffMag_opt2', 'RewMag_opt1', 'RewMag_opt2')
  for (col_name in vector_columns) {
      Stan_df[[col_name]] <- as.vector(Stan_df[[col_name]])
  }
  
  # Add the total number of trials to the list for Stan
  Stan_df$ntr <- as.integer(nrow(participant_data))
  
    # Sampling method: CALLING THE STAN PROGRAM
    fit_samp = rstan::stan(
      file = file.path(paths$stan, 'RL_model1_sampling.stan'),
      data = Stan_df, chains = 1, cores = 4, iter = 1, # for now (later, set chains = 4, iter = 6000)
      control = list(adapt_delta = 0.9))

# Whether I run this whole code cell or just the 'Sampling method' snippet, THIS ERROR APPEARS: 
# Chain 1: Rejecting initial value:
# Chain 1:   Gradient evaluated at the initial value is not finite.
# Chain 1:   Stan can't start sampling from this initial value.
# Chain 1: 
# Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
# Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
# [1] "Error : Initialization failed."
# [1] "error occurred during calling the sampler; sampling not done"

#BELOW IS THE REST OF THE CODE THAT NORMALLY WORKS

    # Extract parameters from the fitted Stan model 'fit_samp'
    ParamExtract <- rstan::extract(
      fit_samp, pars = c("invTemp", "LR", "bRewMag", "bEffMag"))
    ParamExtract <- as.list(ParamExtract)
    
    # Convert 'ParamExtract' to a data frame
    ParamExtract <- do.call(cbind.data.frame, ParamExtract)
    fitParam <- as.data.frame(colMeans(ParamExtract))

    # Convert row names to a column
    fitParam <- tibble::rownames_to_column(fitParam, var = "Parameter")
    names(fitParam) <- c("Parameter", "Value")

    # Pivot wider
    fitParam <- tidyr::pivot_wider(
      data = fitParam,
      names_from = Parameter,
      values_from = Value
    )

    # Add 'ParticipantID' column
    fitParam$ParticipantID <- participant_id

    # Select the columns in the desired order
    fitParam <- fitParam %>%
      dplyr::select(ParticipantID, invTemp, LR, bRewMag, bEffMag)
    
    if(nrow(fitParam_all) > 0) {
    # Bind 'fitParam' to the global data frame 'fitParam_all'
    fitParam_all <- rbind(fitParam_all, fitParam)
    } else {
      fitParam_all <- fitParam
    }
    
    # Save the data frame 'fitParam' & 'fitParam_all' to a file in 'RL1_path'
    write.csv(fitParam, file = paste0(
      session_path, "/", participant_id, "_FittedParameters_RLsampl.csv"), 
      row.names = FALSE)
    write.csv(fitParam_all, file = paste0(
      session_path, "/FittedParameters_RLsampl.csv"), row.names = FALSE)
    
    saveRDS(fit_samp, file = paste0(
      session_path, "/", participant_id, "_fitModel.rds"))
    #to read rds file: fit_samp <- readRDS(file = paste0(session_path, "/", participant_id, "_fitModel.rds"))
    
    } else { 
      next 
    }
  }
}

Below is the Stan code:

// Declare the DATA that the model will use
data {
  int ntr; // number of trials
  real EffProb_opt1[ntr];   // z-scored effort probability of option 1
  real EffProb_opt2[ntr];
  real EffMag_opt1[ntr]; 
  real EffMag_opt2[ntr]; 
  real RewMag_opt1[ntr]; 
  real RewMag_opt2[ntr]; 
  real SymbL[ntr]; // Symbol displayed on the left of the screen
  real SymbR[ntr];
  int ChosOpt[ntr];      // option chosen (decision) - which symbol was chosen
  int Trial[ntr];           // current trial number
}

// Define the PARAMETERS that the model will estimate
parameters {
  real<lower=0> invTemp;     // beta (inverse temperature = decision noise / stochasticity)
  real<lower=0, upper=1> LR; // alpha (learning rate)
  real bRewMag;              // weighting of RewMag
  real bEffMag;              // weighting of EffMag
}

// MODEL to be estimated: Specify likelihood of data given param & their priors
  // Model outputs
model {  
  real PredictRew_perOpt[ntr,2]; // store PredictRew for 2 Opt across all trials
  real PredictEff_perOpt[ntr,2]; 
  real PE; // temporary variable to calculate difference btw expected/actual outcome
  real UtilOpt1[ntr]; // store utility of Opt1 (desirability to choose this option) 
  real UtilOpt2[ntr];
  real invTxUtil[ntr]; // adjust utility to account for invTemp
  
  // Model specifies PRIOR distributions for the parameters
  invTemp ~ cauchy(0,10); // invTemp with a Cauchy distribution (location, scale), broad
  bRewMag ~ cauchy(0,2);  // bRewMag with a Cauchy distribution, moderate uncertainty
  bEffMag ~ cauchy(0,2); 

  // LEARNING LOOP: Model updates prediction based on the outcomes of each trial
  for (itr in 1:ntr) { // loop iterates over each trial
    if (Trial[itr] < max(Trial)) { // if not last trial, calculate PE for Reward & Effort
  
    if (Trial[itr] == 1) { // if 1st trial, (re)initialize predicted parameters:
      PredictRew_perOpt[itr,2] = 0; // PredictRewOpt2 starts at the mean of the z-scored distribution
      PredictRew_perOpt[itr,1] = 0; 
      PredictEff_perOpt[itr,2] = 0; 
      PredictEff_perOpt[itr,1] = 0;
    } 
    
    // UPDATE beliefs about option 1 - if option 1 was present
    if (SymbL[itr] == 1 || SymbR[itr] == 1) { // only if symbol on left/right is 1
        print("PERew1: SymbL/R == 1");
        PE = RewMag_opt1[itr] - PredictRew_perOpt[itr,1]; // Prediction error calculation
        PredictRew_perOpt[itr+1,1] = PredictRew_perOpt[itr,1] + LR*PE; // Next trial's predicted reward for option 1 calculation
    } else { // if option 1 wasn't on the screen, don't change the beliefs
      print("PERew1: SymbL/R != 1");
      PredictRew_perOpt[itr+1,1] = PredictRew_perOpt[itr,1];
    }
   
    if (SymbL[itr] == 1 || SymbR[itr] == 1) { // do the same for Effort
      print("PEEff1: SymbL/R == 1");
      PE = EffMag_opt1[itr] - PredictEff_perOpt[itr,1];
      PredictEff_perOpt[itr+1,1] = PredictEff_perOpt[itr,1] + LR*PE;
    } else {
      print("PEEff1: SymbL/R != 1");
      PredictEff_perOpt[itr+1,1] = PredictEff_perOpt[itr,1];
    }
    
    // Do the same for option 2 
    if (SymbL[itr] == 2 || SymbR[itr] == 2) { 
      print("PERew2: SymbL/R == 2");
      PE = RewMag_opt2[itr] - PredictRew_perOpt[itr,2];
      PredictRew_perOpt[itr+1,2] = PredictRew_perOpt[itr,2] + LR*PE;
    } else { 
      print("PERew2: SymbL/R != 2");
      PredictRew_perOpt[itr+1,2] = PredictRew_perOpt[itr,2];
    }
   
    if (SymbL[itr] == 2 || SymbR[itr] == 2) { 
      print("PEEff2: SymbL/R == 2");
      PE = EffMag_opt2[itr] - PredictEff_perOpt[itr,2];
      PredictEff_perOpt[itr+1,2] = PredictEff_perOpt[itr,2] + LR*PE;
    } else { 
      print("PEEff2: SymbL/R != 2");
      PredictEff_perOpt[itr+1,2] = PredictEff_perOpt[itr,2];
    }
  }
  }
  
  // DECISION LOOP: Loop calculates utility of both opt + ChosOpt for each trial
  // UTILITY CALCULATION
  for (itr in 2:ntr) { 
    if (SymbL[itr] == 1 || SymbR[itr] == 1) { // for option 1
      UtilOpt1[itr] = bRewMag*PredictRew_perOpt[itr,1]
                    + bEffMag*PredictEff_perOpt[itr,1]
                    - EffProb_opt1[itr];
    } else {
      print("SymbL/R != 1");
    }

    if (SymbL[itr] == 2 || SymbR[itr] == 2) { // for option 2
      UtilOpt2[itr] = bRewMag*PredictRew_perOpt[itr,2]
                    + bEffMag*PredictEff_perOpt[itr,2]
                    - EffProb_opt2[itr];
    } else {
      print("SymbL/R != 2");
    }

// ABOVE CODE SEEMS TO RUN but when adding THE FOLLOWING SNIPPET it results in the mentioned error.

      // INVTEMP UTILITY ADJUSTMENT: utility difference btw Opt scaled by each invTemp
      invTxUtil[itr] = invTemp*(UtilOpt1[itr]-UtilOpt2[itr]);
      if (!is_inf(invTxUtil[itr])) {
        print("invTemp = ", invTemp);
        print("UtilOpt1[itr] = ", UtilOpt1[itr]);
        print("UtilOpt2[itr] = ", UtilOpt2[itr]);
        print("invTxUtil[itr] = ", invTxUtil[itr]);
      } else {
      print("invTxUtil[itr] == inf");
      }
  
// BELOW IS THE REST OF THE STAN PROGRAM that I excluded for now, trying to go step by step.

  //     // Linking Utility & Choices: ChosOpt as a Bernoulli distribution with logit of invTxUtil
  //   if (ChosOpt[itr] < 2 && !is_nan(invTxUtil[itr])) { // only if ChosOpt is 0/1 (ChosOpt = 0 means choosing option 1, ChosOpt = 1 means choosing option 2; this is needed because bernoulli_logit expects outcomes of 0 or 1)
  //     ChosOpt[itr] ~ bernoulli_logit(invTxUtil[itr]);
  //   } else {
  //       print("ChosOpt = 2/3");
  //   }
  }
}

Below is the output displayed in R:

#[printed things]

Chain 1: Rejecting initial value:
Chain 1:   Gradient evaluated at the initial value is not finite.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: 
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"

I hope someone can help me with this.

Please note that I am a complete beginner: I had never really coded before January 2024, and I have only been using Stan for a month…
So I hope I have not given you too much unnecessary information (or too little…) and that it is reasonably clear.

Thank you very much in advance, as I have been stuck on this issue for over a week.