# Rejecting initial value: / Gradient evaluated at the initial value is not finite. / Stan can't start sampling from this initial value

Hi all,
I am encountering an issue when including a random variable as a covariate in a STAN model. I am generating ZZ from a bivariate normal distribution, applying a distance function (dist) in ZZ matrix, and involving it as a covariate in a linear model. I consistently encounter an initialization error. Upon testing, it appears that the problem lies with the random variable, as the function works fine when applied to fixed variables.

Thank you for any assistance.

``````
functions{
matrix dist(matrix Z, int nrow) {
matrix[nrow, nrow] dd;
for(i in 1:nrow){
for(j in 1:nrow){

dd[i,j]=distance (Z[i,], Z[j,]); //euclidian function
}
}
return dd;
}
}

data{
int<lower=1> n; // number of observations
vector[n] y; // dep_data
int x_p;
matrix[n,x_p] x; //covariates
vector[2] Zero; // a vector of Zeros
}

parameters {
real<lower=0> sigmasq;
vector[x_p+1]  beta;//
matrix[n,2] ZZ;// random variable
vector<lower=0>[2] tau; //
}

transformed parameters{
matrix[n, n] ZZ1;
ZZ1 = dist(ZZ, n);
matrix[n,n] W; //
W = dist(ZZ, n);// apply "dist " function
vector [n] E;
E=W *x[,2];
// prepare data
matrix [n,3] DD;
DD=append_col(x,E);// prepare dat X-fixed and E random invole ZZ
}

model{
for (i in 1:n){
ZZ[i] ~ normal(0, tau);
}

vector[n] mu;
mu = DD*beta;

//specify priors;
sigmasq ~ student_t(4,0,5)T[0, ];//
tau ~  student_t(4,0,5)T[0, ];
for (r in 1:x_p){
beta[r] ~ normal(0, 10); //regression parameters
}

//limear model
for(i in 1:n) y[i] ~ normal(mu[i],sigmasq);
}

``````

# Sample data in R

n<-20
x_p<- ncol(x)
ylag<-rnorm(n,0,1)
y<-rnorm(n,0,1)
x<- cbind(rep(1, n),ylag)

fit ← stan(file=‘stancode.stan’,
data = list(n=n, n1=n1, x_p=x_p, y=y,x=x,Zero=rep(0, 2)), chains = 2, iter = 3000, warmup = 100)

#Here’s what the error code reads:

Chain 1: Rejecting initial value:
Chain 1: Gradient evaluated at the initial value is not finite.
Chain 1: Stan can’t start sampling from this initial value.
Chain 1:
Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] “Error : Initialization failed.”
[1] “error occurred during calling the sampler; sampling not done”

Looks like the problem is that `distance()` function is finicky. Technically the function is not differentiable at zero so the gradient computation fails. As a workaround you can special-case the diagonal in your `dist()` function. (Also the matrix is symmetric, you don’t need to compute the off-diagonal distances twice)

``````functions{
matrix dist(matrix Z, int nrow) {
matrix[nrow, nrow] dd;
for(i in 1:nrow){
for(j in 1:i-1){
dd[i,j]=distance (Z[i,], Z[j,]); //euclidian function
dd[j,i]=dd[i,j];
}
dd[i,i] = 0.0;
}
return dd;
}
}
``````

Thank you for the response, @nhuurre. When you suggest special-casing the diagonal part, do you mean having the diagonal of the distance matrix not equal to 0, regardless of the fact that the Euclidean distance between them will be 0? Also, something unclear is how the function concludes its task if Z is assigned in the data part; this happens only when Z is randomly assigned in STAN.

By special-casing I mean doing something other than using the builtin `distance(..)` function. You can still set the value to zero, e.g.

``````if (i == j) {
dd[i,j] = 0.0;
} else {
dd[i,j]=distance (Z[i,], Z[j,]);
}
``````

My previous post had example code as well.

Stan needs to compute a derivative with respect to parameters but not data.
In short, derivative tells you how much the output changes when you change the input, and Stan uses the derivative to guide the search for likely parameter values. Stan does not need to change the data.

Thank you, @nhuurre, this solved the issue. But it’s still not very clear how recoding helped here, because previously, despite the fact that the logic of the function was okay, it caused trouble in STAN when running the model.

Hello,

I have a similar issue, although my case is a little different.
I am coding a reinforcement learning model on RStan. I try to model the decision-making process in the human mind during a reward- and effort-based decision-making task.

Below is the R code:

``````# Create the RL1 folder path for results
RL1_path <- file.path(paths\$RL, 'RL1')

# Create the directory if it doesn't exist
if (!dir.exists(RL1_path)) {
dir.create(RL1_path, recursive = TRUE)
}

# Create an empty data frame to store the fitted param of all the participants
fitParam_all <- data.frame()

# Convert columns to numeric
OptDiff_dff <- OptDiff_df

columns_to_convert <- c("RealEff.opt1", "RealEff.opt2",
"RealEff.opt1.t1", "RealEff.opt2.t1",
"RealEff.opt1.t2", "RealEff.opt2.t2",
"RealEff.opt1.t3", "RealEff.opt2.t3", "SymbL", "SymbR")

OptDiff_dff[columns_to_convert] <- lapply(OptDiff_dff[columns_to_convert],
as.numeric)

#MY DATA HAS SOME NA VALUES, so I first filled missing values with a sentinel value because "Stan do not support NA values",
OptDiff_dff[is.na(OptDiff_dff)] <- -999

# Loop over unique participants
for (participant_id in unique(OptDiff_dff\$ParticipantID)) {

# Loop over unique sessions
for (session_name in unique(OptDiff_dff\$Session)) {

# Determine the session folder based on the 'session' variable
session_folder <- ifelse(tolower(session_name) == 'insidescanner',
'SessionIn', 'SessionOut')

# Create the session file path
session_path <- file.path(RL1_path, session_folder)

# Create the directory if it doesn't exist
if (!dir.exists(session_path)) {
dir.create(session_path, recursive = TRUE)
}

# Get data for the current participant and session
participant_data <- OptDiff_dff %>%
dplyr::filter(
ParticipantID == participant_id & Session == session_name) %>%
dplyr::select(-ends_with(".V1"))

if(nrow(participant_data) > 0) {

# Create and format the data frame to be able to run the Stan model
Stan_df <- within(participant_data, {
Trial <- as.integer(Trial)
ChosOpt <- as.numeric(ChosOpt) - 1
})

# Convert the data frame to a list
Stan_df <- as.list(Stan_df)

# Ensure the names are correctly transformed
names(Stan_df) <- gsub("\\.", "_", names(Stan_df))

# Convert selected columns to vectors
vector_columns <- c('EffProb_opt1', 'EffProb_opt2', 'EffMag_opt1',
'EffMag_opt2', 'RewMag_opt1', 'RewMag_opt2')
for (col_name in vector_columns) {
Stan_df[[col_name]] <- as.vector(Stan_df[[col_name]])
}

# Add the total number of trials to the list for Stan
Stan_df\$ntr <- as.integer(nrow(participant_data))

# Sampling method: CALLING THE STAN PROGRAM
fit_samp = rstan::stan(
file = file.path(paths\$stan, 'RL_model1_sampling.stan'),
data = Stan_df, chains = 1, cores = 4, iter = 1, #FOR NOW (after, set chains=4, iter=6000)
control = list(adapt_delta = 0.9))

# Whether I run this whole code cell or just the 'Sampling method' snippet, THIS ERROR APPEARS:
# Chain 1: Rejecting initial value:
# Chain 1:   Gradient evaluated at the initial value is not finite.
# Chain 1:   Stan can't start sampling from this initial value.
# Chain 1:
# Chain 1: Initialization between (-2, 2) failed after 100 attempts.
# Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
# [1] "Error : Initialization failed."
# [1] "error occurred during calling the sampler; sampling not done"

#BELOW IS THE REST OF THE CODE THAT NORMALLY WORKS

# Extract parameters from the fitted Stan model 'fit_samp'
ParamExtract <- rstan::extract(
fit_samp, pars = c("invTemp", "LR", "bRewMag", "bEffMag"))
ParamExtract <- as.list(ParamExtract)

# Convert 'ParamExtract' to a data frame
ParamExtract <- do.call(cbind.data.frame, ParamExtract)
fitParam <- as.data.frame(colMeans(ParamExtract))

# Convert row names to a column
fitParam <- tibble::rownames_to_column(fitParam, var = "Parameter")
names(fitParam) <- c("Parameter", "Value")

# Pivot wider
fitParam <- tidyr::pivot_wider(
data = fitParam,
names_from = Parameter,
values_from = Value
)

# Add 'ParticipantID' column
fitParam\$ParticipantID <- participant_id

# Select the columns in the desired order
fitParam <- fitParam %>%
dplyr::select(ParticipantID, invTemp, LR, bRewMag, bEffMag)

if(nrow(fitParam_all) > 0) {
# Bind 'fitParam' to the global data frame 'fitParam_all'
fitParam_all <- rbind(fitParam_all, fitParam)
} else {
fitParam_all <- fitParam
}

# Save the data frame 'fitParam' & 'fitParam_all' to a file in 'RL1_path'
write.csv(fitParam, file = paste0(
session_path, "/", participant_id, "_FittedParameters_RLsampl.csv"),
row.names = FALSE)
write.csv(fitParam_all, file = paste0(
session_path, "/FittedParameters_RLsampl.csv"), row.names = FALSE)

saveRDS(fit_samp, file = paste0(
session_path, "/", participant_id, "_fitModel.rds"))
#to read rds file: fit_samp <- readRDS(file = paste0(session_path, "/", participant_id, "_fitModel.rds"))

} else {
next
}
}
}
``````

Below is the Stan code:

``````// Declare the DATA that the model will use
data {
int ntr; // number of trials
real EffProb_opt1[ntr];   // z-scored Effort Proba of option1
real EffProb_opt2[ntr];
real EffMag_opt1[ntr];
real EffMag_opt2[ntr];
real RewMag_opt1[ntr];
real RewMag_opt2[ntr];
real SymbL[ntr]; // Symbol displayed on the left of the screen
real SymbR[ntr];
int ChosOpt[ntr];      // option chosen (decision) - which symbol was chosen
int Trial[ntr];           // current trial number
}

// Define the PARAMETERS that the model will estimate
parameters {
real < lower = 0 > invTemp; // beta (inverse temperature = decision noise / stochasticity)
real < lower = 0, upper = 1 > LR; // alpha (learning rate)
real bRewMag; // weighting of RewMag
real bEffMag; // weighting of EffMag
}

// MODEL to be estimated: Specify likelihood of data given param & their priors
// Model outputs
model {
real PredictRew_perOpt[ntr,2]; // store PredictRew for 2 Opt across all trials
real PredictEff_perOpt[ntr,2];
real PE; // temporary variable to calculate difference btw expected/actual outcome
real UtilOpt1[ntr]; // store utility of Opt1 (desirability to choose this option)
real UtilOpt2[ntr];
real invTxUtil[ntr]; // adjust utility to account for invTemp

// Model specifies PRIOR distributions for the parameters
invTemp ~ cauchy(0,10); // invTemp with Cauchy distrib° (mean, scale), broad
bRewMag ~ cauchy(0,2); // bRewMag with Cauchy distrib°, moderate uncertainty
bEffMag ~ cauchy(0,2);

// LEARNING LOOP: Model updates prediction based on the outcomes of each trial
for (itr in 1:ntr) { // loop iterates over each trial
if (Trial[itr] < max(Trial)) { // if not last trial, calculate PE for Reward & Effort

if (Trial[itr] == 1) { // if 1st trial, (re)initialize predicted parameters:
PredictRew_perOpt[itr,2] = 0; // PredictRewOpt2 = mean z-scored distrib°
PredictRew_perOpt[itr,1] = 0;
PredictEff_perOpt[itr,2] = 0;
PredictEff_perOpt[itr,1] = 0;
}

// UPDATE believes about option 1 - if option 1 was present
if (SymbL[itr] == 1 || SymbR[itr] == 1) { // only if symbol on left/right is 1
print("PERew1: SymbL/R == 1");
PE = RewMag_opt1[itr] - PredictRew_perOpt[itr,1]; // Prediction error calculation
PredictRew_perOpt[itr+1,1] = PredictRew_perOpt[itr,1] + LR*PE; // Next trial's predicted reward for option 1 calculation
} else { // if option 1 wasn't on the screen, don't change the believes
print("PERew1: SymbL/R != 1");
PredictRew_perOpt[itr+1,1] = PredictRew_perOpt[itr,1];
}

if (SymbL[itr] == 1 || SymbR[itr] == 1) { // do the same for Effort
print("PEEff1: SymbL/R == 1");
PE = EffMag_opt1[itr] - PredictEff_perOpt[itr,1];
PredictEff_perOpt[itr+1,1] = PredictEff_perOpt[itr,1] + LR*PE;
} else {
print("PEEff1: SymbL/R != 1");
PredictEff_perOpt[itr+1,1] = PredictEff_perOpt[itr,1];
}

// Do the same for option 2
if (SymbL[itr] == 2 || SymbR[itr] == 2) {
print("PERew2: SymbL/R == 2");
PE = RewMag_opt2[itr] - PredictRew_perOpt[itr,2];
PredictRew_perOpt[itr+1,2] = PredictRew_perOpt[itr,2] + LR*PE;
} else {
print("PERew2: SymbL/R != 2");
PredictRew_perOpt[itr+1,2] = PredictRew_perOpt[itr,2];
}

if (SymbL[itr] == 2 || SymbR[itr] == 2) {
print("PEEff2: SymbL/R == 2");
PE = EffMag_opt2[itr] - PredictEff_perOpt[itr,2];
PredictEff_perOpt[itr+1,2] = PredictEff_perOpt[itr,2] + LR*PE;
} else {
print("PEEff2: SymbL/R != 2");
PredictEff_perOpt[itr+1,2] = PredictEff_perOpt[itr,2];
}
}
}

// DECISION LOOP: Loop calculates utility of both opt + ChosOpt for each trial
// UTILITY CALCULATION
for (itr in 2:ntr) {
if (SymbL[itr] == 1 || SymbR[itr] == 1) { // for option 1
UtilOpt1[itr] = bRewMag*PredictRew_perOpt[itr,1]
+ bEffMag*PredictEff_perOpt[itr,1]
- EffProb_opt1[itr];
} else {
print("SymbL/R != 1");
}

if (SymbL[itr] == 2 || SymbR[itr] == 2) { // for option 2
UtilOpt2[itr] = bRewMag*PredictRew_perOpt[itr,2]
+ bEffMag*PredictEff_perOpt[itr,2]
- EffProb_opt2[itr];
} else {
print("SymbL/R != 2");
}

// ABOVE CODE SEEMS TO RUN but when adding THE FOLLOWING SNIPPET it results in the mentioned error.

// INVTEMP UTILITY ADJUSTMENT: utility difference btw Opt scaled by each invTemp
invTxUtil[itr] = invTemp*(UtilOpt1[itr]-UtilOpt2[itr]);
if (!is_inf(invTxUtil[itr])) {
print("invTemp = ", invTemp);
print("UtilOpt1[itr] = ", UtilOpt1[itr]);
print("UtilOpt2[itr] = ", UtilOpt2[itr]);
print("invTxUtil[itr] = ", invTxUtil[itr]);
} else {
print("invTxUtil[itr] == inf");
}

// BELOW IS THE REST OF THE STAN PROGRAM that I excluded for now, trying to go step by step.

//     // Linking Utility & Choices: ChosOpt as Bernoulli distrib° with logit of invTxUtil
//   if (ChosOpt[itr] < 2 && !is_nan(invTxUtil[itr])) { // only if ChosOpt is 0/1 (ChosOpt = 0 means choosing option 1 / ChosOpt = 1 means choosing option 2; have to do that has logit works between 0 and 1)
//     ChosOpt[itr] ~ bernoulli_logit(invTxUtil[itr]);
//   } else {
//       print("ChosOpt = 2/3");
//   }
}
}
``````

Below is the output displayed in R:

``````#[printed things]

Chain 1: Rejecting initial value:
Chain 1:   Gradient evaluated at the initial value is not finite.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1:
Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
[1] "Error : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"
``````

I hope someone can help me with this.

Please note that I am a very beginner: I had never really coded before January 2024 and I have been using Stan for a month now…
So I hope I have not given you too much unnecessary information (or not enough…) and that it is quite clear.

Thank you very much in advance, as I have been stuck on this issue for over a week.