Stan error, variable does not exist; processing stage=data initialization; variable name=N;

I am new to R and also using stan modelling. I keep getting this error that my variable doesn’t exist

// Bayesian logistic regression model in Stan

data {
int<lower=0> N; // Number of observations
int<lower=0, upper=1> y[N]; // Binary outcome variable
real age[N]; // Age variable
int<lower=0, upper=1> sex[N]; // Sex variable
int<lower=0, upper=3> cp[N]; // Chest pain type variable
real trestbps[N]; // Resting blood pressure variable
real chol[N]; // Cholesterol variable
real thalch[N]; // Maximum heart rate achieved variable
}

parameters {
real alpha; // Intercept
real beta_age; // Coefficient for age
real beta_sex; // Coefficient for sex
vector[3] beta_cp; // Coefficients for chest pain type (3 dummy variables)
real beta_trestbps; // Coefficient for resting blood pressure
real beta_chol; // Coefficient for cholesterol
real beta_thalch; // Coefficient for maximum heart rate achieved
}

model {
// Priors
alpha ~ normal(0, 1); // Noninformative prior for intercept
beta_age ~ normal(0, 1); // Noninformative prior for age coefficient
beta_sex ~ normal(0, 1); // Noninformative prior for sex coefficient
beta_cp ~ normal(0, 1); // Noninformative prior for coefficients of chest pain type
beta_trestbps ~ normal(0, 1); // Noninformative prior for resting blood pressure coefficient
beta_chol ~ normal(0, 1); // Noninformative prior for cholesterol coefficient
beta_thalch ~ normal(0, 1); // Noninformative prior for maximum heart rate coefficient

// Likelihood
for (i in 1:N) {
real linpred;
linpred = alpha + beta_age * age[i] + beta_sex * sex[i] + beta_cp[cp[i]] + beta_trestbps * trestbps[i] + beta_chol * chol[i] + beta_thalch * thalch[i];
y[i] ~ bernoulli_logit(linpred);
}
}

and

data_list ← list(
N = nrow(train_data),
y = train_data$binary_num,
age = train_data$age,
sex = train_data$sex,
cp = train_data$cp,
trestbps = train_data$trestbps,
chol = train_data$chol,
thalch = train_data$thalch
)

Compile the Stan model

stan_model ← stan(file = “bayeslrm.stan”)

Anyone that could help me with this, Im really struggling. Thank you!

Welcome. If I understand correctly, you haven’t passed your data object data_list to the call to stan.
What happens if you try: stan_model <- stan(file = "bayeslrm.stan", data = data_list)

I changed the prior for the cp vector in my .stan file and I also tried your suggestions. I realized later that I hadn’t included the dataset.

This is what I have now and I still get an error.

// Bayesian logistic regression model in Stan

data {
int<lower=0> N; // Number of observations
int<lower=0, upper=1> y[N]; // Binary outcome variable
real age[N]; // Age variable
int<lower=0, upper=1> sex[N]; // Sex variable
int<lower=0, upper=3> cp[N]; // Chest pain type variable
real trestbps[N]; // Resting blood pressure variable
real chol[N]; // Cholesterol variable
real thalch[N]; // Maximum heart rate achieved variable
}

parameters {
real alpha; // Intercept
real beta_age; // Coefficient for age
real beta_sex; // Coefficient for sex
vector[3] beta_cp; // Coefficients for chest pain type (3 dummy variables)
real beta_trestbps; // Coefficient for resting blood pressure
real beta_chol; // Coefficient for cholesterol
real beta_thalch; // Coefficient for maximum heart rate achieved
}

model {
// Priors
alpha ~ normal(0, 1); // Noninformative prior for intercept
beta_age ~ normal(0, 1); // Noninformative prior for age coefficient
beta_sex ~ normal(0, 1); // Noninformative prior for sex coefficient
beta_cp ~ multi_normal(rep_vector(0, 3), diag_matrix(rep_vector(1, 3))); // Noninformative multivariate normal prior for coefficients of chest pain type
beta_trestbps ~ normal(0, 1); // Noninformative prior for resting blood pressure coefficient
beta_chol ~ normal(0, 1); // Noninformative prior for cholesterol coefficient
beta_thalch ~ normal(0, 1); // Noninformative prior for maximum heart rate coefficient

// Likelihood
for (i in 1:N) {
real linpred;
linpred = alpha + beta_age * age[i] + beta_sex * sex[i] + beta_cp[cp[i]] + beta_trestbps * trestbps[i] + beta_chol * chol[i] + beta_thalch * thalch[i];
y[i] ~ bernoulli_logit(linpred);
}
}

library(rstan)

train_data ← read.csv(“heart_disease_train.csv”)

Put the data into a list

data_list ← list(
N = nrow(train_data),
y = train_data$binary_num,
age = train_data$age,
sex = train_data$sex,
cp = train_data$cp,
trestbps = train_data$trestbps,
chol = train_data$chol,
thalch = train_data$thalch
)

Compile the Stan model

stan_model ← stan(file = “bayeslrm.stan”, data = data_list)

and the error

SAMPLING FOR MODEL ‘anon_model’ NOW (CHAIN 1).
Chain 1: Unrecoverable error evaluating the log probability at the initial value.
Chain 1: Exception: vector[uni] indexing: accessing element out of range. index 0 out of range; expecting index to be between 1 and 3 (in ‘string’, line 34, column 4 to column 155)
[1] “Error : Exception: vector[uni] indexing: accessing element out of range. index 0 out of range; expecting index to be between 1 and 3 (in ‘string’, line 34, column 4 to column 155)”
[1] “error occurred during calling the sampler; sampling not done”

If I had to guess, I’d say something is going wrong with indexing beta_cp, since the error message says it was expecting an index between 1 and 3 (which is the length of beta_cp), but the provided index (presumably cp[i]) was 0.

Could you verify that you’re able to compile the model (without worrying about the data) with something like blr_model <- stan_model(file = "bayeslrm.stan").

Then you can use the compiled model as follows: blr_result <- sampling(object = blr_model, data = data_list).

If that still throws an error the only thing I can think of is that there’s something wrong with the data.

I think you are right and the problem was with the cp. Cp takes values 0,1,2,3.
Cp in my data was a nonnummeric categorical data, with four categories. So instead I transformed it to numerical having these four values. I transformed the sex variable to 0 and 1 too. the rest of the variables take real values.

I changed my .stan file now and it seemed to work. I got an output.

I rewrote my model :

// Likelihood
for (i in 1:N) {
real linpred;
linpred = alpha + beta_age * age[i] + beta_sex * sex[i] + beta_cp * cp[i] + beta_trestbps * trestbps[i] + beta_chol * chol[i] + beta_thalch * thalch[i];
y[i] ~ bernoulli_logit(linpred);
}
}

I am not 100% sure if the model is right. First time I am learning about logistic regression.

Great that you got the model running!
If you’re seeking more substantive feedback on the underlying math / parameterisation of the model, I think you’re better off opening a new topic with a more suitable title and topic / tags (and perhaps marking this thread as solved).

1 Like

Also, I would recommend checking out the brms and/or rstanarm packages. I think you’ll find quite some resources for fitting logistic regression models using those packages.