RStan Error: "Exception: variable does not exist... failed to create the sampler; sampling not done"

Hey everyone––I’m having trouble fitting a Stan model using RStan. I feel like there’s an issue with how I’ve formatted my data. My data was originally in a data.frame, as it was generated using a mechanistic model. I re-formatted my data as a list, but I’m new to this, so there’s a fair chance I messed up. Here’s the story so far (P.S. I’m sorry if this post isn’t formatted properly!):

library(rstan)
library(gdata)
library(bayesplot)

write("// Stan Model; code the actual model in Stan
functions{
  real[] dZ_dt(real t, // Time
               real[] Z, // System state {Parasite, Host}
               real[] alpha, // Parameters
               real[] x_r, // Unused data (?)
               int[] x_i){
    real P = Z[1]; // System state coded as an array, such that Z = (P,H)
    real H = Z[2];
  
    real r = alpha[1]; // Parameters of the system, in order they appear
    real O = alpha[2];
    real h = alpha[3];
    real b = alpha[4];
    real c = alpha[5];
    real u = alpha[6];
  
    real dP_dt = P*r - H*(O*P/1 + O*P*h); // Deterministic mechanistic model
    real dH_dt = b + H*(c*(O*P/1 + O*P*h)-u);
    return{dP_dt,dH_dt}; // Return the system state
  }
}

data{
  int<lower=0>N; // Define N as non-negative integer
  real ts[N]; // Assigns time points to N (I think?)
  real y_init[2]; 
  real<lower=0>y[N,2]; // Define y as real and non-negative
}

parameters{
  real<lower=0>alpha[6]; // Make all items in alpha non-neg
  real<lower=0>Z_init[2]; // Initial population size non-neg
  real<lower=0>sigma[2]; // Error term non-negative
}

transformed parameters{
  real Z[N,2]
  = integrate_ode_rk45(dZ_dt,Z_init,0,ts,alpha,rep_array(0.0,0),rep_array(0,0),1e-6,1e-5,2000);
}

model{
  alpha[{1}]~uniform(0,10);
  alpha[{2}]~uniform(0,1);
  alpha[{3}]~uniform(0,60);
  alpha[{4}]~uniform(0,100);
  alpha[{5}]~uniform(0,1);
  alpha[{6}]~uniform(0,1);
  sigma~lognormal(-1,1);
  Z_init~lognormal(log(10),1);
  for (k in 1:2){
    y_init[k]~lognormal(log(Z_init[k]),sigma[k]);
    y[ ,k]~lognormal(log(Z[ ,k]),sigma[k]);
  }
}",
"Stan_Model_TypeII.stan")

stanc("Stan_Model_TypeII.stan") # To check that we wrote a file (I think we did?)

Stan_Model_TypeII <- stan_model("Stan_Model_TypeII.stan")

# Squeezing the data into a form that Stan gets
N <- length(Stoch_Data_TypeII$t)-1 # N is 1952 which makes sense bc length of DF is 1953
ts <- 1:N
y_init <- c(Stoch_Data_TypeII$P[1], Stoch_Data_TypeII$H[1]) # Initial states, P = 1; H = 18
y <- as.matrix(Stoch_Data_TypeII[2:(N+1),2:3])
y <- cbind(y[,2],y[,1]); # This worked, sick; where y[,1] is H, and y[,2] is P
Stan_StochData_TypeII <- list(N,ts,y_init,y)

# Fitting the data to the model
fit <- stan(file = "Stan_Model_TypeII.stan", 
            data = Stan_StochData_TypeII, 
            warmup = 500, iter = 1000, chains = 2, cores = 1, thin = 1, 
            algorithm = "HMC", 
            diagnostic_file = "TypeII_Fitting_Output.R", 
            seed = 1996, verbose = TRUE)

And here’s the progress made on the model:

TRANSLATING MODEL 'Stan_Model_TypeII' FROM Stan CODE TO C++ CODE NOW.
successful in parsing the Stan model 'Stan_Model_TypeII'.

CHECKING DATA AND PREPROCESSING FOR MODEL 'Stan_Model_TypeII' NOW.

COMPILING MODEL 'Stan_Model_TypeII' NOW.

STARTING SAMPLER FOR MODEL 'Stan_Model_TypeII' NOW.

And here’s what the error code reads:

Error in new_CppObject_xp(fields$.module, fields$.pointer, ...) : 
  Exception: variable does not exist; processing stage=data initialization; variable name=N; base type=int  (in 'model1b35f2189f6_Stan_Model_TypeII' at line 25)

failed to create the sampler; sampling not done

Take a look at this section:

data{
  int<lower=0>N; // Define N as non-negative integer
  real ts[N]; // Assigns time points to N (I think?)
  real y_init[2]; 
  real<lower=0>y[N,2]; // Define y as real and non-negative
}

The data section tells you what Stan is expecting as named elements in the list. So, you probably have a data frame that I’ll call dat with three columns, ts and two other columns I’ll call y1 and y2. I’m also guessing that y_init refers to the first entry in y1 and y2. In that case, you’d want to construct a list as:

data_for_stan = list(
    N = nrow(dat) - 1 #minus 1 bc we use the first row for y_init
    , ts = dat$ts[2:nrow(dat)] 
    , y_init = c(dat$y1[1], dat$y2[1])
    , y  = as.matrix( cbind(dat$y1,dat$y2))[2:nrow(dat)]
)

The elements of the data list should be named, such as in list(N=N, ts=ts, etc...).

This worked! Thank you so much!

1 Like