Error in mod$fit_ptr() : Exception: variable does not exist; processing stage=data initialization; variable name=

Hello, I’m trying to implement the following Stan code:


functions {
// the Hill function
real Hill(real t, real ec, real slope) {
return 1 / (1 + (t / ec)^(-slope));
}
// the adstock transformation with a vector of weights
real Adstock(row_vector t, row_vector weights) {
return dot_product(t, weights) / sum(weights);
}
}
data {
// the total number of observations
int<lower=1> N;
// the vector of sales
real<lower=0> Y[N];
// the maximum duration of lag effect, in weeks
int<lower=1> max_lag;
// the number of media channels
int<lower=1> num_media;
// a vector of 0 to max_lag - 1
row_vector[max_lag] lag_vec;
// 3D array of media variables
row_vector[max_lag] X_media[N, num_media];
// the number of other control variables
int<lower=1> num_ctrl;
// a matrix of control variables
row_vector[num_ctrl] X_ctrl[N];
}
parameters {
// residual variance
real<lower=0> noise_var;
// the intercept
real tau;
// the coefficients for media variables
vector<lower=0>[num_media] beta_medias;
// coefficients for other control variables
vector[num_ctrl] gamma_ctrl;
// the retention rate and delay parameter for the adstock transformation of
// each media
vector<lower=0,upper=1>[num_media] retain_rate;
vector<lower=0,upper=max_lag-1>[num_media] delay;
// ec50 and slope for Hill function of each media
vector<lower=0,upper=1>[num_media] ec;
vector<lower=0>[num_media] slope;
}
transformed parameters {
// a vector of the mean response
real mu[N];
// the cumulative media effect after adstock
real cum_effect;
// the cumulative media effect after adstock, and then Hill transformation
row_vector[num_media] cum_effects_hill[N];
row_vector[max_lag] lag_weights;
for (nn in 1:N) {
for (media in 1 : num_media) {
for (lag in 1 : max_lag) {
lag_weights[lag] =pow(retain_rate[media], (lag - 1 - delay[media]) ^ 2);
}
cum_effect =Adstock(X_media[nn, media], lag_weights);
cum_effects_hill[nn, media] =Hill(cum_effect, ec[media], slope[media]);
}
mu[nn] =tau +
dot_product(cum_effects_hill[nn], beta_medias) +
dot_product(X_ctrl[nn], gamma_ctrl);
}
}
model {
retain_rate ~ beta(3,3);
delay ~ uniform(0, max_lag - 1);
slope ~ gamma(3, 1);
ec ~ beta(2,2);
tau ~ normal(0, 5);
for (media_index in 1 : num_media) {
beta_medias[media_index] ~ normal(0, 1);
}
for (ctrl_index in 1 : num_ctrl) {
gamma_ctrl[ctrl_index] ~ normal(0,1);
}
noise_var ~ inv_gamma(0.05, 0.05 * 0.01);
Y ~ normal(mu, sqrt(noise_var));
}

"

To prepare the model fit, I defined the following:

            covariates = df[1:152,1:20]
            X = cbind(Intercept=1, covariates)
            Y = df[1:152,"y"]
            N = nrow(X)  
            K = ncol(X)   
            X_media= X[,2:12] 
            X_ctrl=X[,13:21]  
            num_media=ncol(X_media)
            num_ctrl=ncol(X_ctrl)
            max_lag = as.integer(13)
            lag_vec=c(0,1,2,3,4,5,6,7,8,9,10,11,12)

dat = list(N=N, K=K, Y=Y, X=X)
fit = stan(model_code = stanmodelcode,
           data = dat,
           iter = 5000,
           warmup = 2500,
           thin = 10,
           chains = 4)

And I’m getting the error message:

Error in mod$fit_ptr() : 
      Exception: variable does not exist; processing stage=data initialization; variable name=max_lag; base type=int  (in 'model4e1943f2d660b_Stan_Model_TypeI' at line 18)

In addition: Warning message:
In system(paste(CXX, ARGS), ignore.stdout = TRUE, ignore.stderr = TRUE) :
  'C:/rtools40/usr/mingw_/bin/g++' not found
failed to create the sampler; sampling not done

Hi Namurava,

The error is saying that the variable max_lag was not found in the data passed to Stan. This is because you only pass four variables as data:

dat = list(N=N, K=K, Y=Y, X=X)

whereas you need to pass all of the variables that are declared in the data block of your Stan code (N, Y, max_lag, num_media, lag_vec, X_media, num_ctrl and X_ctrl).
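As a rough sketch, a complete list for the model in the first post could be built as follows. The data here are simulated stand-ins for `df`, and the layout of the lag dimension in X_media (slot l holds the value from l - 1 weeks earlier, zero-padded at the start of the series) is an assumption about how the model is meant to be fed, not something stated in the thread.

```r
# Simulated stand-in for the poster's `df`: 152 weeks, 11 media and 9 control columns.
set.seed(1)
N <- 152
num_media <- 11
num_ctrl <- 9
max_lag <- 13L

media <- matrix(runif(N * num_media), nrow = N)
ctrl <- matrix(rnorm(N * num_ctrl), nrow = N)
Y <- runif(N, 1, 10)

# `row_vector[max_lag] X_media[N, num_media]` is passed as an N x num_media x max_lag array.
# ASSUMPTION: X_media[n, m, l] is channel m at week n - (l - 1), padded with 0 before week 1.
X_media <- array(0, dim = c(N, num_media, max_lag))
for (lag in seq_len(max_lag)) {
  rows <- lag:N
  X_media[rows, , lag] <- media[rows - (lag - 1), ]
}

# One entry per variable declared in the Stan data block, with matching names.
dat <- list(
  N = N,
  Y = Y,
  max_lag = max_lag,
  num_media = num_media,
  lag_vec = as.numeric(0:(max_lag - 1)),
  X_media = X_media,
  num_ctrl = num_ctrl,
  X_ctrl = ctrl
)

# The list would then go to the same call as before, e.g.
# fit <- stan(model_code = stanmodelcode, data = dat, chains = 4)
```

The names in the list have to match the names in the data block exactly, and the dimensions have to match the declared sizes.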


Thank you, @andrjohns.
It works!


@andrjohns I am running into the same problem now that my code is working. It does not recognise my temperature variable. This is a hierarchical model I posted in the power function error post. I initially got ER20_mu and ER20_sigma reported as missing variables, even though they were declared in the parameters of priors and used in the model block as ER20 ~ normal(ER20_mu, ER20_sigma);. I then replaced that line with ER20 ~ normal(-2, 0.5); and the missing variable error for ER20 stopped. Now I am facing the same error with the temperature variable temp_water in my data. Any idea how to go about this? The error and the model are below.

Error in mod$fit_ptr() :
Exception: variable does not exist; processing stage=data initialization; variable name=temp_water; base type=vector_d (in ‘model4b1c66371d71_b_np_oipi_tr_psrqkm’ at line 30)

 // b_np_oipi_tr_psrqkm.stan
data {
  // Parameters of priors on metabolism
  real alpha_meanlog;
  real<lower=0> alpha_sdlog;
  real<lower=0> Pmax_mu;
  real<lower=0> Pmax_sigma;
  real K600_daily_meanlog;
  real<lower=0> K600_daily_sdlog;
  
  // Error distributions
  real<lower=0> err_obs_iid_sigma_scale;
  real<lower=0> err_proc_iid_sigma_scale;
  
  // Data dimensions
  int<lower=1> d; // number of dates
  real<lower=0> timestep; // length of each timestep in days
  int<lower=1> n24; // number of observations in first 24 hours per date
  int<lower=1> n; // number of observations per date
  
  // Daily data
  vector[d] DO_obs_1;
  
  // Data
  vector[d] DO_obs[n];
  vector[d] DO_sat[n];
  vector[d] light[n];
  vector[d] temp_water[n];
  vector[d] depth[n];
  vector[d] KO2_conv[n];
}

parameters {
  vector[d] alpha_scaled;
  vector[d] Pmax;
  vector[d] ER20;
  vector<lower=0>[d] K600_daily;
  real<lower=0> err_obs_iid_sigma_scaled;
  real<lower=0> err_proc_iid_sigma_scaled;
  vector[d] DO_mod[n];
}

transformed parameters {
  real<lower=0> err_obs_iid_sigma;
  vector[d] DO_mod_partial_sigma[n];
  real<lower=0> err_proc_iid_sigma;
  vector<lower=0>[d] alpha;
  vector[d] GPP_inst[n];
  vector[d] ER_inst[n];
  vector[d] KO2_inst[n];
  vector[d] DO_mod_partial[n];
  
  // Rescale error distribution parameters
  err_obs_iid_sigma = err_obs_iid_sigma_scale * err_obs_iid_sigma_scaled;
  err_proc_iid_sigma = err_proc_iid_sigma_scale * err_proc_iid_sigma_scaled;
  
  // Rescale select daily parameters
  alpha = exp(alpha_meanlog + alpha_sdlog * alpha_scaled);
  
  // Model DO time series
  // * trapezoid version
  // * observation error
  // * IID process error
  // * reaeration depends on DO_mod ERT is ER20
  
  // Calculate individual process rates
  for(i in 1:n) {
    GPP_inst[i] = Pmax .* tanh(light[i] .* alpha ./ Pmax);
    ER_inst[i] = ER20 .* exp(log(1.045) * (temp_water[i] - 20));
    KO2_inst[i] = K600_daily .* KO2_conv[i];
  }
  
  // DO model
  DO_mod_partial[1] = DO_obs_1;
  DO_mod_partial_sigma[1] = err_proc_iid_sigma * timestep ./ depth[1];
  for(i in 1:(n-1)) {
    DO_mod_partial[i+1] =
      DO_mod[i] .*
        (2.0 - KO2_inst[i] * timestep) ./ (2.0 + KO2_inst[i+1] * timestep) + (
        (GPP_inst[i] + ER_inst[i]) ./ depth[i] +
        (GPP_inst[i+1] + ER_inst[i+1]) ./ depth[i+1] +
        KO2_inst[i] .* DO_sat[i] + KO2_inst[i+1] .* DO_sat[i+1]
      ) .* (timestep ./ (2.0 + KO2_inst[i+1] * timestep));
    for(j in 1:d) {
      DO_mod_partial_sigma[i+1,j] = err_proc_iid_sigma * 
        sqrt(pow(depth[i,j], -2) + pow(depth[i+1,j], -2)) .*
        (timestep / (2.0 + KO2_inst[i+1,j] * timestep));
    }
  }
}

model {
  // Independent, identically distributed process error
  for(i in 1:n) {
    DO_mod[i] ~ normal(DO_mod_partial[i], DO_mod_partial_sigma[i]);
  }
  // SD (sigma) of the IID process errors
  err_proc_iid_sigma_scaled ~ cauchy(0, 1);
  
  // Independent, identically distributed observation error
  for(i in 1:n) {
    DO_obs[i] ~ normal(DO_mod[i], err_obs_iid_sigma);
  }
  // SD (sigma) of the observation errors
  err_obs_iid_sigma_scaled ~ cauchy(0, 1);
  
  // Daily metabolism priors
  alpha_scaled ~ normal(0, 1);
  Pmax ~ normal(Pmax_mu, Pmax_sigma);
  ER20 ~ normal(-2, 0.5);
  K600_daily ~ lognormal(K600_daily_meanlog, K600_daily_sdlog);
}
generated quantities {
  vector[d] err_obs_iid[n];
  vector[d] err_proc_iid[n-1];
  vector[d] GPP;
  vector[d] ER;
  vector[n] DO_obs_vec; // temporary
  vector[n] DO_mod_vec; // temporary
  vector[d] DO_R2;
  
  for(i in 1:n) {
    err_obs_iid[i] = DO_mod[i] - DO_obs[i];
  }
  for(i in 2:n) {
    err_proc_iid[i-1] = (DO_mod_partial[i] - DO_mod[i]) .* (err_proc_iid_sigma ./ DO_mod_partial_sigma[i]);
  }
  for(j in 1:d) {
    GPP[j] = sum(GPP_inst[1:n24,j]) / n24;
    ER[j] = sum(ER_inst[1:n24,j]) / n24;
    
    // Compute R2 for DO observations relative to the modeled, process-error-corrected state (DO_mod)
    for(i in 1:n) {
      DO_mod_vec[i] = DO_mod[i,j];
      DO_obs_vec[i] = DO_obs[i,j];
    }
    DO_R2[j] = 1 - sum((DO_mod_vec - DO_obs_vec) .* (DO_mod_vec - DO_obs_vec)) / sum((DO_obs_vec - mean(DO_obs_vec)) .* (DO_obs_vec - mean(DO_obs_vec)));
  }
  
}

Can you post the R code that you’re using to call the Stan model?

Thank you @andrjohns for the prompt reply. Yes… Here it is.

mm_saturator_ER<-
mm_name('bayes', GPP_fun='satlight', ER_fun='q10temp',pool_K600='none') %>%
specs(alpha_meanlog=.4,alpha_sdlog=.2,Pmax_mu=6,Pmax_sigma=.3, day_start = 4,day_end = 28,n_cores=4, n_chains=4, burnin_steps=500, saved_steps=500) %>%
metab(dat)

Oh is this from an R package that creates the Stan model and data and runs it for you? This isn’t your model and data that you’re running through RStan?

Yes, it’s an R package named streamMetabolizer. You can find it in

The package allows editing the current models and making new models. There is no default Bayes model with an ER function in it, so I added that section to my model, with a mechanistic model for estimating ER similar to the one for GPP. Do you think it’s the package limiting the editing of the model?

Well the error that you’re getting above:

Error in mod$fit_ptr() :
Exception: variable does not exist; processing stage=data initialization; variable name=temp_water; base type=vector_d (in ‘model4b1c66371d71_b_np_oipi_tr_psrqkm’ at line 30)

Is saying that there is no variable named temp_water in the dataset that is being passed to Stan. So if this variable is 100% in your dataset, then the issue is how the R package is preparing and then passing the data to Stan.
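One way to see which names are missing before sampling is to compare the names declared in the data block with the names in the list that reaches Stan. The sketch below copies the declared names by hand from the model above; `stan_data` is a hypothetical stand-in for whatever list is actually passed, built here so that only temp_water is absent.

```r
# Names declared in the data block of the Stan program (copied by hand from the model).
declared <- c(
  "alpha_meanlog", "alpha_sdlog", "Pmax_mu", "Pmax_sigma",
  "K600_daily_meanlog", "K600_daily_sdlog",
  "err_obs_iid_sigma_scale", "err_proc_iid_sigma_scale",
  "d", "timestep", "n24", "n",
  "DO_obs_1", "DO_obs", "DO_sat", "light", "temp_water",
  "depth", "KO2_conv"
)

# HYPOTHETICAL list passed to Stan: placeholders for every name except temp_water.
present <- setdiff(declared, "temp_water")
stan_data <- as.list(rep(NA, length(present)))
names(stan_data) <- present

# Anything listed here would trigger "variable does not exist" at data initialization.
missing_vars <- setdiff(declared, names(stan_data))
print(missing_vars)
```

Stan reports only the first missing variable it encounters, so a check like this can surface several problems at once.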

The variable is in my input to RStan; here is the block that prepares my data. It is strange, because I did not get the same error for other data, e.g. light. Could you suggest a different way to prepare the data in that case, or direct me to a resource that explains how to do that?

dat %>% unitted::v() %>%
  select(solar.time, depth, temp.water, light) %>%
  gather(type, value, depth, temp.water, light) %>%
  mutate(type=ordered(type, levels=c('depth','temp.water','light')),
         units=ordered(labels[type], unname(labels))) %>%
  ggplot(aes(x=solar.time, y=value, color=type)) + geom_line() +
  facet_grid(units ~ ., scale='free_y') + theme_bw() +
  scale_color_discrete('variable')

In that block your variable is called temp.water, whereas Stan is expecting temp_water. Try renaming your variable to temp_water.
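In base R the rename can be done in place. A minimal sketch on a made-up data frame (the values are placeholders, and whether the package accepts the new column name is a separate question):

```r
# HYPOTHETICAL three-row stand-in for the poster's `dat`.
dat <- data.frame(
  depth = c(2, 2, 2),
  temp.water = c(24.23, 24.20, 24.18),
  light = c(808.6, 523.4, 252.8)
)

# Rename the one column in place; the other columns are untouched.
names(dat)[names(dat) == "temp.water"] <- "temp_water"
print(names(dat))
```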

Unfortunately, that didn’t work. I got this error:
Error: data is missing these columns: temp.water

The data block for this package looks like this:

# columns typical of instantaneous data
mm_data(solar.time, DO.obs, DO.sat, depth, temp.water, light)

Right, so if you’ve renamed the column in your dataset to temp_water, then it also needs to be renamed in the call:

mm_data(solar.time, DO.obs, DO.sat, depth, temp_water, light)

Here is what I did.

dat<-data.frame(solar.time=as.POSIXct(site$solar.time,tz="UTC",format="%d/%m/%Y %H:%M"),DO.obs=ma(site$DO.obs,10),DO.sat=site$DO.sat,depth=site$depth,temp_water=site$temp.water,light=site$light)

dat %>% unitted::v() %>%
  select(solar.time, depth, temp_water, light) %>%
  gather(type, value, depth, temp_water, light) %>%
  mutate(type=ordered(type, levels=c('depth','temp.water','light')),
         units=ordered(labels[type], unname(labels))) %>%
  ggplot(aes(x=solar.time, y=value, color=type)) + geom_line() +
  facet_grid(units ~ ., scale='free_y') + theme_bw() +
  scale_color_discrete('variable')

but I got the error I just posted.
Error: data is missing these columns: temp.water
Timing stopped at: 0.36 0 0.36
Sorry for confusing you. I meant to say the default of this package was the block I just mentioned with the temp.water. Not sure if this is overwritten when I changed the labels on my data.

Where is this error coming from:

Error: data is missing these columns: temp.water
Timing stopped at: 0.36 0 0.36

The code block you’re posting looks like it’s only for plotting data, not for running a Stan model.

Yes, my bad, I wasn’t thinking. You are right, ignore the plotting code, the dataframe one is the data block.

dat<-data.frame(solar.time=as.POSIXct(site$solar.time,tz="UTC",format="%d/%m/%Y %H:%M"),DO.obs=ma(site$DO.obs,10),DO.sat=site$DO.sat,depth=site$depth, temp_water =site$temp.water,light=site$light)

So you’re running:

dat<-data.frame(
        solar.time=as.POSIXct(site$solar.time,tz="UTC",format="%d/%m/%Y %H:%M"),
        DO.obs=ma(site$DO.obs,10),
        DO.sat=site$DO.sat,
        depth=site$depth,
        temp_water =site$temp.water,
        light=site$light)

And then this gives the error:

Error: data is missing these columns: temp.water

Is that right? Does the site dataset not have a temp.water variable?

Yes, you are right about dat. And no to the second question: the site data does have temp.water values. Here are the first few lines from dat.
solar.time           DO.obs    DO.sat    depth  temp_water  light
2020-03-17 16:00:00  5.421417  6.730909  2      24.22683    808.6050
2020-03-17 17:00:00  5.395833  6.734946  2      24.19917    523.4264
2020-03-17 18:00:00  5.406083  6.740250  2      24.17967    252.8344

But if those are the first few lines from dat, then there wasn’t an issue creating the dataset? Is the error:

Error: data is missing these columns: temp.water

From a different section of code?

I have changed all mentions of temp.water in the R code and in the Stan model, so there is no other place I can think of to change :(