Error translating RStan Case Study to CmdStan

I have successfully translated this RStan case study into CmdStan: https://mc-stan.org/users/documentation/case-studies/golf.html

I am now trying to do the same with this one: https://mc-stan.org/users/documentation/case-studies/model-based_causal_inference_for_RCT.html

Unfortunately I get this error when I run the executable:

Exception: variable does not exist; processing stage=data initialization; variable name=y; base type=double (in 'examples/Model-based-Inference-for-Causal-Effects-in-Completely-Randomized-Experiments/experiments.stan', line 3, column 2 to column 14)

What is causing this?

Here is my code:
experiments.data.R (1.3 KB)

data {
  int<lower=0> N;                   // sample size
  vector[N] y;                      // observed outcome
  vector[N] w;                      // treatment assigned
  real<lower=-1,upper=1> rho;        // assumed correlation between the potential outcomes
}
parameters {
  real alpha;                       // intercept
  real tau;                         // super-population average treatment effect
  real<lower=0> sigma_c;            // residual SD for the control
  real<lower=0> sigma_t;            // residual SD for the treated
}
model {
   // PRIORS
   alpha ~ normal(0, 5);            
   tau ~ normal(0, 5);
   sigma_c ~ normal(0, 5);          
   sigma_t ~ normal(0, 5);

   // LIKELIHOOD
   y ~ normal(alpha + tau*w, sigma_t*w + sigma_c*(1 - w));
}
generated quantities{
  real tau_fs;                      // finite-sample average treatment effect  
  real y0[N];                       // potential outcome if W = 0
  real y1[N];                       // potential outcome if W = 1
  real tau_unit[N];                 // unit-level treatment effect
  for(n in 1:N){
    real mu_c = alpha;            
    real mu_t = alpha + tau;      
    if(w[n] == 1){                
      y0[n] = normal_rng(mu_c + rho*(sigma_c/sigma_t)*(y[n] - mu_t), sigma_c*sqrt(1 - rho^2)); 
      y1[n] = y[n];
    }else{                        
      y0[n] = y[n];       
      y1[n] = normal_rng(mu_t + rho*(sigma_t/sigma_c)*(y[n] - mu_c), sigma_t*sqrt(1 - rho^2)); 
    }
    tau_unit[n] = y1[n] - y0[n];
  }
  tau_fs = mean(tau_unit);        
}

How do you generate the data file that you pass to your CmdStan executable?

Offhand, given the contents of your experiments.data.R file, I’d recommend that you generate the data file (which here I’ve named “stan_data.json”) as follows:

library(jsonlite)
write_json(stan_data, "stan_data.json", pretty = TRUE, auto_unbox = TRUE)

FYI, the pretty option adds some indentation to the JSON file to make it a bit easier to read, and the auto_unbox option writes your scalars, N and rho, as scalars instead of arrays of length 1, i.e. you’ll see “"N": 500” and “"rho": 0” in your data file instead of “"N": [500]” and “"rho": [0]”.
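
To make the scalar-versus-array difference concrete, here is a rough sketch of what a data file matching the data block above could look like. The values are made up for illustration (a toy N of 4, not your real data), and the file is written by hand from the shell only so the layout is visible; in practice write_json would produce it.

```shell
# Toy values only (hypothetical, N = 4); the real file would come from write_json().
cat > stan_data.json <<'EOF'
{
  "N": 4,
  "y": [1.2, 0.4, 2.1, 0.9],
  "w": [1, 0, 1, 0],
  "rho": 0
}
EOF

# Report any variable from the Stan data block that is absent from the file.
for name in N y w rho; do
  grep -q "\"$name\":" stan_data.json || echo "missing: $name"
done
```

Note that N and rho are bare numbers while y and w are arrays of length N, which mirrors the declarations in the data block.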

I’ve been running:
make examples/Model-based-Inference-for-Causal-Effects-in-Completely-Randomized-Experiments/experiments

to compile, and:
./experiments sample data file=experiments.data.R

to run. I copied this from the Bernoulli example that comes inside the CmdStan GitHub repo.

Sorry, I’m quite new to Stan and don’t understand where I would need to run the commands

library(jsonlite)
write_json(stan_data, "stan_data.json", pretty = TRUE, auto_unbox = TRUE)

to generate the data file

I believe the experiments.data.R file you uploaded is not the one you need, i.e. it does not contain the actual data. The file looks like the R code that generates the data. If you want to use CmdStan, you need to write the generated data to an appropriate file. I typically use the rstan::stan_rdump function, but from @jjramsey’s answer I gather that you can also use the JSON format with the R code he provided.
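
This would also be consistent with the error message: the executable was given a file that has no literal values for y, so data initialization fails at the declaration of y. Once the data has been written out from R, the run step might look like the sketch below. The executable path and the file name stan_data.json are assumptions taken from earlier in this thread, so adjust them to your own layout.

```shell
# Assumed locations (hypothetical); change these to match your setup.
EXE=./experiments
DATA=stan_data.json

# Same command as before, but pointing at the generated data file
# instead of the R script that generates the data.
CMD="$EXE sample data file=$DATA"

if [ -x "$EXE" ] && [ -f "$DATA" ]; then
  $CMD
else
  echo "Not run. Expected command: $CMD"
fi
```

The only change from the original command is the argument to file=, which now names a file holding the values rather than the code that creates them.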