Error during sampling using CmdStanR

I have been trying to estimate a model with significantly large number of parameters and hence am using cmdstanr to save the draws straight to csv file and not have memory issues. However, I am getting a strange error I have not enocuntered yet and am not sure how to resolve this:

I have checked the code but can’t seem to find any issues. Below is the model code:

data {
  int<lower=0> N;//Number of observations
  int<lower=1> J;//Number of predictors with random slope
  int<lower=1> K;//Number of predictors with non-random slope
  int<lower=1> L;//Number of customers/groups
  int<lower=0,upper=1> y[N];//Binary response variable
  int<lower=1,upper=L> ll[N];//Number of observations in groups
  matrix[N,K] x1;
  matrix[N,J] x2;
}
transformed data {
  matrix[N, K] Q_ast1;
  matrix[K, K] R_ast1;
  matrix[K, K] R_ast_inverse1;
  matrix[N, J] Q_ast2;
  matrix[J, J] R_ast2;
  matrix[J, J] R_ast_inverse2;
  
  // thin and scale the QR decomposition for x1
  Q_ast1 = qr_thin_Q(x1) * sqrt(N - 1);
  R_ast1 = qr_thin_R(x1) / sqrt(N - 1);
  R_ast_inverse1 = inverse(R_ast1);
  
  // thin and scale the QR decomposition for x2
  Q_ast2 = qr_thin_Q(x2) * sqrt(N - 1);
  R_ast2 = qr_thin_R(x2) / sqrt(N - 1);
  R_ast_inverse2 = inverse(R_ast2);
}
parameters {
  vector[J] rbeta_mu; //mean of distribution of beta parameters
  vector<lower=0>[J] rbeta_sigma; //variance of distribution of beta parameters
  vector[J] zeta_raw[L]; //group-specific parameters beta
  vector[K] zeta;
}
transformed parameters {
  vector[J] rzeta[L];
  for (l in 1:L)
    rzeta[l] = rbeta_mu + rbeta_sigma .* zeta_raw[l]; // coefficients on x
}
model {
  rbeta_mu ~ normal(0,5);
  rbeta_sigma ~ cauchy(0,10);
  zeta~normal(0,5);
  for (l in 1:L)
    zeta_raw[l] ~ normal(0,1);

  for(n in 1:N)
    y[n]~bernoulli_logit(Q_ast1[n] * zeta + Q_ast2[n] * rzeta[ll[n]]);
}
generated quantities {
  vector[K] beta;
  vector[K] rbeta[L];
  vector[N] log_lik;
  for(l in 1:L){
    rbeta[l] = R_ast_inverse2 * rzeta[l]; // coefficients on x1
  }
  beta = R_ast_inverse1 * zeta; // coefficients on x
  for (n in 1:N) {
    log_lik[n] = bernoulli_logit_lpmf(y[n] | x1[n] * beta + x2[n] * rbeta[ll[n]]);
  }
}

I greatly appreciate your help in this matter. Thank you !

Are you able to share some example/dummy data that reproduces this error for you? Then I can debug locally

Here is the sample data and the code I used to make the model data list and sample:

data <- read.csv("sample.csv",stringsAsFactors = FALSE)
matdata1 <- data %>% select(-x1,-x2,-group)
matdata2 <- data %>% select(x1,x2)
model_data <- list(N=nrow(matdata1),K=ncol(matdata1),J=ncol(matdata2),x1=matdata1,x2=matdata2,y=data$x2,L=length(unique(data$group)),ll=data$group)

Sampling:

mod <- cmdstan_model("code.stan")
fit <- mod$sample(
  data = model_data, 
  seed = 123, 
  chains = 1, 
  parallel_chains = 1,
  refresh = 600, # print update every 500 iters,
  iter_warmup=1000,
  iter_sampling=1000,
)

I appreciate your help
sample.csv (4.2 MB)

Thanks! I can reproduce locally so will start debugging

The error is caused by these sections:

generated quantities {
  vector[K] rbeta[L];
  ...
  for(l in 1:L){
    rbeta[l] = R_ast_inverse2 * rzeta[l]; // coefficients on x1
  }
}

The result of R_ast_inverse2 * rzeta[l] is a vector[J], but you’re assigning to a vector[K]

EDIT: I’ve also opened an issue in the Stan github so that the error-checking will hopefully catch this in the future

Thank you so much. How clumsy of me!