Indexing an arbitrary subset

Dear Stan Forum,

I’d like to be able to index an arbitrary, nonconsecutive subset of the elements of a vector. For example, I have a parameter vector alpha of length 6. The Stan manual shows how to index a consecutive subset, e.g.
alpha[1:3]
but I’d like to be able to perform one operation on, say, elements 1, 3, and 6 and a different operation on 2, 4, and 5. What seems like it should work is putting the subset in braces, as below:

parameters {
  vector[6] alpha;
}
model{
  alpha[{1,3,6}] ~ normal(0,1);
}

However, I get the following error:

Error in mde(x) : ‘list’ object cannot be coerced to type ‘integer’

I understand that the indices have to have integer type. I have tried several ways to make this happen, but the following does not work:

transformed data{
  int indices1[6] = {1,3,6};
}
parameters {
  vector[6] alpha;
}
model{
  alpha[indices1] ~ normal(0,1);
}

This gives the same error about integer typing, and so does the following, where I tried to read the indices in as data:

data{
  int indices1[6];
}
parameters {
  vector[6] alpha;
}
model{
  alpha[indices1] ~ normal(0,1); 
}

Is there even a way to declare a vector of integers in Stan?

Learning how to do this would greatly simplify my code, so help is appreciated!

This doesn’t look like a Stan error:

Error in mde(x) : ‘list’ object cannot be coerced to type ‘integer’

How are you calling/running your models?

I run my Stan models with an R script using cmdstanr. In particular, the run command looks like

model <- cmdstan_model(paste0(getwd(), '/', model_name,'.stan'))
fit <- model$sample(
  data = stan_data,
  seed = as.numeric(Sys.Date()),
  chains = 8,
  iter_warmup = 2000,
  iter_sampling = 10000,
  refresh = 100,
  thin = 10, 
  adapt_delta = .95
  )

How is your stan_data argument constructed?

When I tried creating the index vectors by reading in data, it looked like this. Let me know what other parts of the code you would need to see.

stan_data <- list( 
  N = N,
  I = I,
  id = id,
  group = group,
  w = w,
  x = x,
  ECT = ECT,
  MMSE = MMSE,
  indices1 = as.integer(c(1,3,4,5,7,8)),
  indices2 = as.integer(c(2,6,9,10,11,12))
)

This looks like an error where you’re attempting to provide multiple values for a data variable of type int (rather than array[] int). Can you post the data block (or the full model) of the Stan model that you used this dataset with?

Sure, I was trying to keep it simple, but here is the whole thing. If I figure out this multiple indexing thing, I can also make other improvements to the code.

Part of R run script:

stan_data <- list( 
  N = N,
  I = I,
  id = id,
  group = group,
  w = w,
  x = x,
  ECT = ECT,
  MMSE = MMSE,
  indices1 = as.integer(c(1,3,4,5,7,8)),
  indices2 = as.integer(c(2,6,9,10,11,12))
)
model <- cmdstan_model(paste0(getwd(), '/', model_name,'.stan'))
fit <- model$sample(
  data = stan_data,
  seed = as.numeric(Sys.Date()),
  chains = 8,
  iter_warmup = 2000,
  iter_sampling = 10000,
  refresh = 100,
  thin = 10, 
  adapt_delta = .95
  )

Stan model:

data {
 
int<lower=1> I; // number of subjects
int<lower=1> N; // total observations; each n is ij
int<lower=1> id[N]; // id of nth observation
int<lower=1> group[I]; // diagnostic group of subject id i
vector[2] ECT[N]; // 
int<lower=0, upper=30> MMSE[N]; 
matrix[N, 2] x; // ones column, t column (t is years since initial visit)
matrix[N, 4] w; // age_c, male, age_c_t, male_t
int indices1[6];
int indices2[6];
}

transformed data {}

parameters {

vector[12] alpha[3]; // fixed effects
real<lower=0> phi[3]; // scale parameter
vector[6] beta[I]; // random effects; (i, m) = (subject, index)
vector[6] mu[3]; // means of random effects; (l, m) = (group, index)
vector<lower=0>[6] tau[3]; // sd's of random effects; (l, m) = (group, index)
cholesky_factor_corr[6] L_Omega[3]; // random effects correlation matrix (l) = group
vector<lower=0>[2] sigma[3]; // error sd's for ECT
cholesky_factor_corr[2] L_Rho[3]; // error correlations for ECT (l) = group

}

transformed parameters {}

model {
 
// LIKELIHOOD
for(n in 1:N){
  ECT[n] ~ multi_normal(
    [ row(w, n) * alpha[group[id[n]]][1:4] + row(x, n) * beta[id[n]][1:2], 
      row(w, n) * alpha[group[id[n]]][5:8] + row(x, n) * beta[id[n]][3:4] ],
  diag_pre_multiply(sigma[group[id[n]]], L_Rho[group[id[n]]]) * 
  diag_pre_multiply(sigma[group[id[n]]], L_Rho[group[id[n]]])'
  );
  MMSE[n] ~ neg_binomial_2(
  exp(row(w, n) * alpha[group[id[n]]] + row(x, n) * beta[id[n]]),
  phi[group[id[n]]]
  );
}

for(i in 1:I){
  beta[i] ~ multi_normal(
    mu[group[i]],
    diag_pre_multiply(tau[group[i]], L_Omega[group[i]]) *
    diag_pre_multiply(tau[group[i]], L_Omega[group[i]])'
  );
}

// PRIORS
for(l in 1:3){
  L_Rho[l] ~ lkj_corr_cholesky(1);
  L_Omega[l] ~ lkj_corr_cholesky(.8);
  alpha[l][indices1] ~ normal(0,.25);
  alpha[l][indices2] ~ normal(1, 2);
}

// FSLong intercepts
mu[1][1] ~ normal(6, 3);
mu[2][1] ~ normal(6, 3);
mu[3][1] ~ normal(6, 3);
tau[1][1] ~ exponential(2); 
tau[2][1] ~ exponential(2); 
tau[3][1] ~ exponential(2); 

// FSLong slopes
mu[1][2] ~ normal(0, .5);
mu[2][2] ~ normal(0, .5);
mu[3][2] ~ normal(0, .5);
tau[1][2] ~ exponential(4); 
tau[2][2] ~ exponential(4); 
tau[3][2] ~ exponential(4); 

// ANTs intercepts
mu[1][3] ~ normal(7, 3);
mu[2][3] ~ normal(7, 3);
mu[3][3] ~ normal(7, 3);
tau[1][3] ~ exponential(1); 
tau[2][3] ~ exponential(1); 
tau[3][3] ~ exponential(1); 

// ANTs slopes
mu[1][4] ~ normal(0, .5);
mu[2][4] ~ normal(0, .5);
mu[3][4] ~ normal(0, .5);
tau[1][4] ~ exponential(10);
tau[2][4] ~ exponential(10);
tau[3][4] ~ exponential(10);

// MMSE intercepts
mu[1][5] ~ normal(0, 1);
mu[2][5] ~ normal(1, 1);
mu[3][5] ~ normal(1, 1);
tau[1][5] ~ exponential(2);
tau[2][5] ~ exponential(2);
tau[3][5] ~ exponential(2);

// MMSE slopes
mu[1][6] ~ normal(0, 1);
mu[2][6] ~ normal(0, 1);
mu[3][6] ~ normal(0, 1);
tau[1][6] ~ gamma(5, 50);
tau[2][6] ~ exponential(5);
tau[3][6] ~ exponential(5);

}

generated quantities {}

Actually, I think this is just an R syntax error, rather than anything to do with Stan. Instead of:

indices1 = as.integer(c(1,3,4,5,7,8)),
 indices2 = as.integer(c(2,6,9,10,11,12))

Can you try:

indices1 = c(1,3,4,5,7,8),
 indices2 = c(2,6,9,10,11,12)

OK, I think this solves the problem! Thank you!

Can I ask, is there a way to declare these indices directly in the Stan program, as opposed to putting them in the runscript?

A related issue: even though I have Stan 2.31, I can’t declare arrays as described in the reference manual:

For all these commands, I get the error “Variable “array” does not exist.”

transformed data {
// array[6] int indices1 = [1,3,4,5,7,8]
// array[6] int indices1 = {1,3,4,5,7,8}
// array[6] int indices1 = (1,3,4,5,7,8)
}

Here is a minimal example:

stanmod <- "
data {
  real y_mean;
}
transformed data {
  array[2] int indices = {1, 4};
  vector[4] dummy_data = [1, 2, 3, 4]';
  vector[2] subsetted = dummy_data[indices];
}
parameters {
  real y;
}
model {
  y ~ normal(y_mean, 1);
}
"

library(cmdstanr)
mod <- cmdstan_model(write_stan_file(stanmod))
fit <- mod$sample(data = list(y_mean = 0))

Excellent, thanks Andrew. I am still getting the red underline and the warning message “variable “array” does not exist”. But I have been reading discussions, and people are saying that array declaration is a new feature, and that the syntax checker in RStudio has not kept up with the developments in cmdstanr. Is that right?

That’s correct