Using int array to extract from vector array

To get myself used to stan, so that I can code up a complex model I’m trying to fit in the future, I’m trying to get a basic linear mixed effects model running in stan. However I’ve run into trouble when I try to extract the random effects that relate to each individual. I’ve included the code so far below, but this will not run yet.

What I am trying to do is create an array of length N of vectors of length 2 in b2. I then want to extract, using an integer id number held in integer array idnum, the relevant part of b2. The array idnum will be of length M and will take unique values 1...N.

When I try to do this using code:
linpred_y = X_l*beta1 + Z2_l*b2[idnum,];
I get the following error message (with extra lines deleted):

No matches for: 
  matrix * vector[ ]

...

No matches for: 
  vector + ill-formed

I’d really appreciate any advice how to extract vectors from a vector array, using id values held in an integer array, such that I can multiply a design matrix, and the extracted vector, to make up part of a linear predictor. Code below:

functions{
}
data {
  //dimensions of data
  int<lower=1> M;      //number of longitudinal observations
  int<lower=1> N;      //number of individuals

  //id variables
  int<lower=1> idnum[M]; //long id variable

  //values for priors
  real<lower=0> sigmaBeta;  
  real<lower=0> AU;         
  real<lower=0> Aeps1;    

  vector[M] y1;        //longitudinal outcome
  matrix[M,3] X_l;     //design matrix long fixed
  matrix[M,2] Z2_l;     //design matrix long indrand
}
transformed data{
  vector[2] zeroVec;
  zeroVec = rep_vector(0,2);  //used for mean vector for random effects
}
parameters{
  vector[3] beta1;             //fixed effects for longitudinal component
  vector[2] b2[N];         //individual specific random effects 
 
 real<lower=0> sigmaEps1;
  corr_matrix[2] Omega_Dl;     
  vector<lower=0>[2] tau_Dl;   
}
transformed parameters{
  vector[M] linpred_y;
  matrix[2,2] Dl;
  linpred_y = X_l*beta1 + Z2_l*b2[idnum,];
  Dl = quad_form_diag(Omega_Dl, tau_Dl);
}
model{
  //Priors
  tau_Dl ~ student_t(3,0,AU);          
  Omega_Dl~ lkj_corr(1);              
  beta1  ~ normal(0,sigmaBeta);   
  sigmaEps1 ~ student_t(3,0,Aeps1);    

  //Likelihood
  y1 ~ normal(linpred_y,sigmaEps1);  
  b2 ~ multi_normal(zeroVec,Dl);  
}
generated quantities{
}

Thanks in advance

Sorry, I am now on mobile, so cannot test, but wouldn’t having b2 as matrix[2,N] work as expected?

Hi, thanks for the response - I have tried using a matrix but get error messages linked to memory size - the array seems to manage the sizes better. This is based on a colleague who has got a similar model to run using array rather than matrix. (As an example that I was running, M=142628, N=5000)

I should say as well that I have a version running using a for loop to select b2, but was hoping to avoid a for loop due to speed

That’s bcause b2 is an array of vectors, idnum is an array of integers, so that b2[idnum, ] is also an array of vectors. What do you want the result of that operation to be?

Matrices are more memory efficient than arrays, so something else is going on.

I am with Bob, unless I am missing something. memory shouldn’t really be an issue here, could you post the error (and the Stan code using a matrix) - maybe it actually indicates a completely different problem.

It is also probably worth pointing out, that if the for loop is over the two elements, the speed difference is likely negligible.

Finally, unless I am mistaken, the model you are building seems to be well within the capabilities of brms, which might save you quite a lot of time. Obviously if learning how to build Stan programs and/or extending the model beyond the capabilities of brms is a goal, it is definitely worth taking your time with the model.

Hope that helps!

Hi, I am trying to learn how to build stan programs in order to be able to program up a much more complex model (which can’t be fitted in brms) so once this model is running, I will be adding other components to it one by one.

I have updated the code (below), which now runs

functions{
}
data {
  //dimensions of data
  int<lower=1> M;      //number of observations
  int<lower=1> N;      //number of groups (individuals)

  //id variables
  int<lower=1> idnum[M]; //long id variable

  //values for priors
  real<lower=0> sigmaBeta;  
  real<lower=0> AU;         //involved in prior for tau_Dl - variances for random effects
  real<lower=0> Aeps1;      //involved in prior for error for longitudinal sub-model

  //longitudinal outcome information
  vector[M] y1;        //longitudinal outcome
  matrix[M,3] X_l;     //design matrix long fixed
  matrix[M,2] Z2_l;     //design matrix long indrand
  vector[2] b2_ones; //vector of ones for use ind rand
}
transformed data{
  vector[2] zeroVec;
  zeroVec = rep_vector(0,2);  //used for mean vector for random effects
}
parameters{
  vector[3] beta1;             //fixed effects for longitudinal component
  matrix[N,2] b2;   //longitudinal ind rand effects
  //vector[2] b2[N];         //individual specific random effects
  real<lower=0> sigmaEps1;
  corr_matrix[2] Omega_Dl;     //Correlation Matrix
  vector<lower=0>[2] tau_Dl;   //variance terms
}
transformed parameters{
  vector[M] linpred_y;
  vector[M] Z2b2_l;
  matrix[2,2] Dl;
  linpred_y = X_l*beta1 + (Z2_l .* to_matrix(b2[idnum,]))*b2_ones;
  Dl = quad_form_diag(Omega_Dl, tau_Dl);
}
model{
  //Priors
  tau_Dl ~ student_t(3,0,AU);          //prior for variances for random effects
  Omega_Dl~ lkj_corr(1);               //prior for correlation matrix for random effects
  beta1  ~ normal(0,sigmaBeta);     //prior for fixed effects
  sigmaEps1 ~ student_t(3,0,Aeps1);     //prior for longitudinal error term

  //Likelihood
  y1 ~ normal(linpred_y,sigmaEps1);  //update longitudinal outcome
  for(i in 1:N){
    b2[i,] ~ multi_normal(zeroVec, Dl);//update random effects
  }
}

Using to_matrix appeared to solve a few problems.

However when I run the model (admittedly only for 1000 warmup + 1000 iterations, just to test the function completes), it doesn’t output any results, just the following warning message:

In validityMethod(object) :
  The following variables have undefined values:  Z2b2_l[1],
The following variables have undefined values:  Z2b2_l[2],
The following variables have undefined values:  Z2b2_l[3],
The following variables have undefined values:  Z2b2_l[4],[... truncated]

There were also error messages to do with number of iterations and chains, but of course for a real analysis I would use an appropriate (higher) number of iterations and 3 or 4 chains.

The only examples of this that I can see link to overflow, which I’m not sure how to fix in this case?

This means that Z2b2_l was not assigned any value - note that it is defined in transformed parameters and never used again.

Thankyou - removing that has removed the warning