Indexing Time in a dynamic factor model

I am trying to fit a Bayesian Dynamic Factor Model and I have gone through Jim Savage’s example http://modernstatisticalworkflow.blogspot.com/2016/08/a-brief-introduction-to-dynamic-factor.html for the Dynamic Factor Model and http://modernstatisticalworkflow.blogspot.com/2017/04/hierarchical-vector-autoregression.html for his hierarchical vector autoregressive model.
I am having issues with incorporating a lagged time for the common factors. My model is
Y_{i,t} = \nu + \lambda \eta_{i,t} + \epsilon_{i,t},
where
\eta_{i,t} = \alpha + B\eta_{i,t-1} + \zeta_{it}.
In this case, I have 6 observed variables with 2 factors. That is, a common factor to 3 observed variables only.


data {
int N_Y; //number of observed variables
int N_T; //number of time points
int N_I; //number of individuals 
int N; //number of rows
matrix[N, N_Y] Y1; //observed variables
int ID[N]; //Participant ID variable
int Time[N]; //Time variable  
}
parameters {
vector[6] nu; //intercept
vector[4] lambda; //loadings
matrix[N,2] eta; //factor
//OR
real eta1[N_I, N_T,2]; //alternative factor
}

So this is where I am having my struggles, do I specify my factor loadings as a matrix of 3D array to account for the time lag within a person. I am not building a hierarchical model in this case but I need the time lags to only show within a person and not as

for(t in 1:N_T){
for(j in 1:2){
eta[t,j] = alpha[j] + B[j]*eta[t-1,j]; 
}
}

Because in this case, Person 2’s time 1 depends on Person 1’s last time, which I am trying to prevent.
If I go with the 3d array I am unable to write the distribution in Stan as \eta \sim MVN(0, \psi), and \psi is a 2 \times 2 covariance matrix.

I have redefined my model as a 3d array. And below is my code

data {
  int N_Y; //number of observed variables
  int N_T; //number of time points
  int N_ID; //number of individuals 
  int N_eta; //number of factors i.e. 2
  vector [N_Y] Y[N_ID,N_T]; //observed variables i.e. Y[N_ID,N_T,N_Y]
}
  
parameters {
    vector[2] eta[N_ID, N_T]; //LVs i.e. eta[N_Id, N_T, 2]
    vector[6] nu; //intercepts
    vector[4] lambda; //loadings
    
    //structural within
    vector[2] alpha; //intercepts
    vector[2] beta_lag; //coefficients
    
    vector<lower = 0>[2] sigma_eta; //factor SD
    vector<lower = 0>[6] sigma_Y; //observed variable SD
    cholesky_factor_corr[2] psiY;
}
 
  
model {
    vector[N_Y] mu_Y[N_ID, N_T]; //expected outcome for observed variables
    vector[N_eta] mu_eta[N_ID, N_T]; //expected outcome for LVs
    //within model
    
    for (i in 1:N_ID){
      //structural model
      //When T=1
      mu_eta[i,1,1] = alpha[1];
      mu_eta[i,1,2] = alpha[2];      
      
      for (t in 2:N_T){
        for (j in 1:N_eta){
          mu_eta[i,t,j] = alpha[j] + beta_lag[j]*eta[i,t-1,j];
        }     
      }
      
      for (t in 1:N_T) {
        
        //measurement model
        mu_Y[i,t,1] = nu[1] + eta[i,t,1];
        mu_Y[i,t,2] = nu[2] + lambda[1]*eta[i,t,1];
        mu_Y[i,t,3] = nu[3] + lambda[2]*eta[i,t,1];
        
        mu_Y[i,t,4] = nu[4] + eta[i,t,2];
        mu_Y[i,t,5] = nu[5] + lambda[3]*eta[i,t,2];
        mu_Y[i,t,6] = nu[6] + lambda[4]*eta[i,t,2];
        
      }
    }
    //Priors
    lambda ~ normal(0,5);
    psiY ~ lkj_corr_cholesky(1);
    sigma_Y ~ normal(0,5);
    nu ~ normal(0,10);
    alpha ~ normal(0,5);
    beta_lag ~ normal(0,5);
    sigma_eta ~ normal(0,5);
    
    
    
    //Within Model
    for (i in 1:N_ID) {
      for (t in 1:N_T) {
        eta[i,t] ~ multi_normal_cholesky(mu_eta[i,t], diag_pre_multiply(sigma_eta, psiY)); 
        
        for (j in 1:N_Y){
          Y[i,t,j]~ normal(mu_Y[i,t,j], sigma_Y[j]);
        }
      }
    }
}

However, anytime I run my code, R crashes with the following error

recompiling to avoid crashing R session
Error in unserialize(socklist[[n]]) : error reading from connection
Error in serialize(data, node$con, xdr = FALSE) : 
  error writing to connection

My simulation code is

N <- 200
Nt <- 20 #Number of time occasions

zeta <- vector("list", length = Nt) #residuals structural model
eta <- vector("list", length = Nt)

#Residuals for Structural Model
for (j in 1:Nt) {
  zeta[[j]] <- mvtnorm::rmvnorm(N, mean = c(0,0), sigma = diag(2))
}
eta_cov <- matrix(c(1, .2,.2,1),2)
b11 <- matrix(c(.7, 0, 0, .4), 2)
b22 <- matrix(c(.19,0,0,.25),2)

#time 1
  
eta[[1]] <- mvtnorm::rmvnorm(N, mean = c(0,0), sigma = eta_cov) + zeta[[1]]
  
 #time 2 & beyond
for (j in 2:Nt) {
  eta[[j]] <- eta[[j-1]]%*%b11 + zeta[[j]]
}


#Measurement model
y <- vector("list", Nt) #observed variables
epsilon1 <- vector("list", Nt)
ld <- matrix(c(1, .5, .7, 0, 0, 0, 0, 0, 0, 1, .2, .3), nrow = 6, byrow = FALSE)

for (j in 1:Nt) {
  #residual
  epsilon[[j]] <- mvtnorm::rmvnorm(N, mean = rep(0,6), sigma = diag(6))
  
  #observed variables
  y[[j]] <- eta[[j]]%*%t(ld) + epsilon[[j]]
}

#convert from list to 3d array
y_new <- array(0, dim = c(N,Nt,6))
for (j in 1:6) {
  for (t in 1:Nt) {
    for(i in 1:N) {
      y_new[i,t,j] = y[[t]][i,j]
    }    
  }  
}

I will like to add that even after crashing, I am able to run a new Stan program with a different model. But this model is always crashing

Do you get any errors back when you run with chains = 1?

Your model runs for me without crashing when using cmdstanR, so that might be another option to try.

On another note, I’ve found with similar models (longitudinal, but not auto-regressive) that it’s more efficient to work with one long vector (items * timepoints) for each individual, rather than nested arrays. This lets you take more advantage of vectorisation in stan (and can make the code a little easier to read).

I’m not sure whether it will be much help, but the code below is for a repeated measures 1-factor model, with correlations between the latent factors over time, so you can see what I mean about the indexing:

data{
  int N;              //Number of individuals
  int J;              //Number of items
  int T;              //Number of timepoints
  vector[J*T] y[N];
}

parameters{
  row_vector[J*T] intercepts;

  /* 'Raw' Variables for Non-Centered Parameterisation */
  matrix<lower=0.0>[J,T] load_r;
  vector[T] eta_r[N];

  /* Hyperparameters for Loadings */
  vector<lower=0>[T] load_m;
  vector<lower=0>[T] load_sd;

  /* Correlations between factors over time and
       item residual variances*/
  cholesky_factor_corr[T] e_Lcorr;
  vector<lower=0>[J*T] resvar;
}

transformed parameters {
  matrix[T,J] load;

  /* Transform 'raw' loadings to required distribution */
  for(j in 1:J)
    load[,j] = (load_m[j] + load_r[j] * load_sd[j])';
}

model{
  /* These are temporary variables, used for holding
       functions of other parameters. More efficient on
       memory to include here (rather than transformed 
       parameters), as their posteriors aren't saved */
  matrix[J*T, T] lam;
  matrix[N,T] eta_t;
  matrix[N,J*T] lam_eta;

  /* Sample factor correlations */
  e_Lcorr ~ lkj_corr_cholesky(1);

  /* Sample intercepts and loading hyperparameters */
  intercepts ~ normal(0,5);
  load_m ~ normal(0,5);
  load_sd ~ normal(0,0.1);

  /* Sample 'raw' loadings for non-centered parameterisation */
  for(j in 1:J)
    load_r[j] ~ std_normal();

  /* Pack loadings into a factor loading matrix with zeros
     for cross-loadings */
  lam[,1] = append_row(load[1]', rep_vector(0, J*(T-1)));
  lam[,T] = append_row(rep_vector(0, J*(T-1)), load[T]');

  for(t in 2:(T-1))
    lam[,t] = append_row(append_row(rep_vector(0, J*(t-1)), load[t]'),
                         rep_vector(0, J*(T-t)));

  /* Sample residual variances */
  resvar ~ std_normal();
  
  /* Sample 'raw' factor scores and transform with multivariate
     non-centered parameterisation */
  for(n in 1:N) {
    eta_r[n] ~ std_normal();
    eta_t[n] = (e_Lcorr*eta_r[n])';
  }

  /* Create the E[Y] for each individual */
  lam_eta = eta_t*lam';

  /* Link to observed data, with a univariate normal distribution
       for each observation */
  for(n in 1:N)
    y[n] ~ normal(intercepts + lam_eta[n], resvar);
}

generated quantities {
  /* Transform factor correlations from Cholesky to Correlation matrix */
  corr_matrix[T] corr = multiply_lower_tri_self_transpose(e_Lcorr);

  /* Put variance parameters into variance scale (from SD) */
  vector[J*T] vars = square(resvar);
  vector[T] loadvars = square(load_sd);
}

Unfortunately, no. RStudio crashes anytime I run with a single chain and I have to restart it.

I will try using cmdstanR. Thanks very much for the suggestion.

Thank you very much for the code. I will try and adapt it to a 2-factor model in a long data format. I hope you don’t mind if I contact you again for help in working out some parts of the code. I am new to Stan and I am still trying to find my feet in Stan