# Indexing Time in a dynamic factor model

I am trying to fit a Bayesian Dynamic Factor Model and I have gone through Jim Savage’s example http://modernstatisticalworkflow.blogspot.com/2016/08/a-brief-introduction-to-dynamic-factor.html for the Dynamic Factor Model and http://modernstatisticalworkflow.blogspot.com/2017/04/hierarchical-vector-autoregression.html for his hierarchical vector autoregressive model.
I am having issues with incorporating a lagged time for the common factors. My model is
Y_{i,t} = \nu + \lambda \eta_{i,t} + \epsilon_{i,t},
where
\eta_{i,t} = \alpha + B\eta_{i,t-1} + \zeta_{it}.
In this case, I have 6 observed variables with 2 factors. That is, a common factor to 3 observed variables only.

data {
int N_Y; //number of observed variables
int N_T; //number of time points
int N_I; //number of individuals
int N; //number of rows
matrix[N, N_Y] Y1; //observed variables
int ID[N]; //Participant ID variable
int Time[N]; //Time variable
}
parameters {
vector[6] nu; //intercept
matrix[N,2] eta; //factor
//OR
real eta1[N_I, N_T,2]; //alternative factor
}


So this is where I am having my struggles, do I specify my factor loadings as a matrix of 3D array to account for the time lag within a person. I am not building a hierarchical model in this case but I need the time lags to only show within a person and not as

for(t in 1:N_T){
for(j in 1:2){
eta[t,j] = alpha[j] + B[j]*eta[t-1,j];
}
}


Because in this case, Person 2’s time 1 depends on Person 1’s last time, which I am trying to prevent.
If I go with the 3d array I am unable to write the distribution in Stan as \eta \sim MVN(0, \psi), and \psi is a 2 \times 2 covariance matrix.

I have redefined my model as a 3d array. And below is my code

data {
int N_Y; //number of observed variables
int N_T; //number of time points
int N_ID; //number of individuals
int N_eta; //number of factors i.e. 2
vector [N_Y] Y[N_ID,N_T]; //observed variables i.e. Y[N_ID,N_T,N_Y]
}

parameters {
vector[2] eta[N_ID, N_T]; //LVs i.e. eta[N_Id, N_T, 2]
vector[6] nu; //intercepts

//structural within
vector[2] alpha; //intercepts
vector[2] beta_lag; //coefficients

vector<lower = 0>[2] sigma_eta; //factor SD
vector<lower = 0>[6] sigma_Y; //observed variable SD
cholesky_factor_corr[2] psiY;
}

model {
vector[N_Y] mu_Y[N_ID, N_T]; //expected outcome for observed variables
vector[N_eta] mu_eta[N_ID, N_T]; //expected outcome for LVs
//within model

for (i in 1:N_ID){
//structural model
//When T=1
mu_eta[i,1,1] = alpha[1];
mu_eta[i,1,2] = alpha[2];

for (t in 2:N_T){
for (j in 1:N_eta){
mu_eta[i,t,j] = alpha[j] + beta_lag[j]*eta[i,t-1,j];
}
}

for (t in 1:N_T) {

//measurement model
mu_Y[i,t,1] = nu[1] + eta[i,t,1];
mu_Y[i,t,2] = nu[2] + lambda[1]*eta[i,t,1];
mu_Y[i,t,3] = nu[3] + lambda[2]*eta[i,t,1];

mu_Y[i,t,4] = nu[4] + eta[i,t,2];
mu_Y[i,t,5] = nu[5] + lambda[3]*eta[i,t,2];
mu_Y[i,t,6] = nu[6] + lambda[4]*eta[i,t,2];

}
}
//Priors
lambda ~ normal(0,5);
psiY ~ lkj_corr_cholesky(1);
sigma_Y ~ normal(0,5);
nu ~ normal(0,10);
alpha ~ normal(0,5);
beta_lag ~ normal(0,5);
sigma_eta ~ normal(0,5);

//Within Model
for (i in 1:N_ID) {
for (t in 1:N_T) {
eta[i,t] ~ multi_normal_cholesky(mu_eta[i,t], diag_pre_multiply(sigma_eta, psiY));

for (j in 1:N_Y){
Y[i,t,j]~ normal(mu_Y[i,t,j], sigma_Y[j]);
}
}
}
}



However, anytime I run my code, R crashes with the following error

recompiling to avoid crashing R session
Error in unserialize(socklist[[n]]) : error reading from connection
Error in serialize(data, node\$con, xdr = FALSE) :
error writing to connection


My simulation code is

N <- 200
Nt <- 20 #Number of time occasions

zeta <- vector("list", length = Nt) #residuals structural model
eta <- vector("list", length = Nt)

#Residuals for Structural Model
for (j in 1:Nt) {
zeta[[j]] <- mvtnorm::rmvnorm(N, mean = c(0,0), sigma = diag(2))
}
eta_cov <- matrix(c(1, .2,.2,1),2)
b11 <- matrix(c(.7, 0, 0, .4), 2)
b22 <- matrix(c(.19,0,0,.25),2)

#time 1

eta[[1]] <- mvtnorm::rmvnorm(N, mean = c(0,0), sigma = eta_cov) + zeta[[1]]

#time 2 & beyond
for (j in 2:Nt) {
eta[[j]] <- eta[[j-1]]%*%b11 + zeta[[j]]
}

#Measurement model
y <- vector("list", Nt) #observed variables
epsilon1 <- vector("list", Nt)
ld <- matrix(c(1, .5, .7, 0, 0, 0, 0, 0, 0, 1, .2, .3), nrow = 6, byrow = FALSE)

for (j in 1:Nt) {
#residual
epsilon[[j]] <- mvtnorm::rmvnorm(N, mean = rep(0,6), sigma = diag(6))

#observed variables
y[[j]] <- eta[[j]]%*%t(ld) + epsilon[[j]]
}

#convert from list to 3d array
y_new <- array(0, dim = c(N,Nt,6))
for (j in 1:6) {
for (t in 1:Nt) {
for(i in 1:N) {
y_new[i,t,j] = y[[t]][i,j]
}
}
}


I will like to add that even after crashing, I am able to run a new Stan program with a different model. But this model is always crashing

Do you get any errors back when you run with chains = 1?

Your model runs for me without crashing when using cmdstanR, so that might be another option to try.

On another note, I’ve found with similar models (longitudinal, but not auto-regressive) that it’s more efficient to work with one long vector (items * timepoints) for each individual, rather than nested arrays. This lets you take more advantage of vectorisation in stan (and can make the code a little easier to read).

I’m not sure whether it will be much help, but the code below is for a repeated measures 1-factor model, with correlations between the latent factors over time, so you can see what I mean about the indexing:

data{
int N;              //Number of individuals
int J;              //Number of items
int T;              //Number of timepoints
vector[J*T] y[N];
}

parameters{
row_vector[J*T] intercepts;

/* 'Raw' Variables for Non-Centered Parameterisation */
vector[T] eta_r[N];

/* Correlations between factors over time and
item residual variances*/
cholesky_factor_corr[T] e_Lcorr;
vector<lower=0>[J*T] resvar;
}

transformed parameters {

for(j in 1:J)
}

model{
/* These are temporary variables, used for holding
functions of other parameters. More efficient on
memory to include here (rather than transformed
parameters), as their posteriors aren't saved */
matrix[J*T, T] lam;
matrix[N,T] eta_t;
matrix[N,J*T] lam_eta;

/* Sample factor correlations */
e_Lcorr ~ lkj_corr_cholesky(1);

intercepts ~ normal(0,5);

for(j in 1:J)

for(t in 2:(T-1))
rep_vector(0, J*(T-t)));

/* Sample residual variances */
resvar ~ std_normal();

/* Sample 'raw' factor scores and transform with multivariate
non-centered parameterisation */
for(n in 1:N) {
eta_r[n] ~ std_normal();
eta_t[n] = (e_Lcorr*eta_r[n])';
}

/* Create the E[Y] for each individual */
lam_eta = eta_t*lam';

/* Link to observed data, with a univariate normal distribution
for each observation */
for(n in 1:N)
y[n] ~ normal(intercepts + lam_eta[n], resvar);
}

generated quantities {
/* Transform factor correlations from Cholesky to Correlation matrix */
corr_matrix[T] corr = multiply_lower_tri_self_transpose(e_Lcorr);

/* Put variance parameters into variance scale (from SD) */
vector[J*T] vars = square(resvar);