Help with Multivariate Autoregressive Model with random effects

Hello all,

I have been trying to fit a spatially-explicit Multivariate Autoregressive Model of order P with covariates and random effects to model recruitment time series for several species. The model has the following specification:

Y_{t,g,s} = \phi_{0_{g,s}} + \sum_{l=1}^{P}\phi_{l,g,s}(Y_{t-1,g,s } - \phi_{0_{g,s}}) + \gamma_{g,k}X_{t} +\mu_{t,g,s}

where Y_{t,g} is an array with G columns for each g-th species and T rows denoting time, \phi_{0_{g,s}} is a random intercept for species G and each s-th station (the spatial component of the model), \phi_{l,g,s} is the autoregressive parameter, \gamma_{g,k} are coefficients for the effects of each k-th covariate for each g-th species, X_{t} is a design matrix of covariates, and \mu_{t,g,s} is the random innovation term. The corresponding model in Stan syntax can be found below:


data {
  int<lower=0> T;     // TIME SERIES LENGTH
  int<lower=0> G;     // NUMBER OF SPECIES
  int<lower=0> S;     // NUMBER OF STATIONS
  int<lower=1> P;     // LAG ORDER
  int<lower=1> K;     // NUMBER OF COVARIATES
  vector[G]    Y[T];  // OBSERVATIONS, ARRAY WITH G COLUMNS AND T ROWS
  vector[K]    X[T];  // COVARIATES, ARRAY WITH K COLUMNS AND T ROWS
  int    station[T];
}
transformed data {
  vector[G] Yobs[T-P];
  for (t in 1:(T-P)) {
    Yobs[t] = Y[t+P];
  }
}
parameters {
  vector[G]               phi0[S];    // INTERCEPT, ARRAY WITH G COLUMNS AND S ROWS         
  matrix[G,G]             phi[P,S];   // AUTOREGRESSIVE PARAMETER
  cholesky_factor_corr[G] rho[S];     // CORRELATION PARAMETER, ARRAY WITH G COLUMNS AND S ROWS 
  vector<lower=0>[G]      sigma[S];   // NOISE, ARRAY WITH G COLUMNS AND S ROWS 
  matrix[G,K]             gamma;      // COVARIATE EFFECTS, S-DIMENSIONAL ARRAY OF G x K MATRICES
}
transformed parameters {
  matrix[G,G] SIGMA[S];  // VARIANCE-COVARIANCE MATRIX
  for (s in 1:S) {
    SIGMA[s] = diag_pre_multiply(sigma[s], rho[s]);
  }
  vector[G] mu[T-P,S];
  for (t in 1:(T-P)) {
    for (s in 1:S) {
      mu[t,s] = phi0[station[s]] + gamma*X[t+P];
      for (p in 1:P) {
        mu[t,s] = mu[t,s] + phi[p, station[s]]*Y[t+p-1];
      }
    }
  }
}
model {
  for (s in 1:S) {
    rho[s]   ~ lkj_corr_cholesky(2.0);
    sigma[s] ~ cauchy(0,1);
    phi0[s]  ~ normal(0,1);
  }
  to_vector(gamma) ~ normal(0,1);
  for (p in 1:P) {
    for (s in 1:S) {
      to_vector(phi[p,s]) ~ normal(0,1);
      Yobs[s] ~ multi_normal_cholesky(mu[s], SIGMA[s]);
    }
  }
}
generated quantities {
  vector[G] Yhat[T-P,S];
  for (t in 1:(T-P)) {
    for (s in 1:S) {
      Yhat[s] = multi_normal_cholesky_rng(mu[s], SIGMA[s]);
    }
  }
}


The problems I am having are (1) the model with random effects takes an absurdly long time to run (~ 20 hours for just 2 species at 10 stations and 2 covariates), and (2) Stan tells me there is a fair deal of divergent transitions, and the effective number of samples for some parameters (specially the sigmas) is pretty low even after 10K iterations and 1K warmup, which indicates I might have to re-parameterize the model. My question is: am I specifying the random effects correctly? i.e. I specify the random effects for station by creating a station variable that will work as an index, but is there any other way of specifying this?

For contrast, I have a version of the model WITHOUT the random structure for station, which actually works perfectly fine and runtimes are pretty quick even for multiple species, see below:


data {
  int<lower=0> T;     // TIME SERIES LENGTH
  int<lower=0> G;     // NUMBER OF SPECIES
  int<lower=1> P;     // LAG ORDER
  int<lower=1> K;     // NUMBER OF COVARIATES
  vector[G]    Y[T];  // OBSERVATIONS, ARRAY WITH G COLUMNS AND T ROWS
  vector[K]    X[T];  // COVARIATES, ARRAY WITH K COLUMNS AND T ROWS
}
transformed data {
  vector[G] Yobs[T-P];
  for (t in 1:(T-P)) {
    Yobs[t] = Y[t+P];
  }
}
parameters {
  vector[G]               phi0;     // INTERCEPT           
  matrix[G,G]             phi[P];   // AUTOREGRESSIVE PARAMETER
  cholesky_factor_corr[G] rho;      // CORRELATION PARAMETER
  vector<lower=0>[G]      sigma;    // NOISE
  matrix[G,K]             gamma;    // COVARIATE EFFECTS
}
transformed parameters {
  matrix[G,G] SIGMA;  // VARIANCE-COVARIANCE MATRIX
  SIGMA = diag_pre_multiply(sigma, rho);
  vector[G] mu[T-P];
  for (t in 1:(T-P)) {
    mu[t] = phi0 + gamma*X[t+P];
    for (p in 1:P) {
      mu[t] = mu[t] + phi[p]*Y[t+p-1];
    }
  }
}
model {
  rho ~ lkj_corr_cholesky(2.0);
  sigma ~ cauchy(0,1);
  phi0 ~ normal(0,1);
  for (p in 1:P) {
    to_vector(phi[p]) ~ normal(0,1);
    Yobs ~ multi_normal_cholesky(mu, SIGMA);
  }
}
generated quantities {
  vector[G] Yhat[T-P];
  for (t in 1:(T-P)) {
   Yhat = multi_normal_cholesky_rng(mu, SIGMA);
  }
}


Here’s some code to prepare data from the MARSS package that should help you test the model should it be needed:

require(MARSS)
fulldat <- lakeWAplanktonTrans 
years <- fulldat[, "Year"] >= 1965 & fulldat[, "Year"] < 1975 
dat <- t(fulldat[years, c("Greens", "Bluegreens")]) 
the.mean <- apply(dat, 1, mean, na.rm = TRUE) 
the.sigma <- sqrt(apply(dat, 1, var, na.rm = TRUE)) 
dat <- (dat- the.mean) * (1 / the.sigma)

covariates <- rbind(Temp = fulldat[years, "Temp"],
                    TP = fulldat[years, "TP"] ) 
covariates <- zscore(covariates)

Y = t(dat)
X = t(covariates)
data = as.data.frame(cbind(Y,X))
data = na.exclude(data)
Y = as.matrix(data[,2])
X = as.matrix(data[,3:4])
station = as.vector(rbind(rep(1, length(Y[,1])/2), rep(2, length(Y[,1])/2)))
dataList = list(Y = Y,
                T = dim(Y)[1],
                X = X,
                G = dim(Y)[2],
                P = 1,
                K = dim(X)[2],
                S = length(unique(station)),
                station = station)

Thank you in advance for your help!

The form for the random effects look OK, but this loop feels wrong:

Here, you are going to be executing Yobs[s] ~ multi_normal_cholesky(mu[s], SIGMA[s]); a total of P times. I think you want to rewrite this as

for (s in 1:S) {
  Yobs[s] ~ multi_normal_cholesky(mu[s], SIGMA[s]);
  for (p in 1:P) {
    to_vector(phi[p, s]) ~ std_normal();
  }
}

If P is at all large you’ll be getting the wrong answer (counting each Yobs a total of P times) and running much more slowly.

You also don’t need to save SIGMA as a transformed parameter—it’ll produce a lot of I/O, which is slow, especially on clusters. It’s better to recreate it in generated quantities in terms of overall speed—computation in generated quantities is super fast because it’s just primitive values rather than autodiff.

You can get some more speedups by vectorizing some of the code in transformed parameters, but that’ll be minor.

The random effect form looks OK, but I can’t tell form a quick look if all the bounds are right with the T-P and t + P and so on. I you can somehow make those more naturally indexed it’d help. You can code it all to make things more efficient.

The main thing to do is to reduce the number of calls to multi_normal_cholesky.

Thank you for your suggestions! Just placing the call for multi_normal_cholesky likelihood outside the P loop seemed to work pretty well for speeding up the model - now it runs in a fraction of the time, even with multiple stations and species.

Best,
Matheus