Convergence problems with factor loadings in a multivariate autoregressive model

Hi everyone,

I am trying to implement the MAR/VAR model of Ovaskainen et al. (2017). I have simplified the model by assuming species only interact with individuals of their own species (i.e. no interspecific interactions). I have focused on estimating the variance-covariance matrix, assuming a multiplicative gamma shrinkage prior on the factor loadings (Bhattacharya and Dunson 2011). I have used some code from the Stan forum, mainly from here.
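
To restate the Stan code below in equations: the likelihood is

Y_t \sim \mathrm{MultiNormal}\big(r + \mathrm{diag}(a)\,Y_{t-1},\ \Omega\big), \qquad \Omega = \Lambda\Lambda^T + \mathrm{diag}(\sigma^2),

where \Lambda (lam in the code) is the sp \times nf matrix of factor loadings that carries the multiplicative gamma shrinkage prior, and \sigma^2 are the species-specific residual variances.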

Here is the Stan code:

data {
  int<lower=0> sp;   // Number of species
  int<lower=0> T;    // Number of time steps
  int<lower=0> nf;   // Number of latent variables
  vector[sp] Y[T];   // The outcome matrix

}
parameters {
  real<lower=0> a1;            // shape for delta[1]
  real<lower=0> a2;            // shape for delta[2:nf]
  vector[sp] r;                // intrinsic growth rates
  vector[sp] a;                // density dependence (intraspecific)
  vector<lower=0>[nf] delta;   // multiplicative gamma components (theta is their cumulative product)
  matrix<lower=0>[sp,nf] phi;  // local shrinkage parameters (Bhattacharya & Dunson)
  matrix[sp,nf] lam;           // factor loadings
  vector<lower=0>[sp] s2i;     // inverse residual variances
}
transformed parameters{
  vector<lower=0>[nf] theta;
  vector<lower=0>[sp] s2;
  matrix[sp,sp] omega;
  
  theta[1] = delta[1];
  for(l in 2:nf)
     theta[l] = theta[l-1] * delta[l];
     
  s2 = 1.0 ./ s2i;
  
  omega = diag_matrix(s2) + lam * lam';  // residual covariance: diagonal plus low-rank loadings part
}
model {
  // priors
  a1 ~ gamma(2.0, 1.0);
  a2 ~ gamma(2.0, 1.0);
  r ~ normal(0,1);
  a ~ normal(0,1);
  
  delta[1] ~ gamma(a1, 1.0);
  for (h in 2:nf)
      delta[h] ~ gamma(a2, 1.0);

  for (s in 1:sp)
      for (l in 1:nf)
          phi[s, l] ~ gamma(3.0 / 2.0, 3.0 / 2.0);

  // note: the second argument of normal() in Stan is the standard deviation
  for (s in 1:sp)
      for (l in 1:nf)
          lam[s, l] ~ normal(0.0, 1.0 / phi[s, l] / theta[l]);

  for (s in 1:sp)
      s2i[s] ~ gamma(1.0, 0.3);
  
  //Likelihood
  for (t in 2:T)
      Y[t] ~ multi_normal(r + diag_matrix(a) * Y[t-1], omega);
}
generated quantities{
  vector[sp] sde;
  matrix[sp, sp] Rho;
  sde = sqrt(s2);
  for(i in 1:sp)
    for(j in 1:sp)
      Rho[i,j] = omega[i,j]/sqrt(omega[i,i] * omega[j,j]);
}

And here is the R code that simulates the data and fits the model:

# Clean the environment
rm(list = ls())

# Load libraries
library(rstan)
library(mvtnorm)
library(rethinking)
library(truncnorm)

# Simulate a multispecies time series following a Gompertz model;
# species biomass is simulated on the log scale
Gompertz.sim <- function(init.pop, r, alpha, time, sp, Rho, sd_noise){
  metacommunity <- matrix(NA, nrow = time, ncol = sp)
  metacommunity[1, ] <- init.pop 
  for (t in 2:time) {
    metacommunity[t, ] <- r + alpha * metacommunity[t-1, ] +
      t(rmvnorm2(1, Mu = rep(0, sp), sigma = diag(sd_noise), Rho = Rho))
  }
  return(metacommunity)
}


P0 <- 1                  # Initial population biomass for all species (log scale)
sp <- 10                 # Number of species
t <- 200                 # Number of time steps
r <- rnorm(sp, 0.65, 0)  # Intrinsic growth rate for all species
alpha <- rtruncnorm(sp, a = -Inf, b = 0.95, mean = 0.94, sd = 0.1) # Density dependence for all sp

# Process error standard deviation for all sp
sd_noise <- matrix(0, nrow = sp, ncol = sp)   
diag(sd_noise) <- rlnorm(sp, -1.5, 0.1)       

Rho <- rlkjcorr(1, sp, 2) # Correlation matrix
Sigma <- sd_noise %*% Rho %*% sd_noise  # Variance-covariance matrix

# Simulate the community dynamics 
Sim.data <- Gompertz.sim(init.pop = P0, r = r, alpha = alpha, time = t, sp = sp, sd_noise = sd_noise, Rho = Rho)

# Prepare the stan data
dat <- list(Y = Sim.data, 
            sp = ncol(Sim.data), 
            T = nrow(Sim.data),
            nf = 2)

# Estimate the posterior
MAR_model <- stan(file = "MAR_Ovas.stan",
                  data = dat,
                  chains = 4, 
                  cores = 4,
                  iter = 2000,
                  control = list(adapt_delta = 0.9))

The model recovers the true parameters fairly well. All the parameters seem to converge (good Rhat, n_eff and traceplots) except for the factor loadings lam, which only converge if I increase the number of iterations from 2000 to 15000.

I am new to Stan and latent variable models. Any ideas on how to make my model specification more efficient? Or any other suggestions on how to estimate the factor loadings with different priors?

Thanks,
Alfonso

Take a look at the post Help with factor analysis (latent variable model) and the latent factor model in the Stan manual. That post references @rfarouni's blog post http://rfarouni.github.io/2015-04-26-fa/ which has a lot more detail.

That post shows how to write the multivariate normal in terms of Cholesky factors. You can also remove some of the double loops through vectorization, for example:

// your formulation
  for (s in 1:sp)
      for (l in 1:nf)
          phi[s, l] ~ gamma(3.0 / 2.0, 3.0 / 2.0);

// faster: this takes phi[s] as a row_vector of length nf and places
// the gamma prior on each element of it
  for (s in 1:sp)
      phi[s] ~ gamma(3.0 / 2.0, 3.0 / 2.0);
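
If you want to go even further, you can drop the loops entirely. A small sketch (same priors, and the same scale expression as in your lam statement; phi[s]' and lam[s]' are the transposed rows of those matrices):

// fully vectorized: one statement for all of phi,
// and one statement per species for the loadings
  to_vector(phi) ~ gamma(3.0 / 2.0, 3.0 / 2.0);

  for (s in 1:sp)
      lam[s]' ~ normal(0.0, inv(phi[s]' .* theta));  // elementwise sd = 1 / (phi * theta)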

Thanks, that helped a lot.

Here is the Stan model:

data {
  int<lower=1> S;   // Number of species
  int<lower=1> T;   // Number of years
  int<lower=1> D;   // Number of latent dimensions
  vector[S] Y[T];   // The outcome matrix
}
transformed data{
  int<lower=1> M;
  M = D * (S - D) + D * (D - 1) / 2;  // number of free off-diagonal loadings in L
}
parameters {
  vector[S] r;   // Vector of intrinsic growth rates
  vector[S] a;              // Vector of density-dependence parameters
  vector[M] L_t;            // free lower-triangular (off-diagonal) elements of L
  vector<lower=0>[D] L_d;   // diagonal elements of L, constrained positive
  vector<lower=0>[S] psi;   // residual variances
  real<lower=0>   mu_psi;
  real<lower=0>  sigma_psi;
  real   mu_lt;
  real<lower=0>  sigma_lt;
}
transformed parameters{
  cholesky_factor_cov[S, D] L;   // lower-triangular factor loadings matrix
  cov_matrix[S] Omega;           // covariance matrix
  {
    int idx;
    idx = 0;
    for (i in 1:(D-1)) {
      for (j in (i+1):D) {
        L[i, j] = 0;   // constrain the upper-triangular elements to zero
      }
    }
    for (j in 1:D) {
      L[j, j] = L_d[j];
      for (i in (j+1):S) {
        idx = idx + 1;
        L[i, j] = L_t[idx];
      }
    }
  }
  Omega = L * L' + diag_matrix(psi);
}
model {
// the hyperpriors 
  mu_psi ~ cauchy(0, 1);
  sigma_psi ~ cauchy(0,1);
  mu_lt ~ cauchy(0, 1);
  sigma_lt ~ cauchy(0,1);
// the priors 
  L_d ~ cauchy(0,3);
  L_t ~ cauchy(mu_lt,sigma_lt);
  psi ~ cauchy(mu_psi,sigma_psi);
  r ~ normal(0,1);
  a ~ normal(0,1);
  
//The likelihood
  for (t in 2:T)
      Y[t] ~ multi_normal(r + diag_matrix(a) * Y[t-1], Omega);
}
generated quantities{
  matrix[S, S] Rho;
  vector[S] sde;
  for(i in 1:S)
    for(j in 1:S)
      Rho[i,j] = Omega[i,j]/sqrt(Omega[i,i] * Omega[j,j]);
  
  sde = sqrt(diagonal(Omega));
}

The model runs well, but I wonder if there are any changes that could be made to increase the efficiency.

Use multi_normal_cholesky directly since you have L and psi. See the discussion above at Help with factor analysis (latent variable model) or search the manual for the Cholesky parameterization. You can also set up the lagged Y in transformed data and make a single vectorized call to multi_normal_cholesky, as in the sketch below.
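
Something along these lines, perhaps (just a sketch, not tested; it reuses S, T, Y, r and a from your model and assumes L_Omega is a Cholesky factor of the covariance, as discussed below):

transformed data {
  vector[S] Y_lag[T-1];   // outcomes at time t-1
  vector[S] Y_cur[T-1];   // outcomes at time t
  for (t in 1:(T-1)) {
    Y_lag[t] = Y[t];
    Y_cur[t] = Y[t + 1];
  }
}
...
model {
  vector[S] Mu[T-1];
  ...
  for (t in 1:(T-1))
    Mu[t] = r + a .* Y_lag[t];   // elementwise, same as diag_matrix(a) * Y_lag[t]
  Y_cur ~ multi_normal_cholesky(Mu, L_Omega);   // one vectorized call
}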

From what I understood, \Sigma = LL^T + \Psi, where L is the matrix of factor loadings with the constraints from @rfarouni's blog. In Help with factor analysis (latent variable model) they used multi_normal_cholesky for the factor scores, not for the factor loadings. The Ld in their code is the product of the factor-score sds and the factor-score correlation matrix, whereas the L in my code is the factor loadings, i.e. what is called lambda in their code. So are you suggesting to write

model {
    for (t in 2:T)
        Y[t] ~ multi_normal_cholesky(r + diag_matrix(a) * Y[t-1], L);
}

In this case I don’t know what to do with psi

Or

transformed parameters {
    L_s = diag_pre_multiply(psi, L);
}
model {
    for (t in 2:T)
        Y[t] ~ multi_normal_cholesky(r + diag_matrix(a) * Y[t-1], L_s);
}

But L is not the Cholesky factor of a correlation matrix.

I am sure I am missing something conceptually.

And thanks a lot for your help,
Alfonso

I think just something like

transformed parameters {
...
Omega = add_diag(multiply_lower_tri_self_transpose(L), psi);
L_Omega = cholesky_decompose(Omega);
...
}
model { 
...
 for (t in 2:T)
      Y[t] ~ multi_normal_cholesky(r + a .* Y[t-1], L_Omega);  // a .* Y[t-1] is elementwise, same as diag_matrix(a) * Y[t-1]
...
}
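
Note that psi is already folded into the covariance before the decomposition,

\Omega = LL^T + \mathrm{diag}(\psi), \qquad L_\Omega L_\Omega^T = \Omega,

so L_Omega is the Cholesky factor of the full residual covariance, which is exactly what multi_normal_cholesky expects.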

Hey :)

You should definitely look at Mauricio’s code in this thread:

It has been of great help to me!

Lucas

Hey,

It seems add_diag has not been included in rstan, but it is available in the Stan math library. Any idea on how to load it into my workspace?

So, instead of the add_diag version, I did:

transformed parameters {
...
  Omega = L * L' + diag_matrix(psi);
  L_Omega = cholesky_decompose(Omega);
...
}
model { 
...
  for (t in 2:T)
      Y[t] ~ multi_normal_cholesky(r + diag_matrix(a) * Y[t-1], L_Omega);
...
}

The model runs faster than before. However, I realised that in both cases, multi_normal and multi_normal_cholesky, the model has low n_eff and high Rhat for the elements of L when I increase the number of species or the number of latent dimensions. Traceplots of L:

It looks like some of the elements of L are bimodal. I used the same constraints for the factor loadings L as @ldeschamps. So the difference I see is that you model the latent factor scores FS_UNC together with L_UNC to calculate Mu, whereas I use the factor loadings and psi to estimate the variance-covariance matrix. Is that right? If I am to include FS_UNC in my model, I am not sure how to use it. Should I write something like

...
transformed parameters {
  matrix[T, S] Mu;
...
  for (t in 2:T)
    Mu[t] = r + diag_matrix(a) * Y[t-1] + FS_UNC * L_UNC';
}

If so, I am not sure what covariance argument to use in multi_normal:

...
  for (t in 2:T)
      Y[t] ~ multi_normal(Mu[t], ????);
}
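
My only guess (and I may well be wrong) is that, if the factor scores enter the mean, then conditional on them only the diagonal residual variances psi should remain, something like:

  for (t in 2:T)
      Y[t] ~ multi_normal(Mu[t], diag_matrix(psi));   // residual covariance given the scores?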

Thanks a lot for the help :)

Cheers,
Alfonso