Multi-Ouput Gaussian Process

Hi,

I am trying to model a dynamic factor model such that the factors follow a GP. That is,

Y = \lambda\eta + \epsilon
\eta = f + \zeta; f \sim GP(0, K), \zeta \overset{iid}{\sim} N(0, \sigma_{\zeta})

The length scale parameter, and covariance function of the f and \zeta are not converging.

data{
  int<lower = 1> N_Y;
  int<lower = 1> N_eta;
  int<lower = 1> N;
  
  matrix[N, N_Y] Y;
  real time[N];
}

transformed data {
  real delta = 1e-5;
}

parameters {
  // structural model
  vector<lower = 0>[N_eta] eta_sd;
  matrix[N, N_eta] eta_Z;
  
  // GP
  real<lower=0> rho;
  vector<lower = 0>[N_eta] f1_sd;
  cholesky_factor_corr[N_eta] f1_Lcorr;
  matrix[N, N_eta] gp_Z;
  
  //measurement model
  vector[4] ld;
  real<lower = 0> eps_sig;
  
}

transformed parameters {
  matrix[N, N_eta] eta;
  
  {
    matrix[N, N_eta] f1;
    matrix[N, N] L_K;
    
    matrix[N, N] K = cov_exp_quad(time, 1.0, rho) + diag_matrix(rep_vector(delta, N));
    
    L_K = cholesky_decompose(K);
    
    f1 =  L_K*gp_Z*diag_pre_multiply(f1_sd, f1_Lcorr)';
    
    for (k in 1:N){
      eta[k, ] = f1[k, ] + eta_Z[k, ]*diag_matrix(eta_sd);
    }

  }
  
}

model {
  matrix[N, N_Y] mu_Y;
  
  // Priors
  eta_sd ~ std_normal();
  
  to_vector(eta_Z) ~ std_normal();
  
  rho ~ inv_gamma(2, .5);
  f1_Lcorr ~ lkj_corr_cholesky(1);
  f1_sd ~ std_normal();
  to_vector(gp_Z) ~ std_normal();
  
  ld ~ std_normal();
  eps_sig ~ std_normal();  
  
  mu_Y[, 1] = eta[ ,1];
  mu_Y[, 2] = ld[1]*eta[, 1];
  mu_Y[, 3] = ld[2]*eta[, 1];
  
  mu_Y[, 4] = eta[, 2];
  mu_Y[, 5] = ld[3]*eta[, 2];
  mu_Y[, 6] = ld[4]*eta[, 2];
  
  //for (j in 1:N_Y){
    //target += normal_lpdf(Y[, j] | mu_Y[, j], eps_sig);
  //}
  target += normal_lpdf(to_vector(Y)| to_vector(mu_Y), eps_sig);

}

generated quantities {
  
  cov_matrix[N_eta] f1_cov;
  
  corr_matrix[N_eta] f1_corr = multiply_lower_tri_self_transpose(f1_Lcorr);
  
  f1_cov = quad_form_diag(f1_corr, f1_sd);
}

My R code for simulating the data is

Nt <- 200
N_eta <- 2
NY <- 6

ti <- seq(from = 0, to = 4, length.out = Nt)

set.seed(111)

### Structural Model

#Residuals for Structural Model
phi1 <- matrix(c(1, 0.7,0.7,1), 2, 2)

zeta1_sd <- c(pi/3, pi/5)
zeta1 <- mvtnorm::rmvnorm(Nt, mean = c(0,0), sigma = diag(zeta1_sd, ncol = 2))

# True Value for LV 
f1 <- matrix(nrow = Nt, ncol = N_eta)

f1[,1] <- sin(4*pi*ti) + sin(7*pi*ti)

f1[,2] <- (ti/2)^2

#Latent Variables
#Cholesky decomposition function of desired correlation between eta variables 
L_chol <- matrix(data = c(1, 0, .5, sqrt(1 - .5^2)), nrow = 2, ncol = 2, byrow = TRUE)

#multiply f1 with cholesky decomposed correlation to ensure eta variables are correlated
f1 <- f1%*%L_chol

eta <- f1 + zeta1

### Measurement Model

#Loadings
ld1 <- matrix(c(1, .6, .7, 0, 0, 0, 0, 0, 0, 1, .2, .3), nrow = 6, byrow = FALSE)

#Indicator Variables Residuals
epsilon1 <- mvtnorm::rmvnorm(Nt, mean = rep(0, NY), sigma = diag(1, nrow = NY))

# Indicator Variables
Y1 <- eta%*%t(ld1) + epsilon1

dtw <- list(N = length(ti), time = ti, Y = Y1, N_Y = ncol(Y1), N_eta = N_eta)

Is there a reason you’re constraining the kernel variance to 1.0?

Is this converging without the GP? (i.e., just the factor model by itself)?

I constrained the kernel variance to 1.0 because I was following the stan guidelines for multi-output GPs
https://mc-stan.org/docs/2_26/stan-users-guide/fit-gp-section.html

We should set α to 1.0 because the parameter is not identified unless we constrain trace(C)=1.

However, when I estimate \alpha even without constraining the trace of the covariance matrix to 1, the model parameters converge.

I am not sure about how the factor model part because my aim is to use the semi-parametric GP to model the temporal effects.

Oh, ok. I’m not well-versed in multi-output GPs enough to help you here, sorry.

My first thought though - Build simple and move up. Start with just the latent variable model itself, ensure that it converges.
In my experience, the set-first-loading-to-one approach does not work well in Stan. I tend to recommend orienting all indicators to have positive loadings, then constrain loadings to be positive [I also tend to use the unit-variance identification and estimate all loadings, but it’s not required here].
Others recommend post-processing the loadings and coefficients in GQ{}; basically to force all signs to flip if the first loading is negative. I imagine that approach could cause some major convergence problems when coupled with a GP though.

After getting the factor model working, move onto a single-output GP; then to multi-output.

1 Like