Efficiency issue with Gaussian Process model

Hello,

I am working on a Gaussian process model in which I am adding structure from a correlation matrix (vcv in the code below). The model has been tested and validated on test data, and while sampling is quick for small datasets, it is impossibly slow when I try to scale up (increase Nspp; see code below). I am looking for any ideas on how to make the model more computationally efficient so I can use it for larger datasets.

My colleagues and I think the model may be struggling to iterate through a larger matrix, but that's just our current guess, so alternative hypotheses are most welcome.

Any suggestions for how to improve the model efficiency would be greatly appreciated!

Below is code for the Stan model:


functions {
  // Pagel's lambda transformation: scale the off-diagonal elements of vcv by
  // lambda, keep the diagonal, then scale the whole matrix by sigma^2.
  matrix lambda_vcv(matrix vcv, real lambda, real sigma){
    matrix[rows(vcv), cols(vcv)] local_vcv;
    matrix[rows(vcv), cols(vcv)] sigma_mat;
    local_vcv = vcv * lambda;
    for (i in 1:rows(local_vcv))
      local_vcv[i, i] = vcv[i, i];   // restore the unscaled diagonal
    sigma_mat = diag_matrix(rep_vector(sigma, rows(vcv)));
    return sigma_mat * local_vcv * sigma_mat;
  }
}

data {
  int<lower=1> N;
  int<lower=1> Nspp;
  int<lower=1, upper=Nspp> species[N];
  vector[N] ypred;             // response
  vector[N] year;              // predictor
  matrix[Nspp, Nspp] Vphy;     // phylogenetic correlation matrix

  // Priors
  real a_z_prior_mu;
  real a_z_prior_sigma;
  real lam_interceptsa_prior_alpha;
  real lam_interceptsa_prior_beta;
  real sigma_interceptsa_prior_mu;
  real sigma_interceptsa_prior_sigma;
  real b_z_prior_mu;
  real b_z_prior_sigma;
  real lam_interceptsb_prior_alpha;
  real lam_interceptsb_prior_beta;
  real sigma_interceptsb_prior_mu;
  real sigma_interceptsb_prior_sigma;
  real sigma_y_prior_mu;
  real sigma_y_prior_sigma;
}

parameters {
  real<lower=0> sigma_y;
  real<lower=0, upper=1> lam_interceptsa;
  real<lower=0> sigma_interceptsa;
  real<lower=0, upper=1> lam_interceptsb;
  real<lower=0> sigma_interceptsb;
  vector[Nspp] b; // slope
  real b_z;
  vector[Nspp] a; // intercept
  real a_z;
}

model {
  real yhat[N];
  for (i in 1:N) {
    yhat[i] = a[species[i]] + b[species[i]] * year[i];
  }
  a ~ multi_normal(rep_vector(a_z, Nspp), lambda_vcv(Vphy, lam_interceptsa, sigma_interceptsa));
  b ~ multi_normal(rep_vector(b_z, Nspp), lambda_vcv(Vphy, lam_interceptsb, sigma_interceptsb));
  ypred ~ normal(yhat, sigma_y);

  // Priors
  a_z ~ normal(a_z_prior_mu, a_z_prior_sigma);
  lam_interceptsa ~ beta(lam_interceptsa_prior_alpha, lam_interceptsa_prior_beta);
  sigma_interceptsa ~ normal(sigma_interceptsa_prior_mu, sigma_interceptsa_prior_sigma);
  b_z ~ normal(b_z_prior_mu, b_z_prior_sigma);
  lam_interceptsb ~ beta(lam_interceptsb_prior_alpha, lam_interceptsb_prior_beta);
  sigma_interceptsb ~ normal(sigma_interceptsb_prior_mu, sigma_interceptsb_prior_sigma);
  sigma_y ~ normal(sigma_y_prior_mu, sigma_y_prior_sigma);

}


The R code to generate our test data is the following:


# Packages: pbtree, fastBM, and phylosig are from phytools; rescale from geiger;
# vcv from ape; rtruncnorm from truncnorm
library(ape)
library(geiger)
library(phytools)
library(truncnorm)

Nyear <- 10
Nstudy <- 30
Nspp <- 1000
N <- Nspp * Nstudy * Nyear

m <- 0.6
lam <- 0.7 # lambda
sigma_y <- 0.01 # sigma_y
sigma_interceptsb <- 0.1

# Generate a tree
spetree <- pbtree(n = Nspp, nsim = 1, b = 1, complete = FALSE, scale = 1)
spetree$tip.label <- paste("s", 1:Nspp, sep = "")
scaledtree <- rescale(spetree, model = "lambda", lam)
slopez <- fastBM(scaledtree, a = m, mu = 0, sig2 = sigma_interceptsb ^ 2)
phylosig(x = slopez, tree = spetree, method = "lambda")
vcvtree <- vcv(spetree, corr = TRUE)

dat <- data.frame(matrix(NA, N, 1))
names(dat) <- c("year")
# The vectors below are shorter than N and rely on R's recycling to fill all rows
dat$year <- c(1:Nyear)
dat$study <- rep(c(1:Nstudy), each = Nspp)
dat$species <- rep(1:Nspp, Nstudy)
dat$slopes <- rnorm(Nspp, 5, 3)
dat$smpl.int <- rtruncnorm(Nspp, 0, 365, 188, 20)   # truncated normal on [0, 365]
dat$doy <- as.integer(dat$year * dat$slopes + dat$smpl.int)

phypriors <- list(a_z_prior_mu = 150,
                  a_z_prior_sigma = 50,
                  astudy_prior_mu = 0,
                  astudy_prior_sigma = 50,
                  lam_interceptsa_prior_alpha = 1,
                  lam_interceptsa_prior_beta = 1,
                  sigma_interceptsa_prior_mu = 40,
                  sigma_interceptsa_prior_sigma = 20,
                  b_z_prior_mu = 0,
                  b_z_prior_sigma = 10,
                  lam_interceptsb_prior_alpha = 1,
                  lam_interceptsb_prior_beta = 1,
                  sigma_interceptsb_prior_mu = 5,
                  sigma_interceptsb_prior_sigma = 5,
                  sigma_y_prior_mu = 20,
                  sigma_y_prior_sigma = 10
)


simu_inits <- function(chain_id) {
  a_z.temp <- rnorm(n = Nspp, mean = phypriors[["a_z_prior_mu"]], sd = phypriors[["a_z_prior_sigma"]])
  b_z.temp <- rnorm(n = Nspp, mean = phypriors[["b_z_prior_mu"]], sd = phypriors[["b_z_prior_sigma"]])
  return(append(list(a = a_z.temp,
                     b = b_z.temp),   # names must match the Stan parameters a and b
                phypriors))
}

datalist <- append(list(N = nrow(dat),
                        Nspp = length(unique(dat$species)),
                        ypred = dat$doy,
                        species = dat$species,
                        year = dat$year,
                        Vphy = vcvtree), phypriors)
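
For reference, the model would then be fit roughly along these lines (a sketch; the file name and sampler settings below are placeholders, not from our actual run):

library(rstan)

# Compile and sample; "phylo_model.stan" is a placeholder for the Stan
# program above, and simu_inits supplies per-chain initial values.
fit <- stan("phylo_model.stan",
            data = datalist,
            init = simu_inits,
            chains = 4, iter = 2000)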

Thanks!

Hi,

Have you tried the Cholesky trick mentioned in 10.3 Fitting a Gaussian process | Stan User's Guide?


Thanks @ahartikainen.

I have tried adding a Cholesky decomposition (Stan code below), but it does not seem to have helped much for large Nspp values.

Do you know of any other tricks that might be used in addition to or instead of the Cholesky decomposition?

I appreciate your help!

functions {
  // Pagel's lambda transformation: scale the off-diagonal elements of vcv by
  // lambda, keep the diagonal, then scale the whole matrix by sigma^2.
  matrix lambda_vcv(matrix vcv, real lambda, real sigma){
    matrix[rows(vcv), cols(vcv)] local_vcv;
    matrix[rows(vcv), cols(vcv)] sigma_mat;
    local_vcv = vcv * lambda;
    for (i in 1:rows(local_vcv))
      local_vcv[i, i] = vcv[i, i];   // restore the unscaled diagonal
    sigma_mat = diag_matrix(rep_vector(sigma, rows(vcv)));
    return sigma_mat * local_vcv * sigma_mat;
  }
}

data {
  int<lower=1> N;
  int<lower=1> Nspp;
  int<lower=1, upper=Nspp> species[N];
  vector[N] ypred;             // response
  vector[N] year;              // predictor
  matrix[Nspp, Nspp] Vphy;     // phylogenetic correlation matrix

  // Priors
  real a_z_prior_mu; 
  real a_z_prior_sigma;
  real lam_interceptsa_prior_alpha;
  real lam_interceptsa_prior_beta;
  real sigma_interceptsa_prior_mu;
  real sigma_interceptsa_prior_sigma;
  real b_z_prior_mu;
  real b_z_prior_sigma;
  real lam_interceptsb_prior_alpha;
  real lam_interceptsb_prior_beta;
  real sigma_interceptsb_prior_mu;
  real sigma_interceptsb_prior_sigma;
  real sigma_y_prior_mu;
  real sigma_y_prior_sigma;  	
}

parameters {
  real<lower=0> sigma_y;    
  real<lower=0, upper=1> lam_interceptsa;       
  real<lower=0> sigma_interceptsa;
  real<lower=0, upper=1> lam_interceptsb;       
  real<lower=0> sigma_interceptsb;   
  vector[Nspp] b; // slope 
  real b_z;
  vector[Nspp] a; // intercept
  real a_z;
}

model {
  real yhat[N];
  for (i in 1:N) {
    yhat[i] = a[species[i]] + b[species[i]] * year[i];
  }
  a ~ multi_normal_cholesky(rep_vector(a_z, Nspp), cholesky_decompose(lambda_vcv(Vphy, lam_interceptsa, sigma_interceptsa)));
  b ~ multi_normal_cholesky(rep_vector(b_z, Nspp), cholesky_decompose(lambda_vcv(Vphy, lam_interceptsb, sigma_interceptsb)));
  ypred ~ normal(yhat, sigma_y);

  // Priors
  a_z ~ normal(a_z_prior_mu, a_z_prior_sigma);
  lam_interceptsa ~ beta(lam_interceptsa_prior_alpha, lam_interceptsa_prior_beta);
  sigma_interceptsa ~ normal(sigma_interceptsa_prior_mu, sigma_interceptsa_prior_sigma);
  b_z ~ normal(b_z_prior_mu, b_z_prior_sigma);
  lam_interceptsb ~ beta(lam_interceptsb_prior_alpha, lam_interceptsb_prior_beta);
  sigma_interceptsb ~ normal(sigma_interceptsb_prior_mu, sigma_interceptsb_prior_sigma);
  sigma_y ~ normal(sigma_y_prior_mu, sigma_y_prior_sigma);

}

If I read your code correctly, Vphy is a constant matrix: you scale the off-diagonal elements by lambda and multiply from the left and right by a diagonal matrix. If so, you should be able to compute the Cholesky factor of Vphy only once. During sampling, the scaling by lambda is then just a scalar-matrix multiplication, and you can use the Woodbury matrix identity (see Wikipedia) to handle the fact that the diagonal is not scaled like the off-diagonal elements; you also don't need to build sigma_mat as an explicit matrix, since that part can be done more cheaply as well. Computing the Cholesky only once would provide the biggest speedup.
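
To make the algebra concrete: because Vphy here is a correlation matrix with unit diagonal, lambda_vcv(Vphy, lambda, sigma) equals sigma^2 * (lambda * Vphy + (1 - lambda) * I). One way to use a single, precomputed Cholesky factor of Vphy is then a non-centered parameterization that mixes a correlated and an independent standard-normal vector. The sketch below is illustrative only (names like u and v are placeholders, not code from this thread):

// Sketch only: one-time Cholesky of Vphy plus a non-centered parameterization.
// If u and v are standard normal, sqrt(lam) * (L * u) + sqrt(1 - lam) * v has
// covariance lam * Vphy + (1 - lam) * I, which matches lambda_vcv() / sigma^2
// when Vphy has a unit diagonal.
data {
  int<lower=1> Nspp;
  matrix[Nspp, Nspp] Vphy;   // correlation matrix (unit diagonal)
}
transformed data {
  matrix[Nspp, Nspp] L = cholesky_decompose(Vphy);   // factored once, not every iteration
}
parameters {
  real a_z;
  real<lower=0, upper=1> lam;
  real<lower=0> sigma;
  vector[Nspp] u;   // correlated standard-normal component
  vector[Nspp] v;   // independent component (handles the unscaled diagonal)
}
transformed parameters {
  vector[Nspp] a = a_z + sigma * (sqrt(lam) * (L * u) + sqrt(1 - lam) * v);
}
model {
  u ~ normal(0, 1);
  v ~ normal(0, 1);
  // priors on a_z, lam, sigma and the likelihood for a would go here
}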


Thank you @avehtari for this useful suggestion!

We received similar advice from Ben Goodrich and have been working to incorporate it. This approach shortens the run time substantially, but it generates roughly 4,000 transitions that exceed the maximum tree depth. Running the model on the test data below produces good estimates for the slope and intercept, but poor estimates for the lambda and sigma of both the intercept and slope (see the summary output below). Is there an issue with how I am computing the Cholesky for Vphy?

It is also unclear to me how I would code the Woodbury matrix identity in Stan. Do you have any sample code you could share? That said, this addition might not be necessary if I could get the current code to produce better estimates, since it is already sufficiently fast.

Here is my revised Stan code, which computes the Cholesky of Vphy only once:

functions {
  // Pagel's lambda transformation: scale the off-diagonal elements of vcv by
  // lambda, keep the diagonal, then scale the whole matrix by sigma^2.
  matrix lambda_vcv(matrix vcv, real lambda, real sigma){
    matrix[rows(vcv), cols(vcv)] local_vcv;
    matrix[rows(vcv), cols(vcv)] sigma_mat;
    local_vcv = vcv * lambda;
    for (i in 1:rows(local_vcv))
      local_vcv[i, i] = vcv[i, i];   // restore the unscaled diagonal
    sigma_mat = diag_matrix(rep_vector(sigma, rows(vcv)));
    return sigma_mat * local_vcv * sigma_mat;
  }
}

data {
  int<lower=1> N;
  int<lower=1> n_sp;
  int<lower=1, upper=n_sp> sp[N];
  vector[N] y;               // response
  vector[N] x1;              // predictor
  matrix[n_sp, n_sp] Vphy;   // phylogenetic correlation matrix
  // Priors
  real a_z_prior_mu;
  real a_z_prior_sigma;
  real lam_interceptsa_prior_alpha;
  real lam_interceptsa_prior_beta;
  real sigma_interceptsa_prior_mu;
  real sigma_interceptsa_prior_sigma;
  real b_zf_prior_mu;
  real b_zf_prior_sigma;
  real lam_interceptsbf_prior_alpha;
  real lam_interceptsbf_prior_beta;
  real sigma_interceptsbf_prior_mu;
  real sigma_interceptsbf_prior_sigma;
  real sigma_y_mu_prior;   // note: used below as both the mean and sd of the sigma_y prior

}

transformed data {
  matrix[n_sp, n_sp] L = cholesky_decompose(Vphy);
}

parameters {
  real<lower=0> sigma_y;    
  real<lower=0, upper=1> lam_interceptsa;       
  real<lower=0> sigma_interceptsa;
  real<lower=0, upper=1> lam_interceptsbf;       
  real<lower=0> sigma_interceptsbf;    
  real b_zf;
  real a_z;
  vector[n_sp] a_;
  vector[n_sp] b_;
}

transformed parameters {
  // Non-centered parameterization: implies cov(a) = lam_interceptsa * sigma_interceptsa^2 * Vphy
  vector[n_sp] a = a_z + sqrt(lam_interceptsa) * sigma_interceptsa * (L * a_);
  vector[n_sp] b = b_zf + sqrt(lam_interceptsbf) * sigma_interceptsbf * (L * b_);
}


model {
  real yhat[N];
  for (i in 1:N) {
    yhat[i] = a[sp[i]] + b[sp[i]] * x1[i];
  }

  y ~ normal(yhat, sigma_y);
  
  // Priors
  a_z ~ normal(a_z_prior_mu, a_z_prior_sigma);
  lam_interceptsa ~ beta(lam_interceptsa_prior_alpha, lam_interceptsa_prior_beta);
  sigma_interceptsa ~ normal(sigma_interceptsa_prior_mu, sigma_interceptsa_prior_sigma);
  b_zf ~ normal(b_zf_prior_mu, b_zf_prior_sigma);
  lam_interceptsbf ~ beta(lam_interceptsbf_prior_alpha, lam_interceptsbf_prior_beta);
  sigma_interceptsbf ~ normal(sigma_interceptsbf_prior_mu, sigma_interceptsbf_prior_sigma);
  sigma_y ~ normal(sigma_y_mu_prior, sigma_y_mu_prior);

  a_ ~ normal(0, 1);
  b_ ~ normal(0, 1);

}

And the accompanying test code and summary output:

require(ape)
require(geiger)
require(phytools)
require(rstan)

nspecies <- 200
nind <- 10

spetree <- pbtree(n=nspecies, nsim=1, b=1, complete=FALSE,scale=1)
spetree$tip.label <- paste("s", 1:nspecies, sep="")

# Set up the trait parameters
param <- list(a_z = 4, # root value intercept
              lam_interceptsa = 0.4, 
              sigma_interceptsa = 0.2, 
              b_zf = 0.6,
              lam_interceptsbf = 0.7, 
              sigma_interceptsbf = 0.1, 
              sigma_y = 0.01 
              )
# Set priors
phypriors <- list(
    a_z_prior_mu = 4, # true value
    a_z_prior_sigma = 1,
    lam_interceptsa_prior_alpha = 4, # 
    lam_interceptsa_prior_beta = 6, # 
    sigma_interceptsa_prior_mu = 0.2, # true value
    sigma_interceptsa_prior_sigma = 0.2,
    b_zf_prior_mu = 0.6, # true value
    b_zf_prior_sigma = 1,
    lam_interceptsbf_prior_alpha = 7, #
    lam_interceptsbf_prior_beta = 3, # 
    sigma_interceptsbf_prior_mu = 0.1, # true value
    sigma_interceptsbf_prior_sigma = 0.1,
    sigma_y_mu_prior = 0.01
    
)

# Generate intercept
scaledtree_intercept <- rescale(spetree, model = "lambda", param[["lam_interceptsa"]])         
intercepts <- fastBM(scaledtree_intercept, a = param[["a_z"]], mu = 0, sig2 = param[["sigma_interceptsa"]] ^ 2)
# Generate bf slope
scaledtree_bf <- rescale(spetree, model = "lambda", param[["lam_interceptsbf"]])         
slopes_bf <- fastBM(scaledtree_bf, a = param[["b_zf"]], mu = 0, sig2 = param[["sigma_interceptsbf"]] ^ 2)

dfhere <- data.frame(sp = c(), intercept = c(), x1 = numeric(), trait1 = numeric())
for (i in 1:nspecies){
    temp <- data.frame(sp = rep(i, nind),
                       intercept = rep(intercepts[i], nind),
                       x1 = rnorm(n = nind, mean = 10, sd = 3),
                       trait1 = rep(slopes_bf[i], nind))
    dfhere <- rbind(dfhere, temp)
}
dfhere$mu <- dfhere$intercept + dfhere$x1 * dfhere$trait1
dfhere$y <- rnorm(n = nrow(dfhere), mean = dfhere$mu, sd = param[["sigma_y"]])

# Function for generating "good" initial values
simu_inits <- function(chain_id) {
    a_z.temp <- rnorm(n = nspecies, mean = param[["a_z"]], sd = 1)
    b_z.temp <- rnorm(n = nspecies, mean = param[["b_zf"]], sd = 1)
    return(append(list(a = a_z.temp, b_force = b_z.temp),
                  param))
}

The model estimates are in the first two rows of the screenshot below, and the test data values are in the last two rows.

I appreciate your help with this!

Deirdre

This indicates very high posterior correlation between some parameters. Have you looked at the pairs plots? How big are the ESSs for the quantities of interest?
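
For reference, both can be inspected directly from an rstan fit (a sketch; fit is a placeholder for your stanfit object):

# Pairs plot for the hierarchical scale parameters
pairs(fit, pars = c("lam_interceptsa", "sigma_interceptsa",
                    "lam_interceptsbf", "sigma_interceptsbf"))

# Effective sample sizes from the fit summary
summary(fit)$summary[, "n_eff"]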

They don’t look that bad. Running just once is not enough to say those estimates would be bad (based on what you show).

No. I would code it as the equation written on Wikipedia. There is no hurry to test that if this already runs for you. It would be more important to figure out the reason for the high posterior correlation. Before that, you could also try a dense mass matrix, which helps with correlated posteriors too (unless the number of parameters is very large and multiplication with the big dense mass matrix starts to dominate the computation time).
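
In rstan, a dense mass matrix can be requested via the control list (a sketch; I believe the relevant option is metric = "dense_e", with "diag_e" as the default):

# Refit with a dense (rather than diagonal) mass matrix; "sm" is a
# placeholder for the compiled stan_model object.
fit_dense <- sampling(sm,
                      data = datalist,
                      control = list(metric = "dense_e"),
                      chains = 4, iter = 2000)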

@avehtari thank you for your response and suggestions!

The pairs plot does suggest that the lambda and sigma for both the a and bf intercepts are correlated. There also appears to be some multimodality in a few of the parameters.

The ESS are also very small:

I will look into trying a dense mass matrix and hopefully that will make a difference. If you have any other suggestions based on the above output, I would be very interested in hearing them!


@avehtari Thank you again for your previous insights into my model.

I have continued to work on this issue with my colleagues, and ultimately we decided that we only need to include the effects of the matrix on the model slope to answer our research question. This speeds up the model considerably, and it produces no warning messages when run on our test data. The model also recovers the true values used to generate the test data, and the ESS look good compared to the previous models discussed above.

However, the pairs plot still shows a strong correlation between our lam_intercept and sigma_intercept.

I was curious if you had any insights into how concerning such a correlation is. If there is high posterior correlation between these two parameters, can we still reliably interpret the model estimates?

For clarity, the new Stan code is the following:

functions {
  // Pagel's lambda transformation: scale the off-diagonal elements of vcv by
  // lambda, keep the diagonal, then scale the whole matrix by sigma^2
  // via quad_form_diag.
  matrix lambda_vcv(matrix vcv, real lambda, real sigma){
    matrix[rows(vcv), cols(vcv)] local_vcv;
    local_vcv = vcv * lambda;
    for (i in 1:rows(local_vcv))
      local_vcv[i, i] = vcv[i, i];   // restore the unscaled diagonal
    return quad_form_diag(local_vcv, rep_vector(sigma, rows(vcv)));
  }
}

data {
  int<lower=1> N;
  int<lower=1> n_sp;
  int<lower=1, upper=n_sp> sp[N];
  vector[N] y;               // response
  vector[N] x1;              // predictor
  matrix[n_sp, n_sp] Vphy;   // phylogenetic correlation matrix
  // Priors
  real mu_a_prior_mu;
  real mu_a_prior_sigma;
  real sigma_a_prior_mu;
  real sigma_a_prior_sigma;
  real b_zf_prior_mu;
  real b_zf_prior_sigma;
  real lam_interceptsbf_prior_alpha;
  real lam_interceptsbf_prior_beta;
  real sigma_interceptsbf_prior_mu;
  real sigma_interceptsbf_prior_sigma;
  real sigma_y_prior_mu;
  real sigma_y_prior_sigma;  

} 

parameters {
  real<lower=0> sigma_y;    
  real<lower=0, upper=1> lam_interceptsbf;       
  real<lower=0> sigma_interceptsbf;    
  vector[n_sp] b_force; 
  real b_zf;
  vector[n_sp] a; 
  real mu_a;
  real<lower = 0> sigma_a; 
}

model {
  real yhat[N];
  matrix[n_sp, n_sp] vcv_b;   // Cholesky factor of the scaled vcv

  for (i in 1:N) {
    yhat[i] = a[sp[i]] + b_force[sp[i]] * x1[i];
  }
  vcv_b = cholesky_decompose(lambda_vcv(Vphy, lam_interceptsbf, sigma_interceptsbf));

  a ~ normal(mu_a, sigma_a);
  b_force ~ multi_normal_cholesky(rep_vector(b_zf, n_sp), vcv_b);

  y ~ normal(yhat, sigma_y);
  
  // Priors
  mu_a ~ normal(mu_a_prior_mu, mu_a_prior_sigma);
  sigma_a ~ normal(sigma_a_prior_mu, sigma_a_prior_sigma);
  b_zf ~ normal(b_zf_prior_mu, b_zf_prior_sigma);
  lam_interceptsbf ~ beta(lam_interceptsbf_prior_alpha, lam_interceptsbf_prior_beta);
  sigma_interceptsbf ~ normal(sigma_interceptsbf_prior_mu, sigma_interceptsbf_prior_sigma);
  sigma_y ~ normal(sigma_y_prior_mu, sigma_y_prior_sigma);

}

I appreciate your help with this!

Deirdre

Edits by me for syntax highlighting. Note: to get R or Stan syntax highlighting, the language name needs to be lowercase after the backticks, like ```stan.


These correlations are not high, and there is no reason to be worried, as the diagnostics also indicate the MCMC is working well.
