Gaussian Process

Hello,
I used the below model using GP in brms. However, the model doesn’t move forward for hours. Is there any ways to speed up or is there anything wrong here as the model stays in warmup 0% for a long time. Thanks for your guidance.

 for (col in names(df)[8:57]) {
 formula <- paste0(col, " ~ gp(X)")
   
  model <- brm(formula,
               data = df,
               prior = c(prior("student_t(3, 0, 10)", class = "Intercept"),
                         prior("student_t(3, 0, 2.5)", class = "sdgp"),
                         prior("inv_gamma(1.494197, 0.056607)",class= "lscale"),
                         prior("student_t(3, 0, 2.5)",class="sigma")),               
                         seed = 1, chains = 6, iter = 10000,init=1,
                         control = list(adapt_delta = 0.99, max_treedepth = 13),
                         cores = future::availableCores())
  
  save(model, file = paste0(col, "_gp.RDATA"))
}

Could you post sample data that replicates the problem? On first glance, this looks like a very big multivariate model that might take a long time to fit no matter what you do. You might try brm(..., algorithm = "pathfinder") to use variational inference instead of the default HMC. This might help you catch issues with your priors. You can also try to speed up the HMC fit by generating initial values from the Pathfinder fit. There’s a worked example with cmdstanr, and there’s ongoing work to make this into a function. I suspect that brms will add support for this feature once it’s merged into cmdstanr.

@wpetry Thank you for your guidance. Thi sis a univariate analysis but many of them. I try to submit a data sample shortly. Thanks

Oh, my mistake. I misread your loop.

A few more ideas to consider if it’s not a data/prior issue:

  • You have a lot of iterations. The default (2000) is usually sufficient for HMC.
  • adapt_delta is quite high too. Did you increase this because of convergence issues?
2 Likes

This syntax uses covariance matrix presentation of GP, which can be very slow with more than couple hundred observations. With syntax gp(x, k=20, c=5/4) brms uses Hilbert basis function approximation which is much faster (see the paper for how to choose k and c. You can also try using splines with s(x) which by default uses basis function presentation of regularized tensor plate splines which is often a good choice, too.

2 Likes

@wpetry Yes, to have a better Rhat and convergence I increased iterations and wells adapt_delta as the warning suggested.

@avehtari Thank you for your guidance. The motivation to use the Gaussian process is that both outcome and predictors suffer from measurement error. I thought to run GP to obtain parameters from the model and perhaps model the parameters to decrease error.

model1<-outcome~gp(X,k=20, c=5/4)

worked.
I appreciate your help.

1 Like

Do you know anything about the measurement error in the predictors? brms has some nice functionality for incorporating this information into the model. See for example me, or the preferred mi. I haven’t used this functionality myself, but the examples that come along with the functions seem straightforward. You might be able to make some progress using this approach even if you do not know the error of the predictors exactly. For example, if you know something about the order of magnitude of the error you could do some sensitivity analyses to determine how important a range of plausible measurement errors is to the inference you hope to make.

Apologies if this comment is off base, it’s a little unclear what X is in your example and how that might relate to measurement error in your variables.

1 Like