Gaussian process regression

oh! sorry for flooding @avehtari ! I did miss your list of software! My bad…
thanks!

Thank you. I have tried gplite but it is almost impossible to guess initial parameters for optimization for my problem with 14 variables. Do you have any suggestions of more robust packages to use?

Interesting. In my experience initialization problems are really rare (my experience is mostly with gpstuff, and gplite has a quite similar implementation). Did you scale your data?
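If not, here is a minimal sketch of what I mean (the toy X and y below are placeholders for your own data; scale() is base R, nothing gplite-specific):

library(gplite)

# Toy stand-in for a 14-variable predictor matrix and a binary response
# (replace these with your own X and y).
X <- matrix(rnorm(100 * 14), 100, 14)
y <- rbinom(100, 1, 0.5)

# Standardize each column of X to zero mean and unit variance.
X_scaled <- scale(X)

# Fit on the scaled inputs.
gp <- gp_init(cf_sexp(), lik_bernoulli())
gp <- gp_optim(gp, X_scaled, y)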

This code leads to an infeasible starting point. Any suggestions on how to get a feasible starting point?

cf <- cf_sexp(normalize = TRUE)
lik <- lik_bernoulli()
gp <- gp_init(cf, lik)
gp <- gp_optim(gp, X, y, trials = trials)

gplite offers a way of finding the posterior mode via gp_optim, but does not currently provide means for sampling or otherwise approximating the full posterior. How would you suggest determining the importance of features? The procedure in the article Variable selection for Gaussian processes via sensitivity analysis of the posterior predictive distribution is not directly applicable. Besides, gplite allows only one length scale.

You are correct. As I said before, based on the information you provided, no sampling is needed and sampling would be slow. Optimization should be fine in your case.

Why not? It’s using optimization (not sampling).

That doesn’t prevent you from using the approach presented in the paper.

There’s also an improved method described in a paper I mentioned earlier in this thread, Uncertainty-aware Sensitivity Analysis Using Rényi Divergences. That method also works with optimization and a single length scale, and it works better than ARD and SHAP, which are all reasons why I recommended it to you in April.

Thanks a lot for the explanation. Suppose I would like to use the KL measure of relevance. Would you please guide me on how to calculate equation (2) for the binomial likelihood? In the paper there are only examples of the univariate normal and Bernoulli likelihoods. Sorry, my statistical background is a bit shaky. The formula for the binomial case is given in probability - Kullback-Leibler divergence for binomial distributions P and Q - Mathematics Stack Exchange, but I still don’t understand how to calculate p and q. My guess is that p is determined by optimization. Right? But then how do I calculate the probability at the perturbed point?

I realized that the code for the paper wasn’t yet in the git repo mentioned in the paper. @topipa will add the code soon, and we’ll mention it here when the code is available.

First bits of code are now in the repo: github.com/topipa/rsens-paper

I will be adding more functionality in the near future. Feel free to raise issues for any bugs or missing features!

I was thinking about http://proceedings.mlr.press/v89/paananen19a/paananen19a-supp.pdf
Would you please explain how to calculate \pi_* and \pi_{*, \Delta_j} ?

Thanks a lot for the code. Is it possible to add a binomial likelihood? Alternatively, can you advise how to modify the rsens code in the gpytorch folder to implement a binomial likelihood?

I have not yet implemented the analytical expression for the binomial likelihood. It is something I will consider in the future. In the meantime, you can use the finite-difference alternative. You just need a way to compute the success probability after optimising the GP, and then you can compute the KL divergence between the unperturbed and perturbed points.
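For reference (this is just the standard result behind the Stack Exchange link above, not something specific to the paper), the KL divergence between two binomial distributions with the same number of trials n is

KL(\mathrm{Bin}(n, p) \,\|\, \mathrm{Bin}(n, q)) = n \left[ p \log \frac{p}{q} + (1 - p) \log \frac{1 - p}{1 - q} \right],

where p is the predicted success probability at the original point and q is the predicted success probability at the perturbed point (if I read the supplementary material's notation correctly, these play the roles of \pi_* and \pi_{*,\Delta_j}). This is exactly what the kld line in the code below computes.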

I’m not very familiar with gplite, but as far as I understand it uses quadrature integration to compute the success probability. Here’s an example with the binomial likelihood (adapted from the gplite tutorial):

library(gplite)

sigmoid <- function(x) {
  1 / (1 + exp(-x))
}

# Generate some toy data
set.seed(32004)
trials <- 10
d <- 3
n <- 250
sigma <- 0.1
x <- matrix(rnorm(n * d), n, d)
ycont <- 0.5 * x[, 1] + x[, 2] + rnorm(n) * sigma
p <- sigmoid(ycont)
y <- rbinom(n, rep(trials, n), p)

# Fit the model using the Laplace approximation
lik <- lik_binomial()
cf <- cf_sexp()
gp <- gp_init(cf, lik)
gp <- gp_optim(gp, x, y, trials = trials)

# Compute finite-difference KL sensitivity values: perturb one input
# dimension at a time and compare the predicted success probabilities
# before and after the perturbation.
pert <- 0.001
pertm <- matrix(0, n, d)
kl <- matrix(0, n, d)
ps <- gp_pred(gp, x, transform = TRUE)   # success probabilities at the original points
for (dd in seq(d)) {
  pertm[, dd] <- pert
  ps_pert <- gp_pred(gp, x + pertm, transform = TRUE)   # probabilities at the perturbed points
  # KL divergence between Binomial(trials, p) and Binomial(trials, q)
  kld <- log(ps$mean / ps_pert$mean) * trials * ps$mean +
    log((1 - ps$mean) / (1 - ps_pert$mean)) * trials * (1 - ps$mean)
  # sensitivity measure: sqrt(2 * KL) scaled by the perturbation size
  kl[, dd] <- sqrt(2 * kld) / pert
  pertm[, dd] <- 0
}

# importance of the variables
colMeans(kl)

Thank you very much.

Can you please share code in Python to calculate the importance of features using KL or Rényi divergences?

Here is our repository with Python code (using GPy) for the finite-difference method. The repository for the Rényi divergence method (using gpytorch) is found here. Is either of these useful for your use case?

Is it possible to go one step further and determine the uncertainty in the feature importances?

Question for @avehtari. What about a more complicated model:
CL \sim \mathrm{MultivariateNormal}(0, K_{CL}(x \mid \theta_{CL}))
Vd \sim \mathrm{MultivariateNormal}(0, K_{Vd}(x \mid \theta_{Vd}))
y \sim \mathrm{normal}(f(CL, Vd), \sigma^2)
Is there a way to determine important features? Can I use the KL divergence approach? Here K_{CL} and K_{Vd} are Gaussian process kernels and f is a function of CL and Vd.

How many features?

In this case we have around 10 features. In addition, based on physics we know that CL is not influenced by some of the features, Vd is not influenced by some other features, and some features, such as weight, influence both CL and Vd.

Yes. With 10 features, it should be feasible.
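A minimal sketch of how the finite-difference recipe above could carry over to this model, assuming a hypothetical predict_mean(model, x) that returns the predictive mean of f(CL, Vd) at the inputs x, plus an estimate sigma of the residual standard deviation (both are placeholders for your own model’s predictive machinery, not functions from gplite or any other package):

# Hypothetical finite-difference KL sensitivity for y ~ normal(f(CL, Vd), sigma^2).
# predict_mean() and sigma are placeholders, not part of any package.
pert <- 0.001
n <- nrow(x)
d <- ncol(x)
kl <- matrix(0, n, d)
mu <- predict_mean(model, x)                  # predictive mean at the original inputs
for (dd in seq(d)) {
  pertm <- matrix(0, n, d)
  pertm[, dd] <- pert
  mu_pert <- predict_mean(model, x + pertm)   # predictive mean at the perturbed inputs
  # KL between two normals with equal variance: (mu - mu_pert)^2 / (2 * sigma^2)
  kld <- (mu - mu_pert)^2 / (2 * sigma^2)
  kl[, dd] <- sqrt(2 * kld) / pert
}
colMeans(kl)   # average sensitivity per feature

The per-feature structure you describe (some features entering only K_{CL}, others only K_{Vd}) does not change the recipe itself: the perturbation is applied to the inputs, and the divergence is measured on the predictive distribution of y.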