Predictions using an integrated Laplace approximation -- API

I found an error in Algorithm 5.5 of my thesis which describes how to make predictive draws with the integrated Laplace approximation. This affects out-of-sample predictions, which I didn’t use in any applications (which might be why I didn’t catch this, though I should’ve built a better unit test).

Error and fix

When fitting Gaussian processes it is common to distinguish training data, X, and testing data, X^*, and to have latent Gaussian variables associated for each, respectively \theta and \theta^* – see Rasmussen & Williams (2006, Section 3.4.2). After marginalizing out \theta, we may want posterior draws for both \theta and \theta^*. The case for \theta is relatively straightforward.

For \theta^*, we need to examine the “extended” prior covariance matrix,

{\bf K} = \begin{bmatrix} K(X, X) & K(X, X^*)\\ K(X, X^*) & K(X^*, X^*) \end{bmatrix}

I’ll denote K = K(X, X), K^* = K(X, X^*) and K^{**} = K(X^*, X^*). We could abstract the problem a little, with

{\bf K} = \begin{bmatrix} K & K^*\\ K^* & K^{**} \end{bmatrix}

where the different components of K are arbitrarily specified by the user (i.e. they’re not the result of one fixed covariance function applied to different X's).

As written in the thesis, the mean of the approximating normal is

\mathbb E_\mathcal{G} (\theta^* \mid y, \phi, \eta, X, X^*) = K^* \nabla \log p(y \mid \hat \theta, \eta).

The error creeps into the calculation of the covariance, which is

\Sigma_\mathcal{G}(\theta^* \mid y, \phi, \eta, X, X^*) = K^{**} - K^* (K + W^{-1})^{-1} K^*.

I erroneously wrote the first term on the RHS as K^* and carried the mistake over in my code :( Fortunately this is an easy fix.

API

To draw posterior samples for \theta^*, we need to compute K, K^* and K^{**}. That’s three potentially different covariance functions. The current approach is to limit users to only one covariance function (...) ==> matrix, using variadic arguments but allows users to change the arguments that get passed when computing K and K^*. A call to the function may look as follows

  vector[n_obs] theta
    = laplace_marginal_rng(y, n_samples, ye, theta_0, 
                           K_functor,
                           forward_as_tuple(X, X), 
                           forward_as_tuple(X, X_pred), 
                           alpha, rho);

The idea is that K is computed using

K = K_functor(X_train, X_train, alpha, rho);

and K^* using

K_p = K_functor(X_train, X_pred, alpha, rho);

with the last two variadic arguments, alpha and rho used for both calls. Applying the fix to our algorithm, we also need to specify how to compute K^{**}. So we need a third set of arguments, and the call is

  vector[n_obs] theta
    = laplace_marginal_rng(... 
                           K_functor,
                           forward_as_tuple(X, X), 
                           forward_as_tuple(X, X_pred),
                           forward_as_tuple(X_pred, X_pred), 
                           alpha, rho);

Hmmm… ok. The only annoying thing about this is that if we want posterior draws for \theta, we need to pass forward_as_tuple(X, X) three times. What’s the fix here? Should we overload the function? Create two rng functions?

What if we need a more flexible solution?

Right now, the only way to distinguish K, K^*, and K^{**} is to pass different arguments to the same covariance function. This might not be flexible enough. We might create a function that allows the user to pass three distinct covariance functor, but then we lose variadic arguments.

The one solution that comes to mind is to treat \theta^* as regular latent Gaussian variables with empty groups. So the covariance functor returns the whole covariance matrix \bf K (and the likelihood is rewritten accordingly). That gives us all the flexibility we want, but it increases the dimension of the problem from n to n^* – with \theta \in \mathbb R^n and \theta^* \in \mathbb R^{n^*} – with the complexity of the approximation being \mathcal O(n^3).

Proposition

Have an API for the rng function which allows three different set of arguments for K_functor and overload the function for the case where we pass only one set of arguments (i.e. K^{**} = K^* =K).

@stevebronder @avehtari @Bob_Carpenter

1 Like

Now I see the equations correctly

Why tuples, and not just

  vector[n_obs] theta
    = laplace_marginal_rng(y, n_samples, ye, theta_0, 
                           K_functor,
                           X,
                           X_pred, 
                           alpha, rho);

If X is data, I assume it’s passed by reference and not copied, so there is no issue of having X twice as the argument (when sampling at locations X_pred=X)

The tuples was Steve’s idea. Since we’re using variadic arguments, we need to distinguish which arguments are passed to K_functor when computing K, K^* and K^{**}. In the call you give, it’s unclear which set of arguments vary between different calls of K_functor (if we think more broadly about latent Gaussian models, X and X_pred may not be the only differences).

2 Likes

I don’t see how those tuples solve the problem. Maybe we should have a call?

2 Likes

Sure, it might be worth having a call with Steve. I can also add it as an agenda item for the next Stan meeting.

2 Likes

Happy to discuss, at the time tuples felt like the only way to solve the issue of calling the functor with multiple sets of inputs. But if there are other schemes then im happy to impliment that

1 Like

Some notes from yesterday’s Stan meeting, regarding which API we should use.

@WardBrian proposed passing arrays of arguments to laplace_rng, with the first element corresponding to the input for K, the second for K^* and the third for K^{**}. The signature would be

vector[n_obs] theta
    = laplace_marginal_rng(... 
                           K_functor,
                           {X, X, X_pred}, 
                           {X, X_pred, X_pred},
                           alpha, rho);

So when computing K, we use (X, X, alpha, rho), when computing K^*, (X, X_pred, alpha, rho), etc.

This framework is actually quite flexible. As @avehtari pointed out, the user may give K_functor an integer argument, say star, and write the function such that depending on the value of the argument, it returns K, K^* or K^{**}. So the call to the function would be

vector[n_obs] theta
    = laplace_marginal_rng(... 
                           K_functor,
                           {1, 2, 3},
                           X, X_pred,  alpha, rho);

This gives the user complete control and it avoids passing certain arguments multiple times. In summary, I believe we can implement Aki’s idea using Brian’s API.

I would also propose to distinguish predictive functions for in-sample and out-of-sample predictions, using two signatures:

laplace_rng();       // takes same args as laplace_marginal_lpdf
laplace_pred_rng();  // allows array of arguments to be passed
3 Likes

Thanks @charlesm93 - I think you summarized my idea much better than I was able to during the meeting .

I think it’s quite flexible and I really like that it keeps positionality of arguments. If we did still want them to be tuples we could also statically check their sizes, but arrays would work in the mean time

1 Like

btw, updated algorithm can be found here: https://charlesm93.github.io/files/thesis_erratum.pdf

I might need some guidance implementing the proposed API. I just cracked open the code and I’m not sure how to specify variadic array arguments, followed by regular array arguments. The C++ code can be found here:

Sorry I’m confused, what is the API you are looking for? As in what would you like the user to call?