Predictions using an integrated Laplace approximation -- API

charlesm93 · April 25, 2022, 3:16pm

I found an error in Algorithm 5.5 of my thesis which describes how to make predictive draws with the integrated Laplace approximation. This affects out-of-sample predictions, which I didn’t use in any applications (which might be why I didn’t catch this, though I should’ve built a better unit test).

Error and fix

When fitting Gaussian processes it is common to distinguish training data, X, and testing data, X^*, and to have latent Gaussian variables associated with each, respectively \theta and \theta^* (see Rasmussen & Williams, 2006, Section 3.4.2). After marginalizing out \theta, we may want posterior draws for both \theta and \theta^*. The case for \theta is relatively straightforward.

For \theta^*, we need to examine the “extended” prior covariance matrix,

{\bf K} = \begin{bmatrix} K(X, X) & K(X, X^*)\\ K(X^*, X) & K(X^*, X^*) \end{bmatrix}

I’ll denote K = K(X, X), K^* = K(X, X^*) and K^{**} = K(X^*, X^*). We could abstract the problem a little, with

{\bf K} = \begin{bmatrix} K & K^*\\ K^{*\top} & K^{**} \end{bmatrix}

where the different components of {\bf K} are arbitrarily specified by the user (i.e. they’re not necessarily the result of one fixed covariance function applied to different X's).

As written in the thesis, the mean of the approximating normal is

\mathbb E_\mathcal{G} (\theta^* \mid y, \phi, \eta, X, X^*) = K^{*\top} \nabla \log p(y \mid \hat \theta, \eta).

The error creeps into the calculation of the covariance, which is

\Sigma_\mathcal{G}(\theta^* \mid y, \phi, \eta, X, X^*) = K^{**} - K^{*\top} (K + W^{-1})^{-1} K^*.

I erroneously wrote the first term on the RHS as K^* and carried the mistake over in my code :( Fortunately this is an easy fix.
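A quick way to see the effect of the mistake is the scalar case (n = n^* = 1), where every block is a number. The sketch below is only an illustration: the function names are made up, and W is assumed to be the negative Hessian of the log likelihood at the mode, as in Rasmussen & Williams.

```cpp
#include <cassert>

// Scalar case (n = n* = 1), where every block of the covariance is a number.
// Function and parameter names are illustrative only, not Stan Math code.

// Corrected predictive variance: K** - K* (K + 1/W)^{-1} K*.
inline double predictive_variance(double K, double K_star, double K_star_star,
                                  double W) {
  assert(W > 0.0);  // assumption: W is the (positive) negative Hessian
  return K_star_star - K_star * K_star / (K + 1.0 / W);
}

// Variant with the error: K* in place of K** as the first term.
inline double erroneous_variance(double K, double K_star, double W) {
  assert(W > 0.0);
  return K_star - K_star * K_star / (K + 1.0 / W);
}
```

With K = 1, K^* = -0.5, K^{**} = 1 and W = 1, the corrected expression gives 0.875, while the erroneous one gives -0.625, which is not a valid variance.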

API

To draw posterior samples for \theta^*, we need to compute K, K^* and K^{**}. That’s three potentially different covariance functions. The current approach limits users to a single covariance function, K_functor: (...) ==> matrix, which takes variadic arguments, but lets users change the arguments that get passed when computing K and K^*. A call to the function may look as follows:

  vector[n_obs] theta
    = laplace_marginal_rng(y, n_samples, ye, theta_0, 
                           K_functor,
                           forward_as_tuple(X, X), 
                           forward_as_tuple(X, X_pred), 
                           alpha, rho);

The idea is that K is computed using

K = K_functor(X_train, X_train, alpha, rho);

and K^* using

K_p = K_functor(X_train, X_pred, alpha, rho);

with the last two variadic arguments, alpha and rho, used for both calls. After applying the fix to our algorithm, we also need to specify how to compute K^{**}. So we need a third set of arguments, and the call becomes

  vector[n_obs] theta
    = laplace_marginal_rng(... 
                           K_functor,
                           forward_as_tuple(X, X), 
                           forward_as_tuple(X, X_pred),
                           forward_as_tuple(X_pred, X_pred), 
                           alpha, rho);
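To make the tuple idea concrete, here is a minimal C++14 sketch of how one set of "varying" arguments, packed with std::forward_as_tuple, could be combined with the shared trailing arguments before calling the functor. The helpers call_with and call_with_impl and the scalar ToyKernel are hypothetical stand-ins, not the Stan Math implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical helper (not part of Stan Math): calls f with the elements of
// `varying` first, followed by the shared trailing arguments.
template <typename F, typename Tuple, std::size_t... I, typename... Shared>
auto call_with_impl(const F& f, Tuple&& varying, std::index_sequence<I...>,
                    Shared&&... shared) {
  return f(std::get<I>(std::forward<Tuple>(varying))...,
           std::forward<Shared>(shared)...);
}

template <typename F, typename Tuple, typename... Shared>
auto call_with(const F& f, Tuple&& varying, Shared&&... shared) {
  constexpr std::size_t n = std::tuple_size<std::decay_t<Tuple>>::value;
  return call_with_impl(f, std::forward<Tuple>(varying),
                        std::make_index_sequence<n>{},
                        std::forward<Shared>(shared)...);
}

// Toy scalar "covariance functor": squared exponential kernel between two
// points, standing in for K_functor(x1, x2, alpha, rho).
struct ToyKernel {
  double operator()(double x1, double x2, double alpha, double rho) const {
    assert(rho > 0.0);
    const double d = (x1 - x2) / rho;
    return alpha * alpha * std::exp(-0.5 * d * d);
  }
};
```

Under these assumptions, call_with(k, std::forward_as_tuple(x, x_pred), alpha, rho) evaluates k(x, x_pred, alpha, rho), so the three tuples in the call above would map to three evaluations of the same functor, with alpha and rho written only once.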

Hmmm… ok. The only annoying thing about this is that if we want posterior draws for \theta, we need to pass forward_as_tuple(X, X) three times. What’s the fix here? Should we overload the function? Create two rng functions?

What if we need a more flexible solution?

Right now, the only way to distinguish K, K^*, and K^{**} is to pass different arguments to the same covariance function. This might not be flexible enough. We might create a function that allows the user to pass three distinct covariance functors, but then we lose variadic arguments.

The one solution that comes to mind is to treat \theta^* as regular latent Gaussian variables with empty groups. So the covariance functor returns the whole covariance matrix \bf K (and the likelihood is rewritten accordingly). That gives us all the flexibility we want, but it increases the dimension of the problem from n to n + n^* (with \theta \in \mathbb R^n and \theta^* \in \mathbb R^{n^*}), and the complexity of the approximation is cubic in that dimension, i.e. \mathcal O((n + n^*)^3) rather than \mathcal O(n^3).

Proposition

Have an API for the rng function which allows three different sets of arguments for K_functor, and overload the function for the case where we pass only one set of arguments (i.e. K^{**} = K^* = K).

@stevebronder @avehtari @Bob_Carpenter

avehtari · April 25, 2022, 3:28pm

Now I see the equations correctly

avehtari · April 25, 2022, 3:42pm

Why tuples, and not just

  vector[n_obs] theta
    = laplace_marginal_rng(y, n_samples, ye, theta_0, 
                           K_functor,
                           X,
                           X_pred, 
                           alpha, rho);

If X is data, I assume it’s passed by reference and not copied, so there is no issue with having X twice as an argument (when sampling at locations X_pred = X).

charlesm93 · April 25, 2022, 4:15pm

The tuples were Steve’s idea. Since we’re using variadic arguments, we need to distinguish which arguments are passed to K_functor when computing K, K^* and K^{**}. In the call you give, it’s unclear which set of arguments varies between different calls of K_functor (if we think more broadly about latent Gaussian models, X and X_pred may not be the only differences).

avehtari · April 25, 2022, 5:22pm

I don’t see how those tuples solve the problem. Maybe we should have a call?

charlesm93 · April 25, 2022, 6:31pm

Sure, it might be worth having a call with Steve. I can also add it as an agenda item for the next Stan meeting.

stevebronder · April 25, 2022, 10:46pm

Happy to discuss. At the time, tuples felt like the only way to solve the issue of calling the functor with multiple sets of inputs. But if there are other schemes, then I’m happy to implement that.

charlesm93 · April 29, 2022, 3:30pm

Some notes from yesterday’s Stan meeting, regarding which API we should use.

@WardBrian proposed passing arrays of arguments to laplace_rng, with the first element corresponding to the input for K, the second for K^* and the third for K^{**}. The signature would be

vector[n_obs] theta
    = laplace_marginal_rng(... 
                           K_functor,
                           {X, X, X_pred}, 
                           {X, X_pred, X_pred},
                           alpha, rho);

So when computing K, we use (X, X, alpha, rho), when computing K^*, (X, X_pred, alpha, rho), etc.
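A rough C++ sketch of this positional scheme, assuming exactly two array arguments followed by the shared ones, could look like the following. The names eval_blocks and sq_exp, and the use of nested std::vector as a matrix type, are illustrative choices only and not what Stan Math does.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Toy covariance function on 1-D inputs (squared exponential kernel).
inline Matrix sq_exp(const std::vector<double>& a, const std::vector<double>& b,
                     double alpha, double rho) {
  assert(rho > 0.0);
  Matrix out(a.size(), std::vector<double>(b.size()));
  for (std::size_t i = 0; i < a.size(); ++i) {
    for (std::size_t j = 0; j < b.size(); ++j) {
      const double d = (a[i] - b[j]) / rho;
      out[i][j] = alpha * alpha * std::exp(-0.5 * d * d);
    }
  }
  return out;
}

// Hypothetical dispatcher: call i uses element i of each positional array,
// followed by the shared arguments. Returns {K, K*, K**}.
template <typename F, typename A, typename B, typename... Shared>
auto eval_blocks(F f, const std::array<A, 3>& first,
                 const std::array<B, 3>& second, const Shared&... shared) {
  using R = decltype(f(first[0], second[0], shared...));
  return std::array<R, 3>{{f(first[0], second[0], shared...),
                           f(first[1], second[1], shared...),
                           f(first[2], second[2], shared...)}};
}
```

With n training points and n^* prediction points, the three results have shapes n x n, n x n^* and n^* x n^*. This sketch copies X and X_pred into the arrays; an actual implementation would presumably want to avoid those copies.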

This framework is actually quite flexible. As @avehtari pointed out, the user may give K_functor an integer argument, say star, and write the function such that depending on the value of the argument, it returns K, K^* or K^{**}. So the call to the function would be

vector[n_obs] theta
    = laplace_marginal_rng(... 
                           K_functor,
                           {1, 2, 3},
                           X, X_pred,  alpha, rho);

This gives the user complete control and it avoids passing certain arguments multiple times. In summary, I believe we can implement Aki’s idea using Brian’s API.
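For illustration, a user-written functor with such an integer argument might be sketched in C++ as follows. This is a toy version under stated assumptions: 1-D inputs, a squared exponential kernel, nested std::vector as the matrix type, and the convention 1 -> K, 2 -> K^*, 3 -> K^{**}, none of which is prescribed by the proposal.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical user-written functor: `star` selects which block to return
// (1 -> K, 2 -> K*, 3 -> K**). The kernel is a toy squared exponential.
inline Matrix K_functor(int star, const std::vector<double>& X,
                        const std::vector<double>& X_pred, double alpha,
                        double rho) {
  assert(rho > 0.0);
  if (star < 1 || star > 3) {
    throw std::invalid_argument("star must be 1, 2 or 3");
  }
  // Pick the row and column inputs for the requested block.
  const std::vector<double>& rows = (star == 3) ? X_pred : X;
  const std::vector<double>& cols = (star == 1) ? X : X_pred;
  Matrix out(rows.size(), std::vector<double>(cols.size()));
  for (std::size_t i = 0; i < rows.size(); ++i) {
    for (std::size_t j = 0; j < cols.size(); ++j) {
      const double d = (rows[i] - cols[j]) / rho;
      out[i][j] = alpha * alpha * std::exp(-0.5 * d * d);
    }
  }
  return out;
}
```

Here X and X_pred are each passed once, and the only argument that changes between the three calls is the integer, which matches the {1, 2, 3} array in the call above. Nothing forces the three branches to share a kernel; each could build its block in a completely different way.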

I would also propose to distinguish predictive functions for in-sample and out-of-sample predictions, using two signatures:

laplace_rng();       // takes same args as laplace_marginal_lpdf
laplace_pred_rng();  // allows array of arguments to be passed

WardBrian · April 29, 2022, 5:21pm

Thanks @charlesm93 - I think you summarized my idea much better than I was able to during the meeting.

I think it’s quite flexible and I really like that it keeps positionality of arguments. If we did still want them to be tuples we could also statically check their sizes, but arrays would work in the meantime.

charlesm93 · May 26, 2022, 1:51pm

btw, updated algorithm can be found here: https://charlesm93.github.io/files/thesis_erratum.pdf

charlesm93 · May 26, 2022, 2:07pm

I might need some guidance implementing the proposed API. I just cracked open the code and I’m not sure how to specify variadic array arguments, followed by regular array arguments. The C++ code can be found here:

github.com

stan-dev/math/blob/experimental/laplace/stan/math/laplace/prob/laplace_base_rng.hpp

#ifndef STAN_MATH_LAPLACE_PROB_LAPLACE_BASE_RNG_HPP
#define STAN_MATH_LAPLACE_PROB_LAPLACE_BASE_RNG_HPP

#include <stan/math/prim/prob/multi_normal_cholesky_rng.hpp>
#include <stan/math/prim/fun/cholesky_decompose.hpp>
#include <stan/math/laplace/laplace_marginal.hpp>

#include <Eigen/Sparse>
#include <Eigen/LU>

namespace stan {
namespace math {

/**
 * In a latent gaussian model,
 *
 *   theta ~ Normal(theta | 0, Sigma(phi, x))
 *   y ~ pi(y | theta, eta)
 *
 * returns a multivariate normal random variate sampled

This file has been truncated.

stevebronder · May 27, 2022, 6:36am

Sorry I’m confused, what is the API you are looking for? As in what would you like the user to call?

charlesm93 · May 27, 2022, 4:54pm

The API proposed a few weeks ago, from the post beginning with
