Calculation of the Gaussian process posterior mean inside a model


#1

The much simplified GP model is:

data {
  int n;
  vector[n] y;
  matrix[n,n] C;
}

parameters {
  real sigma2_e;
}
model {
  matrix[n,n] K;
  vector[n] postmean;
  
  K = C + diag_matrix(rep_vector(sigma2_e,n));
  
  postmean = C*inverse(K)*y;
  
  y ~ multi_normal(rep_vector(0, n), K);
}

My question is if the way I calculate “postmean” is correct. In models where I calculate the “postmean” variable as above for later modeling (not shown), it does not seem to work (get log probability errors). I am familiar with obtaining the posterior mean as giving in the Stan book under the heading “Latent variable GP” where the model is:

model {
vector[N] f;
{
matrix[N, N] L_K;
matrix[N, N] K = cov_exp_quad(x, alpha, rho);
// diagonal elements
for (n in 1:N)
K[n, n] = K[n, n] + delta;

L_K = cholesky_decompose(K);
f = L_K * eta;
}
rho ~ inv_gamma(5, 5);
alpha ~ normal(0, 1);
sigma ~ normal(0, 1);
eta ~ normal(0, 1);
y ~ normal(f, sigma);
}

with f being the posterior mean. I dislike this way of obtaining the posterior mean because I see no reason to sample for it if it’s possible to obtain it exactly. The latent approach does not give me the log probability errors I get using the exact approach. Hence, that is why I believe my way of calculating the exact posterior mean in Stan is incorrect.


#2

You are right that you don’t need a latent function to compute the likelihood of a gaussian process with normally-distributed observations; in that case it is the likelihood is given simply by y \sim \mathcal{N}(\mathbf{\mu}, K) (a multivariate normal), which is what you have there.
However, your computation of postmean is doing nothing to the likelihood, and you are only estimating sigma_2e, so maybe your value of K is really off and you are just assuming its value and passing it as fixed data to Stan.

Page 138 of the user guide 2.18.1 has an example that does not use a latent function that you can probably base your version on:

data {
int<lower=1> N;
real x[N];
}
transformed data {
matrix[N, N] K = cov_exp_quad(x, 1.0, 1.0);
vector[N] mu = rep_vector(0, N);
for (n in 1:N)
K[n, n] = K[n, n] + 0.1;
}
parameters {
vector[N] y;
}
model {
y ~ multi_normal(mu, K);
}

#3

I know that the calculation of postmean is doing nothing to the likelihood in my toy example. My question was if that is how you would calculate the posterior mean in Stan. I am not interested in the calculation of the likelihood. I know that K is fixed in my example. The reason it is fixed is that the values in K would not impact how to code the calculation of postmean. I tried to make the example as simple as possible so that there would be no confusion in answering the question I was interested in due to complications of the model.

Edit: C is fixed, K is not.


#4

Alright, I thought the main issue was getting some error associated to computing the likelihood, and whether the calculation of your “postmean” variable couldbe causing it; since it is not doing anything there it cannot be what is causing the error, so you have two completely separate issues there.

The only thing I see that can be causing an error is what I already mentioned in the previous reply, my suggestion is to first remove the “postmean” variable altogether and see if the error remains. My guess is it will, so try the user guide example and go from there.
Additionally, Stan has the generated quantities blcok specifically for additional values that are not part of the model but you may want to keep.

About the quantity you want to compute itself. It’s not clear to me what that quantity is, but it looks and sounds like it’s the mean of the mean unobserved values of the gaussian process given your observations. I don’t quite understand it, because I don’t understand what C K^{-1} \mathbf{y} is something close to the identity matrix but with noise added to one of them, and then multiplying y, can you elaborate on what is it you are trying to estimate and compute from the gaussian process noise term you are estimating?


#5

See Eq. 2.35 in the GPML book (link) for the value being computed.


#6

You are just multiplying two matrices and a vector, I see no reason why that could raise log probability errors, but I also think that Stan will not save the values you compute in the model block, so you should either move them to the transformed parameters or generated quantities block, I’m not sure since I haven’t used them much lately. In principle that should not affect any runtime errors, though.

In any case I suggest you try the possible fixes I suggested, and the Stan user guide example, tosee if any of them works, if you keep getting errors it would be helpful to see the actual error message to see what Stan complains about.