Gaussian process not recovering alpha parameter in cov_exp_quad

billWalker · September 15, 2017, 3:23pm

I’m trying to write a latent variable Gaussian process regression in which the GP varies over time, not over my exogenous variable values.

I’ve attached the code. I’m first simulating Ys, by taking the median of fake data generated in sim_data.R and using single_y_gp_sim.stan, then attempting to recover the parameters I set using single_y_gp.stan. I’m getting the length-scale parameter, rho, alright, but for some reason, no matter what I set alpha too, my alpha posterior distributions are too low.

What am I doing wrong? Is it taking the median of the simulated data?

sim_data.R (1.3 KB)
single_y_gp_sim.stan (944 Bytes)
single_y_gp.stan (1023 Bytes)

Thanks for your help!

bbbales2 · September 16, 2017, 12:53am

Could you write out the GP covariance function you’re trying to get at?

I had a look at this, but I had trouble figuring out the code. Might just be my brain being rusty though.

Is there an easier version where the x is one dimensional as well :D?

billWalker · September 17, 2017, 3:44pm

Hi,

Please find attached. The problem persists in the simpler version, so at least we have that possibility eliminated. I’m also not recovering alpha if I use a single simulated y from single_xy_gp_sim, instead of taking the median.

single_xy_gp_sim.stan (700 Bytes)
single_xy_gp.stan (793 Bytes)
single_xy_gp.R (791 Bytes)

bbbales2 · September 17, 2017, 4:22pm

Hmm, so is the goal to:

generate a time series (eta) from an AR(1) model
add a zero mean GP on top
then try to recover both the parameters of the zero mean GP as well as the eta time series?

Do I have that right?

So something like:

y ~ multi_normal(eta, K); // sigma added in to diagonal of K here

That integrates out all the latent variables of the GP. You could also do:

transformed parameters {
  vector[N] u;
  L = cholesky_decompose(K);
  u = L * z;
}
model {
  z ~ normal(0, 1);
  ... // Other priors
  y ~ normal(eta + u, sigma);
}

If you wanted to get the latent values of the GP. Is something like this what you were looking for?

It would probably be super hard to separate eta and u here if so.

If this isn’t what you’re looking for then I gotta get some more explanation on f = L_K * eta .* x; :D. That’s the thing I don’t think I’ve run across before.

billWalker · September 17, 2017, 4:37pm

I’m trying to make something like a Kalman Filter, in which betas from a linear regression on y vary smoothly over time. The eta I generated was meant to be a reasonable approximation of betas that vary smoothly. So:
y ~ multi_normal(eta .* x, K);
where x is an exogenous variable, and eta is the vector[N] equivalent of a constant beta in a linear regression.

Does that make sense?

bbbales2 · September 17, 2017, 4:39pm

I’m gonna think some more, but see if there is anything helpful in this thread: AR/Kalman-like models where the k-th estimate of a variable depends only on obs 1..k-1

bbbales2 · September 17, 2017, 4:45pm

The eta I generated was meant to be a reasonable approximation of betas that vary smoothly.

Hmm, okay, the words seem reasonable at least. (edit: that sounds sarcastic but what I meant was I understood what you said :P) The thing I don’t like about this is the GP sitting on top of everything.

Why not just y ~ normal(eta .* x, sigma)? Where eta is a smooth thing (we can put a zero mean GP prior on it)?

billWalker · September 17, 2017, 5:29pm

If I understand your formulation, that looks something like:

model {
  matrix[N,N] K = cov_exp_quad(time, alpha, rho);
...
  eta ~ multi_normal(0, K);
  y ~ normal(eta .* x, sigma);
}

I’ll try to simulate and recover with it written up that way.

I want to get to a point where y, and eta are matrixes and x is a three-dimensional array. So each y is associated with a matrix of x, and each column of x has the same eta column associated with it for all y columns. So, going this route, each column of eta is eta[,d] ~ multi_normal(0, K[d]), and each row of y is y ~ multi_normal(eta[n] * x[n], Sigma). That’s how I have the full model written at the moment, anyway. But I was trying to build up from simpler models before testing that one, and got stuck on alpha being consistently too low.

bbbales2 · September 17, 2017, 5:39pm

Yeah, seems reasonable. Give it a go and see what happens.

billWalker · September 17, 2017, 8:13pm

Ok, so that time I got it!

I wrote a function to generate the fake eta values within stan, instead of that AR(1) process I had before, and the values generated didn’t look at all like what I was getting in the first few tries. So the alpha I was trying to recover was unrealistic.

I attached the code I used. Thanks for the help! Hopefully I can scale up from here.

single_xy_gp_sim2.stan (1.0 KB)
single_xy_gp2.R (827 Bytes)
single_xy_gp2.stan (802 Bytes)

bbbales2 · September 17, 2017, 10:10pm

Cool beans. GPs can get a little funky to fit, so if things get weird, ask.

From the Stan prior recommendations page (https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations#priors-for-gaussian-processes): “In practice when we fit Gaussian processes we often either set the length-scale parameter to a fixed value corresponding to some desired level of smoothing, or we give it a strong prior. The identification issues are real”

So don’t get too caught up in the hyperparameters.

I always see @mike-lawrence and @Andre_Pfeuffer around here in threads about GPs and ARs and stuff. There have been some interesting threads around. If you’re bored you might search their names and see if you find anything interesting.

anon75146577 · September 17, 2017, 10:16pm

We have much better recommendations than that! Paper and case study coming, but long story short use an inverse gamma on the length scale where you’ve chosen the parameters so the left hand tail does not give very much mass below the median spacing of your observations [or some representative number]. This fixes all of your problems. Paper forthcoming (hopefully) with @rtrangucci @betanalpha and @avehtari (See also [1503.00256] Constructing Priors that Penalize the Complexity of Gaussian Random Fields)

betanalpha · September 17, 2017, 10:17pm

For the moment you can read about our current recommendations (that will be expanded in the paper and the case study) in the latest version of the manual.

bbbales2 · September 17, 2017, 10:29pm

Then the only recommendations on that page for GPs then that are still valid are these two

See also recommendations in Section 18.3 in Stan reference manual
For Matern fields, then the joint penalised complexity prior is available for the parameters (variance, range) parameters

Should we just delete the rest?

betanalpha · September 17, 2017, 10:48pm

Probably.

arya · September 18, 2017, 6:56pm

Interesting. I thought the Betancourt Trangucci workshop paper from last year was really useful. I’ll never NOT put a prior on my length scales now.

I was talking about priors on the length-scale (alpha) with a lab-mate which led me to an interesting perspective, I wanted to get the thoughts of @anon75146577 and @betanalpha. Basically for the squared-exponential kernel the eigen-functions are modified Hermite polynomials, so we can think about GP regression as linear regression with an infinite number of Hermite polynomial terms. Different values of the length-scale will lead to different bases of those modified Hermite terms, but no matter what the length-scale is, you’ll have a complete basis for representing all L2 functions.

From that perspective the length-scale will be un-identifiable because they’ll all lead to a basis that’s expressive enough to represent your L2 function. This is why I think putting good priors on the length-scale is so helpful in practice, because you’re able to put preference on which of these bases to use in a principled way.

Thoughts?

anon75146577 · September 18, 2017, 8:30pm

Mercer’s theorem says that every covariance function induces a complete basis for representing L2 functions, so this isn’t unique.

Also for a squared exponential, the length scale is identified!

betanalpha · September 18, 2017, 9:23pm

Where identification here is the technical term, meaning that with infinite data you get a likelihood that is non-zero at only a single point in parameter space.
For finite data, however, the hyperposterior of the squared exponential kernel manifests the same curved shape of a non-identified kernel like the Matern.

Topic		Replies	Views
Simple spatial guassian process model - unable to recover parameters Modeling	3	76	August 14, 2024
Identifiability of Simple Gaussian Process Modeling fitting-issues , gaussian-process , reparametrization	13	1873	February 4, 2022
Gaussian process predictive inference memory pressure Modeling	2	163	February 19, 2024
Identifiability of Gaussian process Modeling	4	1019	November 24, 2021
Recovering parameters of fake model Modeling	1	658	April 27, 2022

Gaussian process not recovering alpha parameter in cov_exp_quad

Related topics