Hey Andre and Ben,
@andre.pfeuffer
@bbbales2
I know this is fairly old, but I am still trying to get this model to work for my purposes. A quick reminder: I am trying to use the approximated GP to model error due to temporal autocorrelation in a model describing the relationship between winds and ocean currents.
I’ve (finally) realized that I am having a problem with the “Distance” vector, x, in the approx_L function. I have been giving the model a single distance vector, centered at zero, to use for the fit. That vector represents the distance in time between sampling points, but following earlier advice it is demeaned (so that its mean is zero). As a result, the error term is really only valid for the middle of the data set, because that is where the distance is zero after demeaning: every distance is measured from the location in time of the exact middle data point.
The posterior distribution of the M parameters ‘g’, which describe the variable intercept due to autocorrelation error, ends up looking a lot like zero. But if you pull the GP out and do some inference there, you see that it is near zero at the extremes and has non-zero values right around the data points nearest the center of the dataset, i.e., where the distance vector accurately describes the distance to the other points. So the GP-approximated error seems to work, but only where the distance vector accurately represents the relationship of the current data point to all of the other data points. Here is a plot showing the mean posterior value of ‘gp’, where
gp = approx_L(M, scale, Distance, inv_rho_sq) * g;
What it seems like I need to do is update the distance vector that I feed to approx_L() (called xt in Ben’s original function) so that it is zero at the present data point, rather than fixing the middle of the data set as the zero position. In other words, I think I need to adapt approx_L() to handle a different distance vector for each data point whose error term I’m estimating.
Does that make sense? Any ideas on how to make it work in the Stan code?
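To make that concrete, here is a rough numpy sketch of what I mean (the approx_L port and the per_point_gp loop are my own untested translation, so treat them as an illustration rather than working code): for each data point i, the time vector is re-centered so that point i sits at zero, and only row i of the resulting basis matrix contributes to that point's error term.

```python
import numpy as np

def approx_L(M, scale, x, l):
    """Numpy translation of the Stan approx_L (my port; treat as a sketch)."""
    x = np.asarray(x, dtype=float)
    eps = 1.0 / (np.sqrt(2.0) * l)           # epsilon in the Stan code
    alpha = 1.0 / scale
    beta = (1.0 + (2.0 * eps / alpha) ** 2) ** 0.25
    delta = alpha * np.sqrt((beta ** 2 - 1.0) / 2.0)
    denom = alpha ** 2 + delta ** 2 + eps ** 2
    f = np.sqrt(eps ** 2 / denom)
    xp = alpha * beta * x
    Ht = np.empty((M, x.size))
    Ht[0] = (alpha ** 2 / denom) ** 0.25 * np.sqrt(beta) * np.exp(-delta ** 2 * x * x)
    Ht[1] = f * np.sqrt(2.0) * xp * Ht[0]    # assumes M >= 2
    for m in range(2, M):                    # 0-based version of the Stan recursion
        Ht[m] = (f * np.sqrt(2.0 / m) * xp * Ht[m - 1]
                 - f ** 2 * np.sqrt((m - 1.0) / m) * Ht[m - 2])
    return Ht.T                              # N x M basis matrix

def per_point_gp(times, g, M, scale, l):
    """Re-center the distance vector at each data point before approx_L."""
    N = times.size
    gp = np.empty(N)
    for i in range(N):
        x_i = times - times[i]          # zero at the current data point
        L = approx_L(M, scale, x_i, l)  # basis built around point i
        gp[i] = L[i] @ g                # keep only row i
    return gp
```

Doing this literally means N calls to approx_L per log-density evaluation, which would be expensive inside transformed parameters, but it at least pins down what "a different distance vector per data point" would compute.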
I appreciate your thoughts!
Here’s my model:
functions {
  // Eigenfunction (Hermite) series approximation to a squared-exponential GP;
  // returns an N x M matrix L such that L * L' approximates the covariance matrix.
  matrix approx_L(int M, real scale, vector x, real l) {
    int N = rows(x);
    real epsilon = 1 / (sqrt2() * l);
    real alpha = 1 / scale;
    real beta = (1.0 + (2.0 * epsilon / alpha)^2)^0.25;
    real delta = alpha * sqrt((beta^2 - 1.0) / 2.0);
    vector[N] Ht[M];
    matrix[N, M] Htt;
    vector[N] xp = alpha * beta * x;
    real fsq = epsilon^2 / (alpha^2 + delta^2 + epsilon^2);
    real f = sqrt(fsq);
    Ht[1] = sqrt(sqrt(alpha^2 / (alpha^2 + delta^2 + epsilon^2))) * sqrt(beta) * exp(-delta^2 * x .* x);
    Ht[2] = f * sqrt(2.0) * xp .* Ht[1]; // note: assumes M >= 2
    for (n in 3:M) {
      Ht[n] = f * sqrt(2.0 / (n - 1.0)) * xp .* Ht[n - 1] - fsq * sqrt((n - 2.0) / (n - 1.0)) * Ht[n - 2];
    }
    for (n in 1:M) {
      Htt[:, n] = Ht[n];
    }
    return Htt;
  }
}
data {
  // model parameters
  int<lower=1> N;       // number of (time) points in series
  int<lower=0> K;       // number of predictors
  matrix[N, K] X;       // predictor design matrix
  vector[N] response;   // vector of alongshore current speeds
  // GP approximation parameters (modeling error as an approximate GP)
  real scale;           // set to 0.2
  vector[N] Distance;   // demeaned distances (in time) between sampling points
  int<lower=1> M;       // number of eigenvectors of the var-covar matrix retained to approximate the GP
  // prediction
  int<lower=1> S;       // length of wind series to simulate over
  int<lower=0> sK;      // number of predictors for simulation
  matrix[S, sK] SIM;    // matrix of simulated values
}
parameters {
  vector[M] g;          // variable intercept offsets
  vector[K] B;          // model coefficients (both intercepts and slopes)
  real<lower=0> sigma_response;
  // real<lower=0> eta_sq;
  real<lower=0> inv_rho_sq;
  // real<lower=0> sigma_sq;
}
transformed parameters {
  real<lower=0> rho_sq = inv(inv_rho_sq);
  vector[N] gp; // Gaussian process approximation; represents temporal autocorrelation
  gp = approx_L(M, scale, Distance, inv_rho_sq) * g;
}
model {
  vector[N] mu;
  // priors
  // eta_sq ~ cauchy(0, 1);
  inv_rho_sq ~ cauchy(0, 1);
  // sigma_sq ~ cauchy(0, 1);
  B[1] ~ normal(0, 5);           // intercept prior
  B[2:K] ~ normal(0, 1);         // slope priors
  g ~ normal(0, 1);
  sigma_response ~ normal(0, 5); // overall model sigma; 5 is probably too big for this hyperparameter
  // likelihood
  mu = X * B + gp;
  response ~ normal(mu, sigma_response);
}
generated quantities {
  vector[S] sim_cur;  // simulated response output
  vector[N] mu_pred;
  vector[N] pred_cur;
  vector[N] log_lik;
  vector[N] resids;
  sim_cur = SIM * B;  // simulated response across the range of params
  mu_pred = X * B + gp; //+ approx_L(M, scale, Distance, inv_rho_sq) * g;
  for (i in 1:N) {
    pred_cur[i] = normal_rng(mu_pred[i], sigma_response);               // posterior predictions of the model
    log_lik[i] = normal_lpdf(response[i] | mu_pred[i], sigma_response); // log likelihood for use in AIC
  }
  resids = response - mu_pred;
}
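For reference, here is a small numpy check of the approximation itself (my own port of approx_L, so treat it as a sketch, not a verified translation): with enough eigenfunctions, L @ L.T should reproduce the exact squared-exponential kernel exp(-(x_i - x_j)^2 / (2 * l^2)) over a demeaned distance vector.

```python
import numpy as np

def approx_L(M, scale, x, l):
    """Numpy translation of the Stan approx_L above (my port; treat as a sketch)."""
    x = np.asarray(x, dtype=float)
    eps = 1.0 / (np.sqrt(2.0) * l)           # epsilon in the Stan code
    alpha = 1.0 / scale
    beta = (1.0 + (2.0 * eps / alpha) ** 2) ** 0.25
    delta = alpha * np.sqrt((beta ** 2 - 1.0) / 2.0)
    denom = alpha ** 2 + delta ** 2 + eps ** 2
    f = np.sqrt(eps ** 2 / denom)
    xp = alpha * beta * x
    Ht = np.empty((M, x.size))
    Ht[0] = (alpha ** 2 / denom) ** 0.25 * np.sqrt(beta) * np.exp(-delta ** 2 * x * x)
    Ht[1] = f * np.sqrt(2.0) * xp * Ht[0]    # assumes M >= 2
    for m in range(2, M):                    # 0-based version of the Stan recursion
        Ht[m] = (f * np.sqrt(2.0 / m) * xp * Ht[m - 1]
                 - f ** 2 * np.sqrt((m - 1.0) / m) * Ht[m - 2])
    return Ht.T                              # N x M, so Ht.T @ g mirrors approx_L(...) * g

# Compare the low-rank reconstruction against the exact SE kernel.
x = np.linspace(-2.0, 2.0, 9)   # a small demeaned "distance" vector
l_true, scale, M = 1.0, 1.0, 40
L = approx_L(M, scale, x, l_true)
K_exact = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * l_true ** 2))
err = np.max(np.abs(L @ L.T - K_exact))
print(err)  # should be small for these settings
```

The error should shrink as M grows, but only where x stays within the range the Gaussian measure (controlled by scale) covers, which is consistent with the behavior I described above at the edges of the data set.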