Hi All,

**Context**

I have been strugling to fit a simple GP model and have some questions about

identifiability. In my actual use-case, I wish to model a time-varying regression

coefficient as a GP:

y_t \sim N(\mu_t, \sigma)

\mu_t = \beta_t \cdot X_t

\beta_t \sim GP(m,\ K(t,t'))

**Model**

In the simplified model here, there is no covariate X, i.e. we almost

observe \beta_t directly, with a small amount of noise:

y_t \sim N(\beta_t,\ \sigma)

Here is the true (simulated) \beta (black) and the observation y (blue):

The modelled GP is an exponentiated quadratic, and I am using a Hilbert space / basis function

approximation with 20 basis functions (the results do not seem sensitive to this choice for the ranges I have tried).

My priors for \alpha (`gp_alpha`

) and \rho (`gp_rho`

) are gamma distributed, such that 98% of the prior probability is between 0.005 and 0.8 for \rho, and between 0.05 and 0.8 for \alpha. Here are some realisations from the prior on \beta.

I understand that the posterior is degenerate for values of \rho outside the range of distances observed in the data.

**Issues**

- Every sample exceeds the default maximum treedepth.
- I needed very strong priors on both \alpha and \rho just to avoid divergences. For example, if my prior on \alpha is slightly relaxed to have 98% probability between 0.05 and 1.0, instead of 0.05 and 0.8, I get divergences.

Here is the pairs plot for `gp_alpha`

- `gp_rho`

:

And here is the pairs plot using the slightly wider prior on \alpha:

**Questions**

- Is there something obviously wrong with the way I have parameterised the model?
- Does the pairs plot suggest that there is a large range of \alpha-\rho combinations that are consistent with the data?
- Am I exceeding maximum tree-depth because the posterior is long and narrow over the weakly-identified \alpha-\rho region, and step size has been calibrated to resolve the narrow dimension, not the long dimension?
- Is there any function of \alpha and \rho that is known to be more strongly identified? I.e. a GP analogue to the sum of regression coefficients for highly conditionally correlated regressors?

Any advice is appreciated.

**Code**

```
# Generate data
set.seed(9)
N <- 520
t <- (1:N) / N
beta <- 0.5 + 0.25 * sin(2*pi * t)
y <- rnorm(N, mean=beta, sd=0.02)
data <- list(N = N, t = t, y = y)
```

```
# GP priors and parameters
# put 98% of probability mass between 0.005 and 0.8 for gp_rho
opt_rho <- gamma_solve(c(0.005, 0.8))$par
# put 98% of probability mass between 0.005 and 0.8 for gp_alpha
opt_alpha <- gamma_solve(c(0.05, 0.8))$par
data$gp_rho_prior <- opt_rho # c(1.3, 6.6)
data$gp_alpha_prior <- opt_alpha # c(3.3, 11.2)
data$M_f <- 20
data$c_f <- 1.5
```

Stan program

```
// gp_prior_1_beta.stan
functions {
vector diagSPD_EQ(real alpha, real rho, real L, int M) {
return sqrt((alpha^2) * sqrt(2*pi()) * rho * exp(-0.5*(rho*pi()/2/L)^2
* linspaced_vector(M, 1, M)^2));
}
matrix PHI(int M, real L, vector x) {
return sin(diag_post_multiply(rep_matrix(pi()/(2*L) * (x+L), M),
linspaced_vector(M, 1, M)))/sqrt(L);
}
}
data {
int<lower=1> N;
vector[N] y;
vector[N] t;
vector[2] gp_alpha_prior;
vector[2] gp_rho_prior;
real<lower=0> c_f; // factor c to determine the boundary value L
int<lower=1> M_f; // number of basis functions
}
transformed data {
real L_f = c_f * max(t);
// Basis functions for f
matrix[N, M_f] PHI_f = PHI(M_f, L_f, t);
}
parameters {
real<lower=0> sigma;
real<lower=0> gp_alpha;
real<lower=0> gp_rho;
real gp_mu;
vector[M_f] beta_f; // the basis functions coefficients
}
transformed parameters {
// spectral densities
vector[M_f] diagSPD_f = diagSPD_EQ(gp_alpha, gp_rho, L_f, M_f);
// f ~ gp_mu + GP(0, K)
vector[N] f = gp_mu + PHI_f * (diagSPD_f .* beta_f);
}
model {
// Mean model
y ~ normal(f, sigma);
// Priors
beta_f ~ normal(0, 1);
gp_rho ~ gamma(gp_rho_prior[1], gp_rho_prior[2]);
gp_alpha ~ gamma(gp_alpha_prior[1], gp_alpha_prior[2]);
gp_mu ~ normal(0.1, 0.3);
sigma ~ normal(0, 0.1);
}
```

R code to run model

```
s1b <- cmdstanr::cmdstan_model(stan_file = "gp_prior_1_beta.stan")
fit1b <- s1b$sample(data=data, iter_warmup = 500, iter_sampling=1000,
chains = 4, parallel_chains = 4, max_treedepth = 10,
adapt_delta = 0.95, seed=1)
```

R gamma dist helper

```
gamma_solve <- function(bounds, prob=c(0.01, 0.01), plotit=TRUE, init=NULL) {
lb <- bounds[1]
ub <- bounds[2]
f <- function(theta) {
a <- exp(theta[1]) # shape
b <- exp(theta[2]) # rate
sum((c(pgamma(lb, a, b), 1 - pgamma(ub, a, b)) - prob)^2)
}
if (is.null(init)) init <- c(log(4), log(8))
opt <- optim(par=init, fn=f)
opt$par <- exp(opt$par)
opt$prob <- pgamma(bounds, opt$par[1], opt$par[2])
if (plotit) {
curve(dgamma(x, opt$par[1], opt$par[2]), from=0, to=1, n=1000, lwd=2,
ylab="density", main=paste0(c("Shape: ", "Rate: "),
round(opt$par, 2), collapse = "\t"))
}
opt
}
```