Help reparameterize GP model to remove divergent transitions

I have 79 divergent transitions after warmup (out of a total of 20,000 post-warmup samples). Could anyone help me re-parameterize the model to remove these divergent transitions? Thanks in advance.

data{
  int N1;          // number of observed points
  real y[N1];      // observations
  real X1[N1];     // observed input locations
  int N2;          // number of new input locations to predict at
  real X2[N2];     // new input locations
}

transformed data{
  real delta = 1e-9;          // jitter added to the covariance diagonal for numerical stability
  int<lower=1> N = N1 + N2;
  real x[N];                  // observed and new input locations combined
  for (n1 in 1:N1) x[n1] = X1[n1];
  for (n2 in 1:N2) x[N1 + n2] = X2[n2];
}

parameters {
  real<lower=0> alpha;  // marginal standard deviation of the GP
  real<lower=0> sigma;  // observation noise standard deviation
  real a;               // intercept
  vector[N] eta;        // standard-normal variates for the non-centered GP
  real<lower=0> rho;    // length scale
}

transformed parameters {
  vector[N] f;
  {
    matrix[N, N] L_K;
    matrix[N, N] K = cov_exp_quad(x, alpha, rho);
    for (n in 1:N)
      K[n, n] = K[n, n] + delta;
    L_K = cholesky_decompose(K);
    f = L_K * eta;   // non-centered parameterization: implies f ~ multi_normal(0, K)
  }
}

model{
  // only the first N1 points have observations
  for (i in 1:N1)
    y[i] ~ normal(a + f[i], sigma);
  alpha ~ std_normal();
  sigma ~ std_normal();
  a ~ std_normal();
  eta ~ std_normal();
  rho ~ inv_gamma(5, 5);
}

Can you discern where in the parameter space the divergent transitions arise?

Hi @mike-lawrence, thanks. How could I do that? That would be great.

See the bayesplot package, particularly the mcmc_scatter() function:

# scatter plot also showing divergences
color_scheme_set("darkgray")
mcmc_scatter(
  as.matrix(fit2),
  pars = c("tau", "theta[1]"),
  np = nuts_params(fit2),
  np_style = scatter_style_np(div_color = "green", div_alpha = 0.8)
)

You probably don’t want f in the list of parameters to visualize though, that’ll blow up the number of plot panels.


While not completely on-topic, you can optimise your model specification by treating the likelihood as a linear regression and using the normal_id_glm distribution. This will be much more efficient in sampling, and can be GPU-accelerated:

data{
  int N1;
  vector[N1] y;     // declared as a vector so it can be passed to normal_id_glm
  real X1[N1];  
  int N2;
  real X2[N2];
}

transformed data{
  real delta = 1e-9;
  int<lower=1> N = N1 + N2;
  real x[N];
  for (n1 in 1:N1) x[n1] = X1[n1];
  for (n2 in 1:N2) x[N1 + n2] = X2[n2];
 }

parameters {
  real<lower=0> alpha;
  real<lower=0> sigma;
  real a;
  vector[N] eta;
  real<lower=0> rho;
}

model{
  matrix[N, N] L_K = cholesky_decompose(add_diag(cov_exp_quad(x, alpha, rho), delta));
  // only the first N1 rows of L_K correspond to observed y
  y ~ normal_id_glm(L_K[1:N1], a, eta, sigma);

  alpha ~ std_normal();
  sigma ~ std_normal();
  a ~ std_normal();
  eta ~ std_normal();
  rho ~ inv_gamma(5, 5);
}

Hi @mike-lawrence, working on that. I will share the plots soon. Thanks for the input.

Neat, I hadn’t seen that use of normal_id_glm in a GP like that before; any reason you declare the L_K variable vs doing the computation in-place as an argument? As I understand it, declared variables have a performance cost relative to computing in-place.

@realkrantz btw, I slightly suspect that you might want to scale the jitter by the amplitude parameter; at least I have a vague memory that doing so helps GP sampling.
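
In @andrjohns’s version that would be something like this (untested, just a sketch of the idea; I’m not sure it actually matters here):

  // scale the diagonal jitter by the GP marginal variance alpha^2
  matrix[N, N] L_K
    = cholesky_decompose(add_diag(cov_exp_quad(x, alpha, rho), square(alpha) * delta));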

That was more just for clarity of presentation; computing it in-place would definitely be better.


Hi @andrjohns. That’s cool. Could we move matrix[N, N] L_K back to the transformed parameters block? Would that reduce performance?

No, the only difference between variables declared in the model block and the transformed parameters block is that the latter get saved in the output and are also checked against any declared bounds.
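
For example, something like this (a sketch showing just the two blocks that change) should sample the same as the version above, only with L_K also written to the output, which for an N x N matrix can make the output files quite large:

transformed parameters {
  matrix[N, N] L_K
    = cholesky_decompose(add_diag(cov_exp_quad(x, alpha, rho), delta));
}

model {
  y ~ normal_id_glm(L_K[1:N1], a, eta, sigma);

  alpha ~ std_normal();
  sigma ~ std_normal();
  a ~ std_normal();
  eta ~ std_normal();
  rho ~ inv_gamma(5, 5);
}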


Hi @mike-lawrence, @andrjohns, et al. Here is the output of

mcmc_scatter(
  as.matrix(fit),
  pars = c("alpha", "sigma"),
  np = nuts_params(fit),
  np_style = scatter_style_np(div_color = "green", div_alpha = 0.8)
)

Just to be thorough, can you do pars = c("alpha", "sigma", "a", "rho")?

Oh, I think you’d need to use bayesplot::mcmc_pairs() for more than two variables then.

@mike-lawrence, working on that. The machines are currently a bit slow because of the GPs, but I will get the plot shared soon.

@andrjohns, thanks for this neat GP implementation. I am currently testing it with my actual data. But how do I do predictions? I could not find an rng for normal_id_glm in the manuals.

You can use:

normal_rng(a + L_K * eta, sigma)
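
For example, in generated quantities (a sketch; the Cholesky factor has to be recomputed there because model-block locals are not in scope):

generated quantities {
  real y_rep[N];
  {
    // recompute the covariance Cholesky factor; model-block local variables
    // are not visible in generated quantities
    matrix[N, N] L_K
      = cholesky_decompose(add_diag(cov_exp_quad(x, alpha, rho), delta));
    // posterior predictive draws at all N input locations
    y_rep = normal_rng(a + L_K * eta, sigma);
  }
}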

Brilliant, @andrjohns. Thanks. I will test it and share something.

@mike-lawrence, here is the output of mcmc_pairs. Any thoughts?

Can you add np = nuts_params(fit) so we can see where the divergences are happening?

Sure. Soon. Thanks.

@mike-lawrence, @andrjohns, here it is with np = nuts_params(fit). Any thoughts?

OK, that really helps. It seems that the issue arises at very small values of sigma. Since that parameter reflects a measurement-noise quantity, I think it would be sensible to use a zero-avoiding prior: unlike the half-normal, the Weibull(2, 1) density goes to zero as sigma approaches zero, so it pushes the sampler away from that region. Try sigma ~ weibull(2, 1);
