Help reparameterize GP model to remove divergent transitions

I have 79 divergent transitions after warmup (out of a total of 20,000 post-warmup samples). Could anyone help me re-parameterize the model to remove these divergent transitions? Thanks in advance.

data{
  int N1;          // number of observed points
  real y[N1];      // observations
  real X1[N1];     // observed input locations
  int N2;          // number of new input locations to predict at
  real X2[N2];     // new input locations
}

transformed data{
  real delta = 1e-9;          // jitter added to the covariance diagonal for numerical stability
  int<lower=1> N = N1 + N2;
  real x[N];                  // observed and new input locations combined
  for (n1 in 1:N1) x[n1] = X1[n1];
  for (n2 in 1:N2) x[N1 + n2] = X2[n2];
}

parameters {
  real<lower=0> alpha;  // marginal standard deviation of the GP
  real<lower=0> sigma;  // observation noise standard deviation
  real a;               // intercept
  vector[N] eta;        // standard-normal variates for the non-centered GP
  real<lower=0> rho;    // length scale
}

transformed parameters {
  vector[N] f;
  {
    matrix[N, N] L_K;
    matrix[N, N] K = cov_exp_quad(x, alpha, rho);
    for (n in 1:N)
      K[n, n] = K[n, n] + delta;
    L_K = cholesky_decompose(K);
    f = L_K * eta;   // non-centered parameterization: implies f ~ multi_normal(0, K)
  }
}

model{
  // only the first N1 points have observations
  for (i in 1:N1)
    y[i] ~ normal(a + f[i], sigma);
  alpha ~ std_normal();
  sigma ~ std_normal();
  a ~ std_normal();
  eta ~ std_normal();
  rho ~ inv_gamma(5, 5);
}

Can you discern where in the parameter space the divergent transitions arise?

Hi @mike-lawrence, thanks. How could I do that? That would be great.

See the bayesplot package, particularly the mcmc_scatter() function:

# scatter plot also showing divergences
color_scheme_set("darkgray")
mcmc_scatter(
  as.matrix(fit2),
  pars = c("tau", "theta[1]"),
  np = nuts_params(fit2),
  np_style = scatter_style_np(div_color = "green", div_alpha = 0.8)
)

You probably don’t want f in the list of parameters to visualize though, that’ll blow up the number of plot panels.


While not completely on-topic, you can optimise your model specification by treating the likelihood as a linear regression and using the normal_id_glm distribution. This will be much more efficient in sampling, and can be GPU-accelerated:

data{
  int N1;
  vector[N1] y;     // declared as a vector so it can be passed to normal_id_glm
  real X1[N1];  
  int N2;
  real X2[N2];
}

transformed data{
  real delta = 1e-9;
  int<lower=1> N = N1 + N2;
  real x[N];
  for (n1 in 1:N1) x[n1] = X1[n1];
  for (n2 in 1:N2) x[N1 + n2] = X2[n2];
 }

parameters {
  real<lower=0> alpha;
  real<lower=0> sigma;
  real a;
  vector[N] eta;
  real<lower=0> rho;
}

model{
  matrix[N, N] L_K = cholesky_decompose(add_diag(cov_exp_quad(x, alpha, rho), delta));
  // only the first N1 rows of L_K correspond to observed y
  y ~ normal_id_glm(L_K[1:N1], a, eta, sigma);

  alpha ~ std_normal();
  sigma ~ std_normal();
  a ~ std_normal();
  eta ~ std_normal();
  rho ~ inv_gamma(5, 5);
}

Hi @mike-lawrence, working on that. I will share the plots soon. Thanks for the input.

Neat, I hadn’t seen that use of normal_id_glm in a GP like that before; any reason you declare the L_K variable vs doing the computation in-place as an argument? As I understand it, declared variables have a performance cost relative to computing in-place.

@realkrantz btw, I slightly suspect that you might want to scale the jitter by the amplitude parameter; at least I have a vague memory that doing so helps GP sampling.
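
In @andrjohns’s version that would be something like this (untested, just a sketch of the idea; I’m not sure it actually matters here):

  // scale the diagonal jitter by the GP marginal variance alpha^2
  matrix[N, N] L_K
    = cholesky_decompose(add_diag(cov_exp_quad(x, alpha, rho), square(alpha) * delta));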

That was more just for clarity of presentation; computing it in-place would definitely be better.


Hi @andrjohns. That’s cool. Could we move matrix[N, N] L_K back to the transformed parameters block? Would that reduce performance?

No, the only difference between variables declared in the model block and the transformed parameters block is that the latter get saved in the output and are also checked against any declared bounds.
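
For example, something like this (a sketch showing just the two blocks that change) should sample the same as the version above, only with L_K also written to the output, which for an N x N matrix can make the output files quite large:

transformed parameters {
  matrix[N, N] L_K
    = cholesky_decompose(add_diag(cov_exp_quad(x, alpha, rho), delta));
}

model {
  y ~ normal_id_glm(L_K[1:N1], a, eta, sigma);

  alpha ~ std_normal();
  sigma ~ std_normal();
  a ~ std_normal();
  eta ~ std_normal();
  rho ~ inv_gamma(5, 5);
}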


Hi @mike-lawrence, @andrjohns, et al. Here is the output of

mcmc_scatter(
  as.matrix(fit),
  pars = c("alpha", "sigma"),
  np = nuts_params(fit),
  np_style = scatter_style_np(div_color = "green", div_alpha = 0.8)
)

Just to be thorough, can you do pars = c("alpha", "sigma", "a", "rho")?

Oh, I think you’d need to use bayesplot::mcmc_pairs() for more than two variables then.

@mike-lawrence, working on that. The machines are currently a bit slow because of the GPs, but I will get the plot shared soon.

@andrjohns, thanks for this neat GP implementation. I am currently testing it with my actual data. But how do I do predictions? I could not find an rng for normal_id_glm in the manuals.

You can use:

normal_rng(a + L_K * eta, sigma)
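
For example, in generated quantities (a sketch; the Cholesky factor has to be recomputed there because model-block locals are not in scope):

generated quantities {
  real y_rep[N];
  {
    // recompute the covariance Cholesky factor; model-block local variables
    // are not visible in generated quantities
    matrix[N, N] L_K
      = cholesky_decompose(add_diag(cov_exp_quad(x, alpha, rho), delta));
    // posterior predictive draws at all N input locations
    y_rep = normal_rng(a + L_K * eta, sigma);
  }
}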

Brilliant, @andrjohns. Thanks. I will test it and share something.

@mike-lawrence, here is the output of mcmc_pairs. Any thoughts?

Can you add np = nuts_params(fit) so we can see where the divergences are happening?

Sure. Soon. Thanks.

@mike-lawrence, @andrjohns, here it is with np = nuts_params(fit). Any thoughts?

OK, that really helps. It seems that the issue arises at very small values of sigma. Since that parameter reflects a measurement-noise quantity, I think it would be sensible to use a zero-avoiding prior: unlike the half-normal, the Weibull(2, 1) density goes to zero as sigma approaches zero, so it pushes the sampler away from that region. Try sigma ~ weibull(2, 1);
