Trace plot got stuck for some chains

I am fitting a multilevel Weibull regression model, and the following convergence problem shows up quite often: some chains converge, but others just get stuck. I'd like to know what causes this and how to fix it. Right now I'm just changing the random seed and hoping all chains converge.

An example of problematic traceplot:
[traceplot image]

Stan code:

data {
  int<lower=0> N;
  int<lower=0> Nevent;
  int<lower=0> K;
  int<lower=0> L; //levels
  vector[N] y;
  matrix[N, K] X;   // predictor matrix
  int<lower=1,upper=L> ll[N];//level index
}
transformed data {
  real<lower=0> tau_al;
  vector[Nevent] yobs;
  vector[N-Nevent] ycen;
  matrix[N, K] Q_ast;
  matrix[K, K] R_ast;
  matrix[K, K] R_ast_inverse;
  matrix[Nevent, K] Xobs;
  matrix[N-Nevent, K] Xcen;
  tau_al = 10.0;
  yobs = y[1:Nevent];
  ycen = y[(Nevent+1):N];
  // thin and scale the QR decomposition
  Q_ast = qr_Q(X)[, 1:K] * sqrt(N - 1);
  R_ast = qr_R(X)[1:K, ] / sqrt(N - 1);
  R_ast_inverse = inverse(R_ast);
}
parameters {
  real mu_alpha;
  real<lower=0> tau_alpha;
  vector[L] alpha_raw;
  // slope
  real mu_theta[K];
  real<lower=0> tau_theta[K];
  vector[K] theta[L];
  // intercept
  real mu_a;
  real<lower=0> tau_a;
  real a[L];
}
transformed parameters {
  vector[L] alpha;
  alpha = exp(mu_alpha + tau_alpha * alpha_raw);
}
model {
  vector[N] linear_part;
  vector[N] shape_part;
  mu_theta ~ normal(0, 10);
  tau_theta ~ student_t(4, 0, 1);
  mu_a ~ normal(0, 100);
  tau_a ~ student_t(4, 0, 1);
  mu_alpha ~ normal(0, 100);
  tau_alpha ~ student_t(4, 0, 1);
  // priors
  a ~ normal(mu_a, tau_a);
  for (l in 1:L){
    theta[l] ~ normal(mu_theta, tau_theta);
  }
  for (n in 1:N){
    shape_part[n] = alpha[ll[n]];
    linear_part[n] = (a[ll[n]] + Q_ast[n, 1:K] * theta[ll[n]])/alpha[ll[n]];
  }
  yobs ~ weibull(shape_part[1:Nevent], exp(linear_part[1:Nevent]));
  target += weibull_lccdf(ycen | shape_part[(Nevent+1):N], exp(linear_part[(Nevent+1):N]));
  alpha_raw ~ normal(0.0, 1.0);
}
generated quantities {
  vector[K] theta_out[L];
  for (l in 1:L){
    theta_out[l] = R_ast_inverse * theta[l]; // coefficients on x
  }
}
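The thin-and-scale QR trick in `transformed data` decorrelates the predictors so the sampler works on the better-conditioned `Q_ast` scale, then `generated quantities` maps `theta` back to coefficients on the original `X` scale. A minimal numpy sketch of the same transformation (hypothetical data, not the model's):

```python
import numpy as np

# Hypothetical predictor matrix standing in for X in the Stan model.
rng = np.random.default_rng(0)
N, K = 50, 3
X = rng.normal(size=(N, K))

# Same as the Stan transformed data block:
# Q_ast = qr_Q(X)[, 1:K] * sqrt(N - 1);  R_ast = qr_R(X)[1:K, ] / sqrt(N - 1)
Q, R = np.linalg.qr(X)            # numpy's default 'reduced' mode is already thin
Q_ast = Q * np.sqrt(N - 1)
R_ast = R / np.sqrt(N - 1)
R_ast_inv = np.linalg.inv(R_ast)

# Q_ast @ R_ast reproduces X, so theta fit on the Q_ast scale maps back to
# coefficients on the X scale via beta = R_ast_inv @ theta, exactly as in
# the generated quantities block.
assert np.allclose(Q_ast @ R_ast, X)
theta = rng.normal(size=K)
beta = R_ast_inv @ theta
assert np.allclose(Q_ast @ theta, X @ beta)
```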

Initial methods:

import numpy as np

def get_inits():
    # L and K here match the corresponding dimensions passed in stan_data
    inits = {
        'mu_alpha': 0.01*np.random.normal(),
        'tau_alpha': np.random.uniform(0.01, 0.05),
        'alpha_raw': 0.01*np.random.normal(size=L),
        'mu_theta': 0.1*np.random.normal(size=K),
        'tau_theta': np.random.uniform(0.05, 0.2, K),
        'theta': 0.1*np.random.normal(size=(L, K)),
        'mu_a': 0.1*np.random.normal(loc=3.5),
        'tau_a': np.random.uniform(0.2, 0.4),
        'a': 0.1*np.random.normal(size=L)
    }
    return inits

Calling methods:

n_chain = 2
Nsamples = 4000
nuts_control = {
    'adapt_delta': 0.9,
    'max_treedepth': 12
}
# sm is the compiled pystan.StanModel; stan_data holds the data block inputs
fit = sm.sampling(data=stan_data, iter=Nsamples, chains=n_chain, init=get_inits, n_jobs=n_chain,
                  control=nuts_control, seed=1)

I'm just guessing, but any chance

alpha = exp(mu_alpha + tau_alpha * alpha_raw);

is blowing up to infinity? Any time a model has explicit exps, it's pretty easy to get numerical problems, which could cause weirdness like this.
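For what it's worth, `exp` overflows quickly in double precision: anything much above about 709 is already `inf`, so `mu_alpha + tau_alpha * alpha_raw` doesn't need to be extreme before the Weibull shape becomes infinite. A quick numpy demonstration:

```python
import numpy as np

# In double precision, exp(x) overflows to inf for x > ~709.78, so the
# linear predictor inside exp() doesn't have to be very large before
# alpha (the Weibull shape here) becomes infinite.
with np.errstate(over="ignore"):
    print(np.exp(np.float64(709.0)))   # large but finite (~8.2e307)
    print(np.exp(np.float64(710.0)))   # inf
assert np.isfinite(np.exp(np.float64(709.0)))
with np.errstate(over="ignore"):
    assert np.isinf(np.exp(np.float64(710.0)))
```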

I will try to make the priors more concentrated to prevent overflow to infinity. In the meantime, I tried increasing adapt_delta to 0.98 and it helps, but it took a really long time to get results, and still with warnings:

WARNING:pystan:35 of 4000 iterations ended with a divergence (0.875 %).
WARNING:pystan:Try running with adapt_delta larger than 0.98 to remove the divergences.
WARNING:pystan:101 of 4000 iterations saturated the maximum tree depth of 13 (2.52 %)
WARNING:pystan:Run again with max_treedepth larger than 13 to avoid saturation
WARNING:pystan:Chain 1: E-BFMI = 0.148
WARNING:pystan:Chain 2: E-BFMI = 0.174
WARNING:pystan:E-BFMI below 0.2 indicates you may need to reparameterize your model
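On the E-BFMI point: `alpha` is already non-centered via `alpha_raw`, but `a` and `theta` are still centered, which is a common source of funnel geometry in hierarchical models. A hedged, untested sketch of non-centering them too (the `theta_raw`/`a_raw` names are my own; only the changed pieces are shown):

```stan
parameters {
  // ... other parameters unchanged ...
  vector[K] theta_raw[L];  // replaces theta
  real a_raw[L];           // replaces a
}
transformed parameters {
  vector[K] theta[L];
  real a[L];
  for (l in 1:L) {
    // non-centered: theta[l] = mu_theta + tau_theta .* theta_raw[l]
    theta[l] = to_vector(mu_theta) + to_vector(tau_theta) .* theta_raw[l];
    a[l] = mu_a + tau_a * a_raw[l];
  }
}
model {
  // standard-normal priors on the raw parameters replace
  // a ~ normal(mu_a, tau_a) and theta[l] ~ normal(mu_theta, tau_theta)
  a_raw ~ normal(0, 1);
  for (l in 1:L)
    theta_raw[l] ~ normal(0, 1);
  // ... rest of the model unchanged ...
}
```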

The mixing is much better:
[traceplot image]

Not a bad idea. Even if you can't specifically identify what's blowing up, this is something to try. But check the plots if you can.

Check for correlated variables. Identifiability problems are one way to get lots of treedepth warnings like this.

Yeah, it definitely looks better, but once you move past having clearly stalled-out chains, it's best to lean on n_eff/Rhat and the other diagnostics to figure out what the chains are doing.
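To make the Rhat suggestion concrete, here is a minimal sketch of the split-Rhat idea those diagnostics are based on (a simplified version of the Gelman et al. formulation; in practice use Stan's own summary, this is just for intuition):

```python
import numpy as np

def split_rhat(chains):
    """Split-Rhat for an array of shape (n_chains, n_draws).

    Each chain is split in half, so a stuck or slowly drifting chain
    shows up even when the full-chain means happen to agree.
    """
    n_chains, n_draws = chains.shape
    half = n_draws // 2
    halves = chains[:, : 2 * half].reshape(2 * n_chains, half)
    w = halves.var(axis=1, ddof=1).mean()        # within-chain variance
    b = half * halves.mean(axis=1).var(ddof=1)   # between-chain variance
    var_plus = (half - 1) / half * w + b / half  # pooled variance estimate
    return np.sqrt(var_plus / w)

rng = np.random.default_rng(1)
mixed = rng.normal(size=(4, 1000))   # four well-mixed chains
stuck = mixed.copy()
stuck[0] += 5.0                      # one chain stuck in a different region
print(split_rhat(mixed))   # close to 1
print(split_rhat(stuck))   # well above the usual ~1.01 threshold
```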