Mixture between non-exchangeable and exchange strata parameters from Neuenschwander et al. (2015)

Marcio · October 8, 2021, 9:59pm

Hi there. I am currently implementing two mixture models discussed in Neuenschwander et al. (2015) paper, but for a continuous endpoint: Y_{ij} defined as the response variable for subject i from strata j for i = 1, \ldots, N_j, j = 1, \ldots, 4 such that N = \sum_{j = 1}^4N_j. The models are the following:

(i) Exchangeable strata parameters
Y_{ij} \sim N(\theta_j, \sigma_j)
\theta_j \sim N(\mu, \tau) for j = 1, \ldots, J
\sigma_j \sim Half-Cauchy(0, 2.5) for j = 1, \ldots, J
\mu \sim N(0, 10)
\tau \sim Half-Cauchy(0, 2.5)

using the following STAN code:

data {
  int<lower=0> N;
  int<lower=1> J;
  int<lower=1,upper=J> id[N]; 
  vector[N] y;
}

parameters {
  real mu;
  real<lower=0> tau;
  vector[J] theta;
  vector<lower=0>[J] sigma;
}

model {
  mu ~ normal(0, 10);
  tau ~ cauchy(0, 2.5);
  
  for (j in 1:J){
    sigma[j] ~ cauchy(0, 2.5);
    theta[j] ~ normal(mu, tau);
  }

  for (n in 1:N){
    y[n] ~ normal(theta[id[n]], sigma[id[n]]);
  }
  
}

(ii) Mixture between Non-Exchangeable and Exchangeable strata parameters:
Y_{ij} \sim N(\theta_j, \sigma_j)
\theta_j \sim \delta_j N(\mu_0, \tau_0) + (1 - \delta_j) N(\mu_j, \tau_j) for j = 1, \ldots, J
\sigma_j \sim Half-Cauchy(0, 2.5) for j = 1, \ldots, J
\mu_j \sim N(0, 10) for j = 0, \ldots, J
\tau_j \sim Half-Cauchy(0, 2.5) for j = 0, \ldots, J
\delta_j \sim Bernouli(p_j)
p_j \sim Beta(1, 1)

using the following STAN Code:

data {
  int<lower=0> N;
  int<lower=1> J;
  int<lower=1,upper=J> id[N];
  vector[N] y;
}

parameters {
  real mu;
  real<lower=0> tau;
  vector[J] mu_spec;
  vector<lower=0>[J] tau_spec;
  vector[J] theta;
  vector[J] gamma;
  vector<lower=0>[J] sigma;
  vector<lower=0, upper=1>[J] p;
}

model {
  mu ~ normal(0, 10);
  tau ~ cauchy(0, 2.5);
  
  for (j in 1:J){
    mu_spec[j] ~ normal(0, 10);
    tau_spec[j] ~ cauchy(0, 2.5);
    sigma[j] ~ cauchy(0, 2.5);
    p[j] ~ beta(1, 1);
    
    theta[j] ~ normal(mu, tau);
    gamma[j] ~ normal(mu_spec[j], tau_spec[j]);
  }

  for (n in 1:N){
    target += log_mix(p[id[n]],
                      normal_lpdf(y[n]|theta[id[n]], sigma[id[n]]),
                      normal_lpdf(y[n]|gamma[id[n]], sigma[id[n]]));
  }
}

While the model (i) works very well, I have a significant number of divergent transitions for model (ii).

library(rstan)
library(bayesplot)

J <- 4
mu <- 5
tau <- 3
mu_spec <- -5
tau_spec <- 1
theta <- rnorm(3, mu, tau)
theta[4] <- rnorm(1, mu_spec, tau_spec)
sigma <- abs(rcauchy(4, 0, 2.5))

response <- c(rnorm(50, theta[1], sigma[1]), rnorm(50, theta[2], sigma[2]),
              rnorm(50, theta[3], sigma[3]), rnorm(50, theta[4], sigma[4]))
id <- rep(1:J, each = 50)

data <- list(N = length(response),
             J = J,
             id = id,
             y = response)

mixture_ex_nex <- stan_model("mixture_ex_nex.stan")
mixture_ex <- stan_model("mixture_ex.stan")

fit_ex <- sampling(mixture_ex,
                data = data,
                warmup = 5000,
                iter = 15000,
                chains = 3,
                control = list(adapt_delta = 0.9))

fit_ex_nex <- sampling(mixture_ex_nex,
                data = data,
                warmup = 5000,
                iter = 15000,
                chains = 3,
                control = list(adapt_delta = 0.9))

#1: There were 1942 divergent transitions after warmup. See
#http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#to find out why this is a problem and how to eliminate them. 
#2: There were 252 transitions after warmup that exceeded the maximum treedepth. Increase #max_treedepth above 10. See
#http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded 
#3: Examine the pairs() plot to diagnose sampling problems
# 
#4: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may #be unreliable.
#Running the chains for more iterations may help. See
#http://mc-stan.org/misc/warnings.html#tail-ess

I checked the link about divergent transitions after warmup, and I increased the adapt_delta parameter to 0.9, but there are still too many divergent transitions. I am wondering if I am implementing model (ii) correctly because I am fairly new using STAN.

Files used are attached:
mixture_ex.stan (434 Bytes)
mixture_ex_nex.stan (808 Bytes)
test.R (3.2 KB)

martinmodrak · November 3, 2021, 4:17pm

Unfortunately, that looks like a difficult model to debug. I unfortunately currently don’t have time to delve deepere into this, the best general advice I can give is at Divergent transitions - a primer

Best of luck with your model!

maxbiostat · November 3, 2021, 5:29pm

@betanalpha probably knows what is going on here vis-a-vis exchangeability and divergences.

betanalpha · November 8, 2021, 3:12pm

There is far too much material on degeneracy and mixture models, how that degeneracy breaks computational methods, and how those failures can be diagnosed to cover in a concise reply.

A few things to keep in mind:

Neuenschwander et al. (2015) isn’t open access but the supplementary material implies that they fit the model with WinBUGS and there isn’t any discussion of fit diagnostics. Consequently one cannot assume that the model actually fit accurately in the paper and that any Stan issues are programmer error. Indeed one of the benefits of Stan’s Hamiltonian Monte Carlo sampler is that it is much more informative about fitting problems than WinBUGS’s Gibbs sampler and so Stan is often much more revealing about problems that were always there but other methods missed.

The Cauchy priors are problematic. For some discussion see Section 3.2 of Prior Modeling.

Finally exchangeable mixture models are fundamentally non-identified and nearly impossible to fit accuracy without breaking that fundamental degeneracy. For some very incomplete discussion see Section 5.4 of Identity Crisis.

wds15 · November 8, 2021, 3:41pm

As a start I would avoid modelling “delta” as a Bernoulli. Just set the prior weights fixed. That should make sampling easier.

The OncoBayes2 R package implements a more constrained model, but it does support EXNEX for binomial data being modelled using an intercept & (positive) slope model. Typical data sets run just fine with it, even with EXNEX.

Topic		Replies	Views
Trouble with Gaussian Mixture Model Modeling	10	4188	December 7, 2018
Identifiability of Gaussian mixture mode Modeling	24	2805	October 29, 2017
Regime Switching Model Modeling	4	1694	June 15, 2018
Any suggestions to speed up mixture model? (dirichlet process) Modeling	1	2517	July 15, 2017
Informed prior for a bernouilli logormal mixture model - divergent transitions in STAN Modeling	2	390	July 23, 2019

Mixture between non-exchangeable and exchange strata parameters from Neuenschwander et al. (2015)

Related topics