Mixture between non-exchangeable and exchangeable strata parameters from Neuenschwander et al. (2015)

Hi there. I am currently implementing two of the models discussed in the Neuenschwander et al. (2015) paper, but for a continuous endpoint: Y_{ij} is the response for subject i in stratum j, for i = 1, \ldots, N_j and j = 1, \ldots, 4, with N = \sum_{j = 1}^4 N_j. The models are the following:

(i) Exchangeable strata parameters
Y_{ij} \sim N(\theta_j, \sigma_j)
\theta_j \sim N(\mu, \tau) for j = 1, \ldots, J
\sigma_j \sim Half-Cauchy(0, 2.5) for j = 1, \ldots, J
\mu \sim N(0, 10)
\tau \sim Half-Cauchy(0, 2.5)

using the following Stan code:

data {
  int<lower=0> N;                // total number of observations
  int<lower=1> J;                // number of strata
  int<lower=1,upper=J> id[N];    // stratum index for each observation
  vector[N] y;                   // responses
}

parameters {
  real mu;                       // exchangeable mean
  real<lower=0> tau;             // between-stratum SD
  vector[J] theta;               // stratum means
  vector<lower=0>[J] sigma;      // within-stratum SDs
}

model {
  mu ~ normal(0, 10);
  tau ~ cauchy(0, 2.5);          // half-Cauchy via the <lower=0> constraint

  for (j in 1:J){
    sigma[j] ~ cauchy(0, 2.5);   // half-Cauchy via the <lower=0> constraint
    theta[j] ~ normal(mu, tau);
  }

  for (n in 1:N){
    y[n] ~ normal(theta[id[n]], sigma[id[n]]);
  }
}

(ii) Mixture between Non-Exchangeable and Exchangeable strata parameters:
Y_{ij} \sim N(\theta_j, \sigma_j)
\theta_j \sim \delta_j N(\mu_0, \tau_0) + (1 - \delta_j) N(\mu_j, \tau_j) for j = 1, \ldots, J
\sigma_j \sim Half-Cauchy(0, 2.5) for j = 1, \ldots, J
\mu_j \sim N(0, 10) for j = 0, \ldots, J
\tau_j \sim Half-Cauchy(0, 2.5) for j = 0, \ldots, J
\delta_j \sim Bernoulli(p_j)
p_j \sim Beta(1, 1)

using the following Stan code:

data {
  int<lower=0> N;                // total number of observations
  int<lower=1> J;                // number of strata
  int<lower=1,upper=J> id[N];    // stratum index for each observation
  vector[N] y;                   // responses
}

parameters {
  real mu;                       // exchangeable (EX) mean
  real<lower=0> tau;             // exchangeable (EX) between-stratum SD
  vector[J] mu_spec;             // stratum-specific (NEX) means
  vector<lower=0>[J] tau_spec;   // stratum-specific (NEX) SDs
  vector[J] theta;               // EX stratum means
  vector[J] gamma;               // NEX stratum means
  vector<lower=0>[J] sigma;      // within-stratum SDs
  vector<lower=0, upper=1>[J] p; // mixture weights
}

model {
  mu ~ normal(0, 10);
  tau ~ cauchy(0, 2.5);

  for (j in 1:J){
    mu_spec[j] ~ normal(0, 10);
    tau_spec[j] ~ cauchy(0, 2.5);
    sigma[j] ~ cauchy(0, 2.5);
    p[j] ~ beta(1, 1);

    theta[j] ~ normal(mu, tau);
    gamma[j] ~ normal(mu_spec[j], tau_spec[j]);
  }

  // mixture likelihood with the indicator delta_j marginalized out
  for (n in 1:N){
    target += log_mix(p[id[n]],
                      normal_lpdf(y[n] | theta[id[n]], sigma[id[n]]),
                      normal_lpdf(y[n] | gamma[id[n]], sigma[id[n]]));
  }
}
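Note that \delta_j does not appear in the Stan code: the log_mix statement marginalizes the indicator out, so each observation contributes the mixture density

p_j N(Y_{ij} \mid \theta_j, \sigma_j) + (1 - p_j) N(Y_{ij} \mid \gamma_j, \sigma_j)

where \theta_j is the exchangeable draw and \gamma_j the stratum-specific one.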

While model (i) works very well, I get a substantial number of divergent transitions for model (ii). Here is the R script I use to simulate data and fit both models:

library(rstan)
library(bayesplot)

# simulate three exchangeable strata (1-3) and one non-exchangeable stratum (4)
J <- 4
mu <- 5
tau <- 3
mu_spec <- -5
tau_spec <- 1
theta <- rnorm(3, mu, tau)
theta[4] <- rnorm(1, mu_spec, tau_spec)
sigma <- abs(rcauchy(4, 0, 2.5))

response <- c(rnorm(50, theta[1], sigma[1]), rnorm(50, theta[2], sigma[2]),
              rnorm(50, theta[3], sigma[3]), rnorm(50, theta[4], sigma[4]))
id <- rep(1:J, each = 50)

data <- list(N = length(response),
             J = J,
             id = id,
             y = response)
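
As a quick sanity check of the simulated data (plain summary statistics, not part of either model):

# empirical per-stratum means and SDs, to compare with the simulated theta and sigma
tapply(response, id, mean)
tapply(response, id, sd)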

mixture_ex_nex <- stan_model("mixture_ex_nex.stan")
mixture_ex <- stan_model("mixture_ex.stan")

fit_ex <- sampling(mixture_ex,
                data = data,
                warmup = 5000,
                iter = 15000,
                chains = 3,
                control = list(adapt_delta = 0.9))

fit_ex_nex <- sampling(mixture_ex_nex,
                data = data,
                warmup = 5000,
                iter = 15000,
                chains = 3,
                control = list(adapt_delta = 0.9))

#1: There were 1942 divergent transitions after warmup. See
#http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#to find out why this is a problem and how to eliminate them.
#2: There were 252 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
#http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
#3: Examine the pairs() plot to diagnose sampling problems
#4: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#Running the chains for more iterations may help. See
#http://mc-stan.org/misc/warnings.html#tail-ess
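
For reference, the divergences and the pairs() plot mentioned in warning #3 can be inspected with rstan and bayesplot along these lines (the parameters shown are just an example subset):

# summary of divergences / treedepth / energy issues
check_hmc_diagnostics(fit_ex_nex)
get_num_divergent(fit_ex_nex)

# pairs plot with divergent transitions highlighted
np <- nuts_params(fit_ex_nex)
mcmc_pairs(as.array(fit_ex_nex), np = np,
           pars = c("mu", "tau", "theta[1]", "gamma[1]", "p[1]"))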

I checked the link about divergent transitions after warmup and increased adapt_delta to 0.9, but there are still far too many divergent transitions. I am wondering whether I am implementing model (ii) correctly, since I am fairly new to Stan.
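
Would a non-centered parameterization of theta and gamma, which I understand is a common remedy for divergences in hierarchical models, be a sensible direction here? A rough sketch of just the changed blocks (theta and gamma would be moved out of the parameters block; everything else as above):

parameters {
  // ... all other parameters as before, with theta and gamma removed ...
  vector[J] theta_raw;
  vector[J] gamma_raw;
}

transformed parameters {
  vector[J] theta = mu + tau * theta_raw;
  vector[J] gamma = mu_spec + tau_spec .* gamma_raw;
}

model {
  theta_raw ~ normal(0, 1);   // implies theta[j] ~ normal(mu, tau)
  gamma_raw ~ normal(0, 1);   // implies gamma[j] ~ normal(mu_spec[j], tau_spec[j])
  // ... remaining priors and the log_mix likelihood as before ...
}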

Files used are attached:
mixture_ex.stan (434 Bytes)
mixture_ex_nex.stan (808 Bytes)
test.R (3.2 KB)
