 # Stan not exploring parameter space

I have simulated some datasets from a model and I am trying to recover the parameter values using Stan.

Each dataset contains failure times for a specific unit. There are 5 units and each unit has 60 failure times (t_{ij}, where i = 1,\dots,5 and j = 1,\dots,60).

I am trying to fit a hierarchical model to the data and recover the model parameters.

I have simulated two datasets. I am able to recover the model parameters from the first dataset using a hierarchical model but I cannot do the same for the second dataset. Interestingly, for the second dataset, if I fit a non-hierarchical model to each unit separately, I am able to recover the model parameters. However, I cannot fit separate models to the real dataset I am working with, and so I am trying to understand why the hierarchical model is not working for dataset 2.

A short description of the model. The failure-time model is a mixture of two Weibull distributions the pdf is given by:

h(t \mid \pi, t_{p_1}, \sigma_1, t_{p_2}, \sigma_2) = \pi f_1(t \mid t_{p_1}, \sigma_1)(1-F_2(t \mid t_{p_2}, \sigma_2)) + f_2(t \mid t_{p_2}, \sigma_2)(1- \pi F_1(t \mid t_{p_1}, \sigma_1)),

where f_1(t), f_2(t) are pdf’s of Weibull distributions and F_1(t), F_2(t) are cdf’s of Weibull distributions. The log-likelihood is

\sum_{i=1}^5\sum_{j=1}^{60} \log(h(t_{ij} \mid \pi, t_{p_1}, \sigma_1, t_{p_2}, \sigma_2)),

where

\log(h(t_{ij} \mid \pi, t_{p_1}, \sigma_1, t_{p_2}, \sigma_2)) = \log(\pi f_1(t_{ij} \mid t_{p_1}, \sigma_1)(1-F_2(t_{ij} \mid t_{p_2}, \sigma_2)) \\+ f_2(t_{ij} \mid t_{p_2}, \sigma_2)(1- \pi F_1(t_{ij} \mid t_{p_1}, \sigma_1)))

can be written as

\log[ \exp(\log(\pi) + \log(f_1) + \log(1 - F_2)) + \exp(\log(f_2) + \log\{1 - \exp[\log(\pi) + \log(F_1)])\} ].

Finally, note that the pdf and cdf of the Weibull distribution can be written as:

F(t) = \Phi_{\text{SEV}}\bigg\{\frac{log(t) - \mu}{\sigma}\bigg\}, and f(t) = \frac{1}{\sigma t}\phi_{\text{SEV}}\bigg\{\frac{log(t) - \mu}{\sigma}\bigg\},
where \Phi_{\text{SEV}}(z) = 1- \exp(-\exp(z)), \phi_{\text{SEV}}(z) = \exp(z-\exp(z)), and \mu = \log(t_p) - \sigma\log(-\log(1-p)), where F(t_p) = \Pr(T \leq t_p) = p, i.e. t_p is the p quantile.

This is not a common Weibull parameterisation but is very useful for failure-time models. Note I have tried parameterising the model in more conventional ways with no success.

The Stan code (GLFPhierarchical.stan) is shown below:
Edit: \sigma_1 and t_{p_1} do not vary across sub-populations; only \sigma_2, t_{p2} and \pi vary across sub-populations.

functions{
real sev_logpdf(real y, real mu, real sigma){
real z;
z = (log(y) - mu) / sigma;
return -log(sigma)-log(y) + z - exp(z);
}
real sev_logccdf(real y, real mu, real sigma){
return -exp((log(y) - mu) / sigma);
}
real sev_logcdf(real y, real mu, real sigma){
return log1m_exp(-exp((log(y) - mu) / sigma));
}
}
data{
int M;
int N_obs;
real endtime_obs[N_obs];
int<lower=1, upper=M> dm_obs[N_obs];
vector<lower=0, upper=1> p;
}
transformed data{
vector z_corr;
for(i in 1:2){
z_corr[i] = log(-1.0 * log1m(p[i]));
}
}
parameters{
real log_tp1;
real<lower=0> sigma1;
real eta_tp2;
real eta_s2;
real eta_pi;
real<lower=0> tau_tp2;
real<lower=0> tau_s2;
real<lower=0> tau_pi;
vector[M] log_tp2_raw;
real<lower=0,upper=1> sigma2[M];
vector[M] logit_pi_raw;
}
transformed parameters{
real mu1;
vector[M] mu2;
real<upper=0> log_pi[M];
mu1 = log_tp1 - sigma1 * z_corr;
for(m in 1:M){
mu2[m] = (eta_tp2 + tau_tp2 * log_tp2_raw[m]) - (sigma2[m] * z_corr);
log_pi[m] = log_inv_logit(eta_pi + tau_pi * logit_pi_raw[m]);
}
}
model{
int m;
real logpi;
real mu_2;
real sig_2;
for(i in 1:N_obs){
m = dm_obs[i];
logpi = log_pi[m];
mu_2 = mu2[m];
sig_2 = sigma2[m];
target +=  log_sum_exp(logpi + sev_logpdf(endtime_obs[i], mu1, sigma1) +
sev_logccdf(endtime_obs[i], mu_2, sig_2),
sev_logpdf(endtime_obs[i], mu_2, sig_2) +
log1m_exp(logpi + sev_logcdf(endtime_obs[i], mu1, sigma1)));
}
//log_tp1 ~ normal(0, 1);
log_tp1 ~ normal(7.6, 0.1);
log_tp2_raw ~ normal(0,1);
sigma1 ~ lognormal(0, 1);
sigma2 ~ lognormal(eta_s2, tau_s2);
logit_pi_raw ~ normal(0,1);
eta_tp2 ~ normal(10, 0.1);
//eta_tp2 ~ normal(3, 0.8);
eta_s2 ~ normal(0, 0.1);
eta_pi ~ normal(-3, 1);
//tau_tp2 ~ normal(0.8,0.05);
tau_tp2 ~ normal(0.1,0.01);
tau_s2 ~ normal(0.8,0.1);
tau_pi ~ normal(6,0.1);
}


The parameters for dataset 1 are:

sig1: 2.7674974, 1.3636123, 2.36536296, 1.7899222, 0.54696582
sig2: 0.1868903, 0.1288621, 0.10727463, 0.2409199 , 0.04124024
p (pi): 0.2518340, 0.7712097, 0.02763668, 0.0825721, 0.05819514
t1: 0.8494921, 1.9725750, 1.44696651, 1.5688021, 1.07975276
t2: 26.2052521, 19.6768429, 21.40858651, 23.8159158, 24.78866558

The parameters for dataset 2 are:

sig1: 2.7674974, 1.3636123, 2.36536296, 1.7899222, 0.54696582
sig2: 0.1868903, 0.1288621, 0.10727463, 0.2409199 , 0.04124024
p (pi): 0.2518340, 0.7712097, 0.02763668, 0.0825721, 0.05819514
t1: 2.076728e+03, 2.288995e+03, 2.151443e+03, 2.186511e+03, 2.029098e+03
t2: 1.836761e+04, 2.180122e+04, 2.274035e+04, 2.398484e+04, 2.446976e+04

That is, t1 and t2 are different in dataset 2 for each unit.

The prior distributions in the Stan model are very strict and give high density around the true values. I have also tried weakly informative priors with no success.

The warning I receive when using dataset 2 is:

Warning messages:
1: There were 879 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: Examine the pairs() plot to diagnose sampling problems


I have also attached traceplots [traceplots.pdf (56.4 KB) ] for the model parameters. Each set of plots looks the same for each parameter. It looks like nothing is happening. There is no exploration for any parameters. It looks like each parameters is stuck at the initial position.

I am not sure what this means. Does anyone have any suggestions?

A quick summary: Dataset 1 can be used to recover the true model parameters and I can fit a separate model to each unit in dataset 2 and recover the true model parameters (using a slightly different Stan script shown below), but I cannot recover the true model parameters using a hierarchical model using dataset 2.

functions{
real sev_logpdf(real y, real mu, real sigma){
real z;
z = (log(y) - mu) / sigma;
return -log(sigma)-log(y) + z - exp(z);
}
real sev_logccdf(real y, real mu, real sigma){
return -exp((log(y) - mu) / sigma);
}
real sev_logcdf(real y, real mu, real sigma){
return log1m_exp(-exp((log(y) - mu) / sigma));
}
}
data{
int N_obs;
real endtime_obs[N_obs];
vector<lower=0, upper=1> p; # quantiles to model
}
transformed data{
vector z_corr;
for(i in 1:2){
z_corr[i] = log(-1.0 * log1m(p[i])); # used to get location(mu) from quantile(t_p)
}
}
parameters{
real log_tp1;
real<lower=0> sigma1;
real log_tp2;
real<lower=0> sigma2;
real logit_pi;
}
transformed parameters{
real mu1;
real mu2;
real log_pi;
mu1 = log_tp1 - sigma1 * z_corr;
mu2 = log_tp2 - (sigma2 * z_corr);
log_pi = log_inv_logit(logit_pi);
}
model{
real tmp;
for(i in 1:N_obs){
target +=  log_sum_exp(log_pi + sev_logpdf(endtime_obs[i], mu1, sigma1) +
sev_logccdf(endtime_obs[i], mu2, sigma2),
sev_logpdf(endtime_obs[i], mu2, sigma2) +
log1m_exp(log_pi + sev_logcdf(endtime_obs[i], mu1, sigma1)));
}
log_tp1 ~ normal(7.6, 0.1);
log_tp2 ~ normal(10, 0.1);
sigma1 ~ lognormal(0, 1);
sigma2 ~ lognormal(0,0.8);
logit_pi ~ normal(-3, 6);
}



For completeness, I have included datasets 1 and 2 and the R-code used to run the models. failuredat.csv (8.1 KB) [dataset 1]
failuredat_test2.csv (8.1 KB) [dataset 2]

#dat is either dataset one or dataset 2
M=length(unique(dat$model)) N_obs = sum(dat$delta)
endtime_obs = dat$end.time[dat$delta==1]
dm_obs = dat$model[dat$delta==1]
p=c(0.5,0.2)

stan_list = list(M=M, N_obs = N_obs, endtime_obs = endtime_obs,dm_obs=dm_obs,p=p)

library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

Failure_time_model = stan("GFLPhierarchical.stan", data=stan_list, iter =10000, seed = 1, chains = 1, warmup = 3000, cores = 1, control = list(adapt_delta = 0.99))


Note that in GLFPhierarchical2.stan (shown above) the priors are set for dataset 2. In order to run the model for dataset 1, one needs to uncomment the priors for log_tp1, eta_tp2 and tau_tp2 (but then comment the priors that are currently chosen).

Also note that the code above is adapted from utch_a_1518273_sm4346.pdf (978.7 KB)

1 Like

Give the divergences in your second model, it is unsurprising that parameter recovery is poor. In general, you cannot trust your posterior when the HMC trajectories produce (even just one or a few) divergences. In many cases, you can eliminate your divergences by reparameterizing the model–i.e. expressing the same posterior distribution using a different parameterization of the probability density function.

For more on divergences and reparameterizations, see:
https://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html
https://www.martinmodrak.cz/2018/02/19/taming-divergences-in-stan-models/

1 Like

Update: The problem parameters seem to be either (1) eta_tp2 or (2) tau_tp2. I have tried allowing eta_tp2 to be considered known and entered the true value in the data block. I did the same assuming tau_tp2 was known.

Both times I received the same issues.

The model allows one of these values to be unknown but not both.

1 Like

Small update: I simulated two further datasets where

log_tp1 ~ normal(0, 0.1);
eta_tp2 ~ normal(7,0.1)


for the first dataset, and

log_tp1 ~ normal(0, 0.1);
eta_tp2 ~ normal(9,0.1)


for the second dataset.

The other parameters were sampled from

  log_tp2_raw ~ normal(0,1);
sigma1 ~ lognormal(0, 1);
sigma2 ~ lognormal(eta_s2, tau_s2);
logit_pi_raw ~ normal(0,1);
eta_s2 ~ normal(0, 0.1);
eta_pi ~ normal(-3, 1);
tau_tp2 ~ normal(0.1,0.01);
tau_s2 ~ normal(0.8,0.1);
tau_pi ~ normal(6,0.1);


for both datasets.

The model parameters were recovered for the first dataset but not for the second. The problem only seems to appear for larger values of eta_tp2. The first dataset that did not work was simulated with eta_tp2 = 10.

For datasets which are generated with a larger value of eta_tp2, the failure times are much larger than for smaller values of eta_tp2. For these datasets it appears that the combination (eta_tp2, tau_tp2) is the issue. Substituting the true value for eta_tp2 or tau_tp2 allows the Stan code to run and recover the true values.

I understand that sometimes Stan can find it difficult to distinguish between a combination of parameters, for example two variance parameters in a covariance matrix, but I do not expect this to be the case since eta_tp2 and tau_tp2 are very different in size. I may be wrong.

I wonder why the datasets with larger failure times find it difficult to estimate both parameters. I find it interesting that I can substitute tau_tp2 = 0.1 as data and the model works fine. The prior I have set for tau_tp2 is centered around 0.1 and all of the density of this prior is on values that could be used to simulate a similar dataset to the one I generated. By that I mean, even if I took tau_tp2 = 0.05 or 0.15 (noting that these are extremes of a normal(0.1,0.01) distribution), these values would generate a dataset very similar to the one generated with tau_tp2 = 0.1.

Update: The code runs without divergences if I specify an initial value for eta_tp2. If I set eta_tp2 > 3.5 there are no divergences (even if I set eta_tp2 = 100 there are no divergences, eta_tp2 seems to find its way down to the true value of 9). The problem lies when the initial value for eta_tp2 < 3.5. I am not sure why very large initial values are fine, but smaller (and closer) initial values cause divergences. Does anyone know why small initial values for eta_tp2 are not closing in on the true value?

Strangely, there are still divergences if I set tau_tp2 = 0.1 (and do not set an initial value for eta_tp2). This is interesting because the code works fine if I input tau_tp2 = 0.1 as data.

I.e. if I insert the true value for tau_tp2 = 0.1 as data to the model, there are no divergences (even when Stan initiates the starting value for eta_tp2 (and all other parameters) to be a random value between -2 and 2). However, there are many divergences if I start tau_tp2 at 0.1.

I have read in certain other posts that sampling initial values from the prior distributions may be beneficial if one expects there to be little or no weighting in the region [-2,2]. This kind of works for this example, but sometimes does not. For example, I changed the prior for eta_tp2 to be normal(9,3). Sampling initial values <3.5 cause divergences but sampling larger values does not.

Also, when the initial value causes divergences, not only does eta_tp2 not explore the parameter space, but all other parameters do not move from their initial values either (although I think this will be because the parameter vector updates as a tuple rather than each parameter individually).