I am running some simulations using Jeffreys priors. I am working with location-scale distributions, putting a flat prior on \mu and a 1/\sigma^2 prior on the scale parameter \sigma. In Stan I code this as target += -2*log(sigma), with no explicit prior statement for \mu.

I am not sure why, but I am getting vastly different posteriors even when using the same initial values. Sometimes I get a well-behaved posterior with no divergent transitions; other times I get a set of draws with thousands of divergent transitions and posterior mass extremely far from the true parameter values. Does anyone have an idea what could cause this discrepancy? I have a decent amount of data for estimating only two parameters, so I would expect the likelihood to be well behaved and Stan to produce a reasonable posterior even with non-informative priors. Any insights would be much appreciated.

Yes, this is the model I am running for a single Weibull distribution. I take the log of the time (y) and use the smallest extreme value (SEV) distribution for the likelihood. I simulate the data from a Weibull distribution, and to keep things simple I assume there is no right-censored data for now.

// Stan model for a single Weibull distribution
functions {
  // SEV log pdf
  real sev_logpdf(real y, real mu, real sigma){
    real z;
    z = (y - mu) / sigma;
    return -log(sigma) + z - exp(z);
  }
  // SEV log (1 - cdf)
  real sev_logccdf(real y, real mu, real sigma){
    return -exp((y - mu) / sigma);
  }
}
data {
  int<lower=1> N;                   // number of observations
  real<lower=0> y[N];               // response
  int<lower=1, upper=2> cencode[N]; // 1 = failure, 2 = right censored
  real p;                           // quantile for reparameterization
}
transformed data {
  real log_y[N];
  log_y = log(y);
}
parameters {
  real<lower=0> tp;
  real<lower=0> sigma;
}
transformed parameters {
  real mu;
  mu = log(tp) - sigma * log(-1.0 * log1m(p));
}
model {
  //
  // compute the log "probability" in target
  //
  for (n in 1:N) {
    // exact
    if (cencode[n] == 1)
      target += sev_logpdf(log_y[n], mu, sigma);
    // right censored
    if (cencode[n] == 2)
      target += sev_logccdf(log_y[n], mu, sigma);
  }
  // Jeffreys prior, 1/sigma^2 on sigma, and flat prior for mu
  target += -2 * log(sigma);
}
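Not part of the original post, but the two user-defined functions above are easy to sanity-check numerically: the standard SEV distribution is the same family as SciPy's gumbel_l (the minimum extreme value distribution), so a Python translation should match its logpdf and logsf exactly. A minimal check:

```python
import numpy as np
from scipy.stats import gumbel_l

def sev_logpdf(y, mu, sigma):
    # mirrors the Stan function: -log(sigma) + z - exp(z)
    z = (y - mu) / sigma
    return -np.log(sigma) + z - np.exp(z)

def sev_logccdf(y, mu, sigma):
    # log survival function: -exp(z)
    return -np.exp((y - mu) / sigma)

mu, sigma = 0.5, 0.25  # illustrative values
y = np.linspace(-2.0, 1.0, 31)
assert np.allclose(sev_logpdf(y, mu, sigma), gumbel_l.logpdf(y, loc=mu, scale=sigma))
assert np.allclose(sev_logccdf(y, mu, sigma), gumbel_l.logsf(y, loc=mu, scale=sigma))
```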

EDIT: @maxbiostat has edited this post for syntax highlighting.

Yes, here is some simulated data with 5 failures (observations) and no censoring. It should be easy to estimate (true values: mu=0, sigma=1/3), but even after setting reasonable initial values I often get thousands of divergences and nonsensical posteriors.
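The original testdat.txt is not reproduced here, but data of this form can be simulated easily. Since log(y) ~ SEV(mu, sigma) is equivalent to y ~ Weibull(shape = 1/sigma, scale = exp(mu)), the stated true values mu = 0, sigma = 1/3 correspond to a Weibull with shape 3 and scale 1. A sketch (my own seed and sample, not the poster's file):

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true, sigma_true = 0.0, 1.0 / 3.0
# log(y) ~ SEV(mu, sigma)  <=>  y ~ Weibull(shape = 1/sigma, scale = exp(mu))
shape, scale = 1.0 / sigma_true, np.exp(mu_true)
y = scale * rng.weibull(shape, size=5)
print(np.round(y, 3))

# large-sample check of the log-scale location:
# E[log y] = mu - gamma * sigma, gamma = Euler-Mascheroni constant
big = np.log(scale * rng.weibull(shape, size=200_000))
assert abs(big.mean() - (mu_true - 0.5772156649 * sigma_true)) < 0.01
```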

One thing to check is whether the posterior is actually proper; with improper priors there is no guarantee that it is. Let me know if you need help with that.

This is because if y follows a Weibull distribution, then log(y) follows the SEV distribution.
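To spell that out: if Y ~ Weibull(shape c, scale λ), then log(Y) is SEV with location μ = log λ and scale σ = 1/c, since P(Y > x) = exp(-(x/λ)^c) = exp(-exp((log x - μ)/σ)). A quick numerical check of that identity (illustrative parameter values are mine):

```python
import numpy as np
from scipy.stats import weibull_min

c, lam = 3.0, 2.0                 # Weibull shape and scale (illustrative)
mu, sigma = np.log(lam), 1.0 / c  # implied SEV location and scale
x = np.linspace(0.1, 6.0, 50)
# Weibull log-survival at x equals the SEV log-survival at log(x)
sev_logsf = -np.exp((np.log(x) - mu) / sigma)
assert np.allclose(weibull_min.logsf(x, c, scale=lam), sev_logsf)
```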

I did some more playing around with transformations, and things look much better once you reparameterize in terms of log(tp) and log(sigma) with flat priors. The likelihood is far better behaved and the MCMC produces far fewer divergent transitions.
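Note that by the usual change-of-variables argument, a flat prior on log(sigma) induces a density proportional to 1/sigma on sigma itself (the standard Jeffreys prior for a scale parameter), not the 1/sigma^2 used above. A small numerical illustration of that Jacobian (mine, on a truncated interval since the flat prior is improper):

```python
import numpy as np
from scipy.stats import kstest, loguniform

# If log(sigma) is uniform on [log a, log b], then sigma has density
# proportional to 1/sigma on [a, b] (scipy's loguniform distribution).
rng = np.random.default_rng(0)
a, b = 0.1, 10.0
log_sigma = rng.uniform(np.log(a), np.log(b), size=50_000)
sigma = np.exp(log_sigma)
stat, pval = kstest(sigma, loguniform(a, b).cdf)
assert stat < 0.02  # empirical CDF matches the 1/sigma law
```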

Yup. This is what I arrived at after a little playing around:

// Stan model for a single Weibull distribution
functions {
  // SEV log pdf
  real sev_logpdf(real y, real mu, real sigma){
    real z;
    z = (y - mu) / sigma;
    return -log(sigma) + z - exp(z);
  }
  // SEV log (1 - cdf)
  real sev_logccdf(real y, real mu, real sigma){
    return -exp((y - mu) / sigma);
  }
}
data {
  int<lower=1> N;                   // number of observations
  real<lower=0> y[N];               // response
  int<lower=1, upper=2> cencode[N]; // 1 = failure, 2 = right censored
  real p;                           // quantile for reparameterization
}
transformed data {
  real log_y[N];
  log_y = log(y);
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  //
  // compute the log "probability" in target
  //
  for (n in 1:N) {
    // exact
    if (cencode[n] == 1)
      target += sev_logpdf(log_y[n], mu, sigma);
    // right censored
    if (cencode[n] == 2)
      target += sev_logccdf(log_y[n], mu, sigma);
  }
  // Jeffreys prior, 1/sigma^2 on sigma, and flat prior for mu
  target += -2 * log(sigma);
}
generated quantities {
  real tp = exp(mu + sigma * log(-1.0 * log1m(p)));
}

Ran with

library(rstan)
dat <- read.csv(file = "testdat.txt")
sev.data <- list(
  N = nrow(dat),
  y = dat$y,
  cencode = dat$cencode,
  p = dat$p[1]
)
sev_model <- stan_model("sev.stan")
mcmc <- sampling(sev_model, data = sev.data,
                 control = list(adapt_delta = 0.9))

Gives

> mcmc
Inference for Stan model: sev.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
       mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
mu    -0.03    0.01 0.31 -0.65 -0.21 -0.03  0.14  0.58  1338    1
sigma  0.60    0.01 0.27  0.28  0.42  0.54  0.71  1.30  1288    1
tp     0.82    0.01 0.27  0.37  0.65  0.80  0.96  1.40  1519    1
lp__  -5.15    0.03 1.15 -8.25 -5.60 -4.83 -4.32 -3.98  1130    1
Samples were drawn using NUTS(diag_e) at Wed Sep 2 17:33:43 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
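As a check on the generated quantity (my addition, not the original post's): tp = exp(mu + sigma * log(-log(1 - p))) should be exactly the p-quantile of the Weibull with shape 1/sigma and scale exp(mu). Using the posterior means above and an illustrative p (the p in testdat.txt is not shown):

```python
import numpy as np
from scipy.stats import weibull_min

# tp should be the p-quantile of Weibull(shape = 1/sigma, scale = exp(mu));
# mu and sigma here are the posterior means from the output, p is illustrative
mu, sigma, p = -0.03, 0.60, 0.1
tp = np.exp(mu + sigma * np.log(-np.log1p(-p)))
assert np.isclose(tp, weibull_min.ppf(p, 1.0 / sigma, scale=np.exp(mu)))
```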