Vastly different posteriors with same initial values

Stan Forum,

I am running some simulations using Jeffreys priors. I am working with location scale distributions and putting a flat prior on \mu and a prior of 1/\sigma^2 on the scale parameter. I am coding this in Stan using target += -2*log(sigma), and not explicitly defining any prior for \mu.

I am not sure why, but I am getting vastly different posteriors even when using the same initial values. Sometimes I get a well-behaved posterior with no divergent transitions and other times a set of draws with thousands of divergent transitions and posterior mass extremely far away from the true values of the parameters. Does anyone have an idea for this discrepancy? I have a decent amount of data for only having to estimate two parameters, so I would expect the likelihood to be well-defined and Stan to be able to get a reasonable posterior even with non-informative priors. Any insights would be much appreciated.

~Colin

Can you share your full Stan code and perhaps some more details on your model?

Hi @maxbiostat ,

Yes, this is the model I am running for a single Weibull distribution. I am taking the log of time (y), and thus using the smallest extreme value distribution (SEV) for the likelihood piece. To simulate data, I am generating data from a Weibull distribution. To keep things simple I am assuming there is no right censored data for now.

// Stan model for a single Weibull distribution

functions {
  
  // sev logpdf
  
  real sev_logpdf(real y, real mu, real sigma){
    real z;
    z = (y - mu) / sigma;
    return -log(sigma) + z - exp(z);
  }
  
  // sev log 1-cdf
  
  real sev_logccdf(real y, real mu, real sigma){
    return -exp((y - mu) / sigma);
  }
}


data {
  int<lower=1> N;  // number of observations
  real<lower=0> y[N]; // response
  int<lower=1, upper=2> cencode[N]; // 1-failure, 2-right
  real p;  //  quantile for reparameterization
}

transformed data{
  real log_y[N];
  log_y = log(y);
}

parameters{
  real<lower=0> tp;
  real<lower=0> sigma;
}

transformed parameters{
  real mu;
  mu = log(tp) - sigma * log(-1.0 * log1m(p));
}

model{
  //
    // compute the log "probability" in target
  //
    for(n in 1:N){
      //  exact
      if (cencode[n]==1)
        target += sev_logpdf(log_y[n], mu, sigma);
        //  right censored
        if (cencode[n]==2)
          target += sev_logccdf(log_y[n], mu, sigma);
    }
  
  // jeffreys prior, 1/sigma^2, and flat prior for mu
  target += -2*log(sigma);
}

EDIT: @maxbiostat has edited this post for syntax highlighting.

1 Like

Thanks for providing more details. I’ll have a more detailed look later today, but for now, can you fit this model to simulated data?

testdat.txt (165 Bytes)

@maxbiostat Thanks!

Yes, here is some simulated data with 5 failures (observations) and no censoring. It should be easy to estimate (true values: mu=0, sigma=1/3), but even after setting reasonable initial values I often get thousands of divergences and nonsensical posteriors.

~Colin

Hi again,

One thing to check is that the posterior is actually proper, because under improper priors there is no guarantee. Let me know if you need help with that.

Hello,

For location-scale or log-location-scale families the Jeffreys and reference prior should ensure a proper posterior, I believe.

Why is this done? Why is the data log-transformed?

This is because if y follows a Weibull distribution, then log(y) follows the SEV distribution.

I did some more playing around with transformations and things look much better once you do a reparameterization with log(tp) and log(sigma) with flat priors. The likelihood is way better behaved and the MCMC doesn’t have as many divergent transitions.

Yup. This is what I arrived at after a little playing round:

// Stan model for a single Weibull distribution

functions {
  // sev logpdf
    real sev_logpdf(real y, real mu, real sigma){
    real z;
    z = (y - mu) / sigma;
    return -log(sigma) + z - exp(z);
  }
  
  // sev log 1-cdf
  
  real sev_logccdf(real y, real mu, real sigma){
    return -exp((y - mu) / sigma);
  }
}


data {
  int<lower=1> N;  // number of observations
  real<lower=0> y[N]; // response
  int<lower=1, upper=2> cencode[N]; // 1-failure, 2-right
  real p;  //  quantile for reparameterization
}

transformed data{
  real log_y[N];
  log_y = log(y);
}

// transformed data{
//   real log_y[N];
//   log_y = log(y);
// }

parameters{
  real mu;
  real<lower=0> sigma;
}

model{
  //
    // compute the log "probability" in target
  //
    for(n in 1:N){
      //  exact
      if (cencode[n]==1)
        target += sev_logpdf(log_y[n], mu, sigma);
        //  right censored
        if (cencode[n]==2)
          target += sev_logccdf(log_y[n], mu, sigma);
    }
  
  // jeffreys prior, 1/sigma^2, and flat prior for mu
  target += -2*log(sigma);
}

generated quantities{
  real tp = exp(mu + sigma * log(-1.0 * log1m(p)) );
}

Ran with

library(rstan)
dat <- read.csv(file = "testdat.txt")

sev.data <- list(
   N = nrow(dat),
   y = dat$y,
   cencode = dat$cencode,
   p = dat$p[1]
)

sev_model <- stan_model("sev.stan")

mcmc <- sampling(sev_model, data = sev.data,
                 control=list(adapt_delta=.9))

Gives

> mcmc
Inference for Stan model: sev.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

       mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
mu    -0.03    0.01 0.31 -0.65 -0.21 -0.03  0.14  0.58  1338    1
sigma  0.60    0.01 0.27  0.28  0.42  0.54  0.71  1.30  1288    1
tp     0.82    0.01 0.27  0.37  0.65  0.80  0.96  1.40  1519    1
lp__  -5.15    0.03 1.15 -8.25 -5.60 -4.83 -4.32 -3.98  1130    1

Samples were drawn using NUTS(diag_e) at Wed Sep  2 17:33:43 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
1 Like