Vastly different posteriors with same initial values

cbeck · August 31, 2020, 8:30pm

Stan Forum,

I am running some simulations using Jeffreys priors. I am working with location scale distributions and putting a flat prior on \mu and a prior of 1/\sigma^2 on the scale parameter. I am coding this in Stan using target += -2*log(sigma), and not explicitly defining any prior for \mu.

I am not sure why, but I am getting vastly different posteriors even when using the same initial values. Sometimes I get a well-behaved posterior with no divergent transitions and other times a set of draws with thousands of divergent transitions and posterior mass extremely far away from the true values of the parameters. Does anyone have an idea for this discrepancy? I have a decent amount of data for only having to estimate two parameters, so I would expect the likelihood to be well-defined and Stan to be able to get a reasonable posterior even with non-informative priors. Any insights would be much appreciated.

~Colin

maxbiostat · August 31, 2020, 11:33pm

Can you share your full Stan code and perhaps some more details on your model?

cbeck · September 1, 2020, 5:19pm

Hi @maxbiostat ,

Yes, this is the model I am running for a single Weibull distribution. I am taking the log of time (y), and thus using the smallest extreme value distribution (SEV) for the likelihood piece. To simulate data, I am generating data from a Weibull distribution. To keep things simple I am assuming there is no right censored data for now.

// Stan model for a single Weibull distribution

functions {
  
  // sev logpdf
  
  real sev_logpdf(real y, real mu, real sigma){
    real z;
    z = (y - mu) / sigma;
    return -log(sigma) + z - exp(z);
  }
  
  // sev log 1-cdf
  
  real sev_logccdf(real y, real mu, real sigma){
    return -exp((y - mu) / sigma);
  }
}


data {
  int<lower=1> N;  // number of observations
  real<lower=0> y[N]; // response
  int<lower=1, upper=2> cencode[N]; // 1-failure, 2-right
  real p;  //  quantile for reparameterization
}

transformed data{
  real log_y[N];
  log_y = log(y);
}

parameters{
  real<lower=0> tp;
  real<lower=0> sigma;
}

transformed parameters{
  real mu;
  mu = log(tp) - sigma * log(-1.0 * log1m(p));
}

model{
  //
    // compute the log "probability" in target
  //
    for(n in 1:N){
      //  exact
      if (cencode[n]==1)
        target += sev_logpdf(log_y[n], mu, sigma);
        //  right censored
        if (cencode[n]==2)
          target += sev_logccdf(log_y[n], mu, sigma);
    }
  
  // jeffreys prior, 1/sigma^2, and flat prior for mu
  target += -2*log(sigma);
}

EDIT: @maxbiostat has edited this post for syntax highlighting.

maxbiostat · September 1, 2020, 5:48pm

Thanks for providing more details. I’ll have a more detailed look later today, but for now, can you fit this model to simulated data?

cbeck · September 1, 2020, 6:52pm

testdat.txt (165 Bytes)

@maxbiostat Thanks!

Yes, here is some simulated data with 5 failures (observations) and no censoring. It should be easy to estimate (true values: mu=0, sigma=1/3), but even after setting reasonable initial values I often get thousands of divergences and nonsensical posteriors.

~Colin

maxbiostat · September 1, 2020, 7:42pm

Hi again,

One thing to check is that the posterior is actually proper, because under improper priors there is no guarantee. Let me know if you need help with that.

cbeck · September 1, 2020, 8:03pm

Hello,

For location-scale or log-location-scale families the Jeffreys and reference prior should ensure a proper posterior, I believe.

maxbiostat · September 2, 2020, 5:52pm

Why is this done? Why is the data log-transformed?

cbeck · September 2, 2020, 8:05pm

This is because if y follows a Weibull distribution, then log(y) follows the SEV distribution.

I did some more playing around with transformations and things look much better once you do a reparameterization with log(tp) and log(sigma) with flat priors. The likelihood is way better behaved and the MCMC doesn’t have as many divergent transitions.

maxbiostat · September 2, 2020, 8:35pm

Yup. This is what I arrived at after a little playing round:

// Stan model for a single Weibull distribution

functions {
  // sev logpdf
    real sev_logpdf(real y, real mu, real sigma){
    real z;
    z = (y - mu) / sigma;
    return -log(sigma) + z - exp(z);
  }
  
  // sev log 1-cdf
  
  real sev_logccdf(real y, real mu, real sigma){
    return -exp((y - mu) / sigma);
  }
}


data {
  int<lower=1> N;  // number of observations
  real<lower=0> y[N]; // response
  int<lower=1, upper=2> cencode[N]; // 1-failure, 2-right
  real p;  //  quantile for reparameterization
}

transformed data{
  real log_y[N];
  log_y = log(y);
}

// transformed data{
//   real log_y[N];
//   log_y = log(y);
// }

parameters{
  real mu;
  real<lower=0> sigma;
}

model{
  //
    // compute the log "probability" in target
  //
    for(n in 1:N){
      //  exact
      if (cencode[n]==1)
        target += sev_logpdf(log_y[n], mu, sigma);
        //  right censored
        if (cencode[n]==2)
          target += sev_logccdf(log_y[n], mu, sigma);
    }
  
  // jeffreys prior, 1/sigma^2, and flat prior for mu
  target += -2*log(sigma);
}

generated quantities{
  real tp = exp(mu + sigma * log(-1.0 * log1m(p)) );
}

Ran with

library(rstan)
dat <- read.csv(file = "testdat.txt")

sev.data <- list(
   N = nrow(dat),
   y = dat$y,
   cencode = dat$cencode,
   p = dat$p[1]
)

sev_model <- stan_model("sev.stan")

mcmc <- sampling(sev_model, data = sev.data,
                 control=list(adapt_delta=.9))

Gives

> mcmc
Inference for Stan model: sev.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

       mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
mu    -0.03    0.01 0.31 -0.65 -0.21 -0.03  0.14  0.58  1338    1
sigma  0.60    0.01 0.27  0.28  0.42  0.54  0.71  1.30  1288    1
tp     0.82    0.01 0.27  0.37  0.65  0.80  0.96  1.40  1519    1
lp__  -5.15    0.03 1.15 -8.25 -5.60 -4.83 -4.32 -3.98  1130    1

Samples were drawn using NUTS(diag_e) at Wed Sep  2 17:33:43 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Topic		Replies	Views
Looking for help on setting up a Weibull distribution model (RStan) Modeling rstan	1	496	August 7, 2021
Generalized Extreme value distribution with only one prior Modeling rstan , techniques , fitting-issues	2	215	March 25, 2024
Defining Jeffreys' prior for transformed parameters and censored data Modeling techniques , fitting-issues	12	1602	April 7, 2020
Multiple priors for the same parameter Modeling	10	3527	October 29, 2021
Sampler returns the same value for each parameter in Weibull Model Modeling fitting-issues	1	284	October 25, 2023

Vastly different posteriors with same initial values

Related topics