Adaptively truncating parameters to help with convergence in missing data model

Hello,

I’m working on fitting an intercept-only model in which the response variable is 100% missing, and I could use some help with convergence issues.

The model involves inverse transform sampling to specify the prior for each missing observation: I define a parameter p \sim \text{Uniform}[0, 1], then map it through the inverse CDF of the target prior, as @bgoodri shows here. The prior for each observation is a Johnson Quantile-Parameterized Distribution (JQPD), which is fully specified by a triplet of quantiles that I pass to the model as data.
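In case the trick is unfamiliar, here is a minimal Python sketch of inverse transform sampling, using an exponential distribution as a stand-in since its inverse CDF has a closed form (the exponential and its rate are my illustration, not part of the model):

```python
import random
from math import log

def exponential_icdf(p: float, rate: float) -> float:
    """Quantile function of Exponential(rate): F^{-1}(p) = -log(1 - p) / rate."""
    return -log(1.0 - p) / rate

random.seed(1)
# If p ~ Uniform(0, 1), then F^{-1}(p) follows the target distribution.
draws = [exponential_icdf(random.random(), rate=2.0) for _ in range(100_000)]
mean_draw = sum(draws) / len(draws)  # should be close to 1 / rate = 0.5
```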

At first this model had nearly 100% divergent transitions. Then I noticed that for many missing responses, the JQPD prior is very flat along much of its support. For example, here is the PDF of the prior for one of the missing responses:

So I wrote an R function to find, for each prior, the quantiles that bound 99% of its density, and I pass those bounds to the declaration of the p \sim \text{Uniform}[0, 1] parameter. This makes the model sample very well, and all the inferences make sense, except that the posterior quantiles of the inferred ‘true’ values for each missing response do not match the empirical quantiles I used to parameterize their JQPD priors. I assume this is because truncation prevents the sampler from exploring the whole distribution for each missing response. When I instead use the quantiles that bound 99.9% of the density, the inferences make no sense, yet the model still samples well and the Rhats remain at 1.
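Here is a toy illustration of the bias I think I’m seeing (assumption: this holds for any monotone inverse CDF; I use a logistic stand-in because its quantile function has a closed form): if p is restricted to [a, b] instead of [0, 1], the implied median of F^{-1}(p) becomes F^{-1}((a + b) / 2) rather than F^{-1}(0.5).

```python
from math import log

def logistic_icdf(p: float, mu: float = 0.0, s: float = 1.0) -> float:
    """Quantile function of the logistic distribution (stand-in with a closed form)."""
    return mu + s * log(p / (1.0 - p))

# Untruncated p ~ Uniform(0, 1) implies the median F^{-1}(0.5) = mu.
full_median = logistic_icdf(0.5)

# Truncated p ~ Uniform(a, b) implies the median F^{-1}((a + b) / 2) instead,
# so asymmetric truncation bounds shift every recovered quantile.
a, b = 0.004, 0.385  # asymmetric bounds, similar to those in my data below
truncated_median = logistic_icdf((a + b) / 2.0)
```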

I have two questions:

  1. Is truncating parameters considered good practice for dealing with convergence issues?
  2. Are there other approaches that might improve the sampling without artificially constraining the support of the priors? Ideally, each inferred ‘true’ response would have the same distribution as its known JQPD prior; draws from those priors are produced by the generated quantities block in my code below.

Thanks in advance for your help!

Here’s my Stan code, followed by the R code that prepares the data and fits the model:

functions{
    // Inverse CDF of the J-QPD-B distribution, which is bounded on both sides.
    real JQPDB_icdf(real p, real alpha, real q_low, real q_median, real q_high, real lower_bound, real upper_bound) {
        if (p < 0 || p > 1)         reject("p must be between 0 and 1");
        if (alpha < 0 || alpha > 1) reject("alpha must be between 0 and 1");
        real c = inv_Phi(1 - alpha);
        real uml = upper_bound - lower_bound;
        real L = inv_Phi( (q_low - lower_bound) / uml );
        real B = inv_Phi( (q_median - lower_bound) / uml );
        real H = inv_Phi( (q_high - lower_bound) / uml );
        real HmL = H - L;
        real delta = acosh(0.5 * HmL / fmin(B - L, H - B)) / c;
        real lambda = HmL / sinh(2 * delta * c);
        real LHm2B = L + H - 2 * B;
        real n;
        real zeta;
        if (LHm2B < 0) {
          n = -1;
          zeta = H;
        } else if (LHm2B > 0) {
          n = 1;
          zeta = L;
        } else { // LHm2B = 0 -> removable discontinuity
          return lower_bound + uml * Phi(B + 0.5 * HmL / c * inv_Phi(p));
        }
        return lower_bound + uml * Phi(zeta + lambda * sinh(delta * (inv_Phi(p) + n * c)));
    }
}
data {
  int<lower=1> N; // total number of observations
  real<lower=0, upper=0.5> alpha; // quantile level: q_lower and q_upper are the alpha and 1 - alpha quantiles
  real<lower=0> jqpd_lb; // lower bound for JQPD prior
  real<lower=0> jqpd_ub; // upper bound for JQPD prior
  vector<lower=0>[N] uv_lb; // lower bounds for uniform variate
  vector<lower=0>[N] uv_ub; // upper bounds for uniform variate
  vector[N] q_median; // elicited median of each response
  vector[N] q_lower; // elicited alpha quantile of each response
  vector[N] q_upper; // elicited 1 - alpha quantile of each response
  int prior_only; // should the likelihood be ignored?
}
parameters {
  real<lower=0, upper=1> mu; // mean response
  real<lower=0> phi; // precision parameter
  vector<lower=uv_lb, upper=uv_ub>[N] p; // Uniform Quantiles for JQPD
}
transformed parameters {
  vector<lower=0, upper=1>[N] Y_true; // Latent 'true' response
  real lprior = 0; // prior contributions to the log posterior
  lprior += gamma_lpdf(phi | 0.1, 0.1); // Prior for dispersion of 'true' responses
  for(i in 1:N){
    Y_true[i] = JQPDB_icdf(p[i], alpha, q_lower[i], q_median[i], q_upper[i], jqpd_lb, jqpd_ub); // Inverse transform to get priors for 'true' responses
  }
}
model {
  if (!prior_only) {
    target += beta_lpdf(Y_true | mu * phi, (1 - mu) * phi); // Likelihood
  }
  target += lprior;
}
generated quantities {
  vector<lower=0, upper=1>[N] p_draws; // Continuously sampled p
  vector[N] latent_response_draws; // Forecast draws
  for (i in 1:N) {
      p_draws[i] = uniform_rng(0, 1); // untruncated p for each respondent
      latent_response_draws[i] = JQPDB_icdf(p_draws[i], alpha, q_lower[i], q_median[i], q_upper[i], jqpd_lb, jqpd_ub); // draw from the full JQPD prior, to compare with the posterior latent values
  }
}


library(cmdstanr)
library(tibble)

data <- tibble(
  respondent_id = c(1, 2, 3, 4, 5),
  lower = c(0.05, 0.06, 0.12, 0.12, 0.0875),
  upper = c(0.35, 0.18, 0.79, 0.79, 0.528),
  median = c(0.25, 0.12, 0.55, 0.55, 0.368),
  uniform_variate_lower_bound = c(0.003981868, 0.037112355, 0.016346837, 0.016346837, 0.012227479),
  uniform_variate_upper_bound = c(0.3848918, 0.2077889, 0.8550201, 0.8550201, 0.5838337)
)

data_list <- list(
  N = nrow(data),
  alpha = .05,
  jqpd_lb = 0,
  jqpd_ub = 1,
  uv_lb = data$uniform_variate_lower_bound,
  uv_ub = data$uniform_variate_upper_bound,
  q_median = data$median,
  q_lower = data$lower,
  q_upper = data$upper,
  prior_only = 0
)

file <- file.path("analysis/stan/stancode/model.stan")
model <- cmdstan_model(file)

fit <- model$sample(
  data = data_list,
  seed = 199,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000,
  iter_sampling = 1000,
  adapt_delta = .99,
  max_treedepth = 12
)
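For anyone who wants to check the transform outside Stan, here is a rough Python port of JQPDB_icdf (my own sanity-check helper, with Phi/inv_Phi taken from statistics.NormalDist; it is not part of the model). Since the J-QPD-B is quantile-parameterized, it should return the elicited quantiles exactly at p = alpha, 0.5, and 1 - alpha:

```python
from math import acosh, sinh
from statistics import NormalDist

_std_normal = NormalDist()  # provides Phi (cdf) and inv_Phi (inv_cdf)

def jqpdb_icdf(p, alpha, q_low, q_median, q_high, lb=0.0, ub=1.0):
    """Python port of the Stan JQPDB_icdf: inverse CDF of the bounded J-QPD-B."""
    if not (0.0 < p < 1.0):
        raise ValueError("p must be strictly between 0 and 1")
    c = _std_normal.inv_cdf(1.0 - alpha)
    uml = ub - lb
    L = _std_normal.inv_cdf((q_low - lb) / uml)
    B = _std_normal.inv_cdf((q_median - lb) / uml)
    H = _std_normal.inv_cdf((q_high - lb) / uml)
    HmL = H - L
    delta = acosh(0.5 * HmL / min(B - L, H - B)) / c
    lam = HmL / sinh(2.0 * delta * c)
    LHm2B = L + H - 2.0 * B
    if LHm2B < 0.0:
        n, zeta = -1.0, H
    elif LHm2B > 0.0:
        n, zeta = 1.0, L
    else:  # symmetric case: removable discontinuity
        return lb + uml * _std_normal.cdf(B + 0.5 * HmL / c * _std_normal.inv_cdf(p))
    return lb + uml * _std_normal.cdf(zeta + lam * sinh(delta * (_std_normal.inv_cdf(p) + n * c)))

# Respondent 1's elicited quantiles should be recovered at p = alpha, 0.5, 1 - alpha:
q_low, q_med, q_high, alpha = 0.05, 0.25, 0.35, 0.05
recovered = [jqpdb_icdf(pp, alpha, q_low, q_med, q_high) for pp in (alpha, 0.5, 1.0 - alpha)]
```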

How can we build a regression where the response is 100% missing? Maybe I misunderstand, but are you talking about a regression like y \sim \textrm{normal}(x \cdot \beta, \sigma) where all the y are missing? That will just generate data given x, \beta, \sigma.
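In other words, with every y missing, the “fit” is just the prior predictive. A quick Python sketch of what sampling such a model amounts to (the x, \beta, \sigma values are made up for illustration):

```python
import random

random.seed(0)
x, beta, sigma = 1.5, 2.0, 0.5  # made-up values for illustration

# With y fully missing, sampling the model just simulates y given everything else:
y_draws = [random.gauss(x * beta, sigma) for _ in range(50_000)]
mean_y = sum(y_draws) / len(y_draws)  # should be close to x * beta = 3.0
```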

I’m afraid I’ve never heard of JQPDs (unless that’s what Ben’s calling what he’s doing now), so maybe we can get @bgoodri on the line here.

My general advice is to simplify to a model that works (for example, by using simpler priors), then gradually build up to this model. At least that will let you identify where things go wrong.