Highly dispersed PPD


Hello, I’m trying to fit a multilevel model to some longitudinal data in brms. The data come from skin conductance recordings and are positively skewed, so I’ve adopted a log-normal model. The posterior predictive distribution (PPD) was heavily dispersed, even with weakly informative priors. So I reduced the model to a single-level one, and the results were the same. Switching to a gamma family gave similar results, which surprised me: when I used rstanarm there was some over-dispersion, but not of the magnitude I’m seeing in the brms model. I should add that I’m new to brms, so perhaps there’s some specification I’m missing. Is there an explanation for this over-dispersion in the PPD? If so, could someone point me in the direction of expanding the model to resolve it?

Here’s the model below and its summary.

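library(brms)

# Priors for the intercept and the two slopes
# (no trailing comma after the last element, which would make c() fail)
priors <- c(
  set_prior("student_t(4, -3.4, 0.1)", class = "Intercept"),
  set_prior("normal(0, 1)", class = "b", coef = "Lat"),
  set_prior("normal(0, 1)", class = "b", coef = "Timings")
)

# Single-level lognormal regression of SCR on Timings and Lat
ln_fit <- brm(
  SCR ~ Timings + Lat,
  data = scr_data_long, family = lognormal(), prior = priors,
  chains = 4, cores = 3, warmup = 1000, iter = 2000, thin = 4
)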
 Family: lognormal 
  Links: mu = identity; sigma = identity 
Formula: SCR ~ Timings + Lat 
   Data: scr_data_long (Number of observations: 4263) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 4;
         total post-warmup draws = 1000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -5.70      0.15    -5.99    -5.43 1.00      828      931
Timings       0.03      0.00     0.02     0.03 1.00      846      953
Lat          -0.13      0.08    -0.29     0.01 1.00      984      837

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     3.34      0.04     3.27     3.41 1.00      976      866

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

I also show the MCMC plot below, along with a histogram of one draw from the PPD.

For reference, here’s a histogram of the actual data.



The posterior predictive checks incorporate the variation that comes from the parameter sigma, which for a lognormal model is the standard deviation on the log scale. Your fitted value of sigma is pretty large, indicating that your model thinks there is a large amount of variability around your mean values. I can’t be sure this is the problem without both the data and extra knowledge about what you’re modelling, but it’s possible that a lognormal model is not a good fit to the data-generating process.
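
To make the size of sigma concrete, here is a rough worked calculation. It assumes the lognormal parameterization in which log(SCR) is normal with mean mu and standard deviation sigma, and it plugs in the posterior mean sigma = 3.34 from the summary above. The symbol y for SCR is just notation for this sketch, and the numbers are rounded.

```latex
\begin{aligned}
\log y &\sim \mathcal{N}(\mu, \sigma^2), \qquad \sigma \approx 3.34 \\
\operatorname{median}(y) &= e^{\mu}, \qquad \mathbb{E}[y] = e^{\mu + \sigma^2/2} \\
\frac{\mathbb{E}[y]}{\operatorname{median}(y)} &= e^{\sigma^2/2} \approx e^{5.58} \approx 265 \\
\text{central 95\% interval of } y &\approx \left[ e^{\mu - 1.96\sigma},\; e^{\mu + 1.96\sigma} \right] \approx \left[ e^{\mu}/700,\; 700\, e^{\mu} \right]
\end{aligned}
```

So for any fixed values of the predictors, the fitted model expects observations spanning roughly five to six orders of magnitude, which would look like a very dispersed PPD on the original scale.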

Maybe an exponential distribution or a beta distribution would be better?
