Invariant cases in mixed-effects location scale models

I am trying to fit a basic mixed-effects location scale model using brms. The data are from an experience sampling study (~4000 observations from ~300 participants). The outcome variable is continuous, but a bit wonky (right-skewed and zero-inflated).

I’ve tried fitting a null model, specified in brms as:

  # null mixed-effects location scale model; the |ID1| term correlates
  # the location and scale random intercepts
  brm(bf(outcome ~ 1 + (1 | ID1 | ppno),
         sigma ~ 1 + (1 | ID1 | ppno)),
      data = data,
      control = list(adapt_delta = 0.95),
      iter = 4000, cores = 4, init = 0)

If I exclude participants who did not vary on the outcome variable (17 participants, 133 observations), the model converges without problems (all Rhat < 1.01, no divergent transitions). Including just one of these participants, however, reduces the bulk ESS to ~4 and the tail ESS to < 20 for all parameters. I’m stumped by this sensitivity to individual cases and haven’t been able to pinpoint where exactly the issue arises. I’d appreciate any pointers.
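For concreteness, this is roughly how I flag these cases (a minimal dplyr sketch with the column names from the model above):

library(dplyr)

# flag participants whose outcome never varies within person
invariant_ids <- data %>%
  group_by(ppno) %>%
  summarise(sd_outcome = sd(outcome), .groups = "drop") %>%
  filter(sd_outcome == 0) %>%
  pull(ppno)

data_excl <- data %>% filter(!ppno %in% invariant_ids)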

Below, I’ve included pairs() plots for (A) no data exclusions, (B) exclusion of all but one invariant case, and (C) exclusion of all invariant cases.

Howdy!

What do you mean by “did not vary on the outcome”? It seems like there are ~7 obs per individual, so does this mean that for one of those individuals the outcome was, say, 4.3 for each of the 7 observations? So within-subject variability is zero for them?

In this model, you are estimating a different within-subject variability, sigma, for each individual in a hierarchical model. In brms, for distributional models, sigma is estimated using a log link. I think the brms default is a student t prior for group-level effects, so I am wondering if a more regularizing prior would help. I suspect this varying intercept for sigma, where some individuals have no variation at all, is at least part of the problem. Also, have you tried it without the correlation between the within- and between-subject variation?
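(For the latter, dropping the |ID1| part of the grouping term removes that correlation; a minimal sketch:)

# same model, but with uncorrelated location and scale intercepts
bf(outcome ~ 1 + (1 | ppno),
   sigma ~ 1 + (1 | ppno))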


Hi! Thanks for the pointers.

What do you mean by “did not vary on the outcome”? It seems like there are ~7 obs per individual, so does this mean that for one of those individuals the outcome was, say, 4.3 for each of the 7 observations? So within-subject variability is zero for them?

Exactly. Pretty much all of them consistently indicated zero on the raw scale (n.b. that the plots I posted are based on a standardised outcome variable).

I think the brms default is a student t prior for group-level effects, so I am wondering if a more regularizing prior would help.

The default is a (half) student t prior for the sd class:

                   prior     class      coef group resp  dpar nlpar lb ub       source
 student_t(3, -0.4, 2.5) Intercept                                             default
    student_t(3, 0, 2.5) Intercept                      sigma                  default
    lkj_corr_cholesky(1)         L                                             default
    lkj_corr_cholesky(1)         L            ppno                        (vectorized)
    student_t(3, 0, 2.5)        sd                                   0         default
    student_t(3, 0, 2.5)        sd                      sigma        0         default
    student_t(3, 0, 2.5)        sd            ppno                   0    (vectorized)
    student_t(3, 0, 2.5)        sd Intercept  ppno                   0    (vectorized)
    student_t(3, 0, 2.5)        sd            ppno      sigma        0    (vectorized)
    student_t(3, 0, 2.5)        sd Intercept  ppno      sigma        0    (vectorized)

If I understood you correctly, you would suggest a more regularising prior on the last parameter, i.e., sd_ppno__sigma_Intercept? I will play around with this and see whether it helps.

Also, have you tried it without the correlation between the within- and between-subject variation?

I’ve tried this but didn’t observe any difference. The correlation does cause some trouble in a more complex model with predictors, but it doesn’t look to be the cause of the convergence issues in the null model.

Yes, I would try that and see if it helps. Maybe try something pretty regularizing, like a half-normal(0, 0.5). It sounds like you standardized the outcome, and the model for sigma uses the log link. As always, it is a good idea to use prior predictive checks, but just to troubleshoot, if it were me, I would start with a much more regularizing prior.
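Something along these lines (a sketch; because class “sd” has a lower bound of zero, the normal prior acts as a half-normal):

# regularizing half-normal(0, 0.5) on the group-level SDs,
# including the one for log(sigma)
reg_priors <- c(set_prior("normal(0, 0.5)", class = "sd"),
                set_prior("normal(0, 0.5)", class = "sd", dpar = "sigma"))

fit <- brm(bf(outcome ~ 1 + (1 | ID1 | ppno),
              sigma ~ 1 + (1 | ID1 | ppno)),
           data = data, prior = reg_priors,
           control = list(adapt_delta = 0.95),
           iter = 4000, cores = 4, init = 0)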


You mentioned zero-inflation and no variability. Do you really have a continuous variable, or is it ordinal (i.e., a Likert-type variable)? If the latter, you might switch to a multilevel ordinal model.

It’s ratings on a 0-100 scale, with spikes at 0 and, to a lesser degree, at 100. Otherwise it looks fairly continuous. I’ve considered binning the data and using an ordinal model, but of course that would only exacerbate the problem of participants with no within-person variability.


Yeah, plus 0-100 makes for a whole lot of thresholds. I see why you’re going Gaussian.

As part of the solution, I believe @andrewheiss has the code for a zero-inflated Gaussian likelihood floating around somewhere.


You could also convert them to proportions and try a zero_one_inflated_beta. I have used this before on slider scale data that was ratings of 0-100.


Thanks for all the suggestions! I’ve now played around with more regularising priors (specifically for sd_ppno__sigma_Intercept and for all class = “sd” parameters). This has no apparent effect on model performance (i.e., bulk/tail ESS remain extremely low). I also tried priors based on the posterior parameter distributions from the model fitted to the data with the invariant cases excluded; the results are similar.

I really appreciate the recommendations for alternative distributions such as the zero-inflated Gaussian and zero-one-inflated beta. However, if I didn’t miss anything, they wouldn’t allow me to model (and predict) the within-person variability, would they?

In brms, for the zero-one-inflated beta, you can model the precision parameter phi. You can also model the zero-one inflation probability, zoi, and the conditional one-inflation probability, coi.

I imagine any implementation of the zero-inflated Gaussian in brms would still allow you to model sigma. In any case, you could do this in Stan. These are just mixture models with separate data generating processes for observations with response equal to zero vs those not equal to zero.

You can see this by calling make_stancode() on an example model in brms. Looking at the Stan function brms defines for zero_inflated_beta, for example:

real zero_inflated_beta_lpdf(real y, real mu, real phi, real zi) {
  row_vector[2] shape = [mu * phi, (1 - mu) * phi];
  if (y == 0) {
    return bernoulli_lpmf(1 | zi);
  } else {
    return bernoulli_lpmf(0 | zi) +
           beta_lpdf(y | shape[1], shape[2]);
  }
}
You can see that it is just an if/else statement: if the response is zero, the model is Bernoulli; otherwise it is beta.
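(A zero-inflated Gaussian would follow the same pattern; a hypothetical sketch of the density in R, not a built-in brms family:)

# with probability zi the response is exactly zero; otherwise it is
# drawn from a normal with mean mu and standard deviation sigma
dzinorm <- function(y, mu, sigma, zi) {
  ifelse(y == 0, zi, (1 - zi) * dnorm(y, mu, sigma))
}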

Note this from Michael Betancourt:
"In other words because the inflated and non-inflated points are essentially modeled by separate data generating processes the entire model decomposes into a binomial model for the total inflated observations and a baseline model for the non-zero observations.

Because these two models are independent we can fit them independently or jointly, whichever is more convenient. In particular if the inflated counts are just a nuisance then we can ignore them entirely and just fit the non-inflated observations directly without any consideration of the inflation!"

Unfortunately, I think that trying to model sigma for individuals who always scored zero is exactly what makes the model hard to fit (I’m guessing that is your original problem), so you can’t just ignore the zero inflation if you want to model sigma in this case.


This is an interesting problem; off the top of my head, I’d recommend trying out the censored normal model (the “Tobit” model). You can use the resp_cens() addition term in brms to do that.


Ha, I wish. The log likelihood is the only part I’m missing from my custom hurdle Gaussian model: A guide to modeling outcomes that have lots of zeros with Bayesian hurdle lognormal and hurdle Gaussian regression models | Andrew Heiss

Mine works with posterior_predict and posterior_linpred/epred and all that, but not loo(), since there’s no log likelihood :(
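If I ever write it, I imagine it would look roughly like this using the brms custom-family interface (a hypothetical sketch; parameter names assumed):

# hypothetical log_lik method for a custom hurdle Gaussian family,
# where hu is the hurdle (zero) probability
log_lik_hurdle_gaussian <- function(i, prep) {
  mu    <- brms::get_dpar(prep, "mu", i = i)
  sigma <- brms::get_dpar(prep, "sigma", i = i)
  hu    <- brms::get_dpar(prep, "hu", i = i)
  y     <- prep$data$Y[i]
  # zero: hurdle probability; non-zero: (1 - hu) times the normal density
  if (y == 0) log(hu) else log1p(-hu) + dnorm(y, mu, sigma, log = TRUE)
}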


Thank you for all the suggestions. I have now implemented three models which converge without warnings: a (left-) censored normal (Tobit) model as suggested by @matti, a zero-one-inflated beta (zoib) model as suggested by @jd_c, and an ordered beta model using @saudiwin’s ordbetareg package as a frontend for brms.

Below are the posterior predictive checks for the three models:

The models differ quite a bit in their assumptions about the underlying data generating process and in the interpretation of the parameters. Compared to the zoib model, both the ordered beta and the censored normal model have the advantage of assuming that extreme responses reflect the same latent variable as the (beta/normally) distributed responses. I think that’s plausible for slider data like this.

The censored normal and ordered beta models also work when including predictors, including predictors of sigma/phi (see the sketch after the code below); I have not tested this for the zoib model.

Code; just in case somebody comes across this in the future…

library(brms)
library(dplyr)

# Zero-one-inflated beta (outcome rescaled from 0-100 to proportions)
fitzoib <- data %>%
  mutate(outcome = outcome / 100) %>%
  brm(bf(outcome ~ 1 + (1|ppno), 
         phi ~ 1 + (1|ppno),
         zoi ~ 1 + (1|ppno),
         coi ~ 1 + (1|ppno)),
      data = ., 
      family = zero_one_inflated_beta(),
      control = list(adapt_delta = 0.95),
      iter = 4000, chains = 4)

# Ordered beta
library(ordbetareg)

fitordbeta <- ordbetareg(
  formula = bf(outcome ~ (1|ID1|ppno), 
               phi ~ (1|ID1|ppno)),
  phi_reg = TRUE,
  data = data,
  chains = 4, iter = 4000, refresh = 0)

# Left-censored (Tobit) normal: zeros treated as left-censored
fitcens <- data %>%
  select(ppno, outcome) %>%
  mutate(censoring = case_when(outcome == 0 ~ "left",
                               TRUE ~ "none")) %>%
  brm(bf(outcome | cens(censoring) ~ 1 + (1|ID1|ppno), 
         sigma ~ 1 + (1|ID1|ppno)),
      data = ., 
      control = list(adapt_delta = 0.95),
      iter = 4000, chains = 4)
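And the predictor extension mentioned above, for the censored model (“predictor” is just a placeholder):

# hypothetical predictor on both the location and the scale part
bf(outcome | cens(censoring) ~ 1 + predictor + (1 | ID1 | ppno),
   sigma ~ 1 + predictor + (1 | ID1 | ppno))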

Cool. It would be better to show the posterior predictive checks using type = 'bars'. With zero inflation you have a (partly) discrete outcome, so the density plots make it difficult to tell what is going on. From these plots the zoib looks better, but it’s hard to tell with density plots alone.
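Something like:

pp_check(fitzoib, type = "bars", ndraws = 100)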

Yes, actually, I just released a new version of the ordbetareg package that has a new function, pp_check_ordbeta, which produces both continuous and discrete plots. These generally show a better fit (though I suspect I can improve the continuous plot even more). Suffice it to say it’s not easy to visualize a continuous-discrete distribution accurately. The same is true for the ZOIB, although you’re on your own there.
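Roughly (it returns the discrete and continuous plots separately):

plots <- pp_check_ordbeta(fitordbeta)
plots$discrete    # bar plot for the end-point responses
plots$continuous  # density for the responses in between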

Generally ordbetareg and ZOIB will have a similar posterior predictive density. Often the main differences arise in the number of parameters (ZOIB has a lot more) and interpretation/estimation of coefficients, which can vary between the models.

Cool!

The outcome data are some kind of slider scale responses, right? I would recommend specifying censoring at both the left and right ends.
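E.g., something like this in the mutate() step:

# mark both scale end points as censored
mutate(censoring = case_when(outcome == 0   ~ "left",
                             outcome == 100 ~ "right",
                             TRUE           ~ "none"))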

Note also that the pp_check() results for a censored model are for the latent variable, not the observed one. So the plot here is quite misleading: it suggests the model predicts impossible values, which it technically does not. I am not sure how best to do this plot for a censored model; maybe you could, e.g., censor the predicted values manually. Given that, plus adding censoring on the right, you might end up liking this model after all ;)
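(A rough sketch of that manual approach, using the fitcens model above:)

# draw latent predictions, then apply the censoring by hand
yrep <- posterior_predict(fitcens, ndraws = 100)
yrep <- pmin(pmax(yrep, 0), 100)  # clamp to the observed 0-100 range
bayesplot::ppc_dens_overlay(data$outcome, yrep)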


Good point about pp_check(); I suspected something like that. These plots were just a rough-and-ready way to illustrate how the models behave, since I’m using this as a starting point for a more complex model, but I should figure out how to do this properly for the final model.

I did try censoring at both the left and right ends, but right-censoring causes an error. I’ve opened an issue on the brms GitHub: Right-censored data · Issue #1423 · paul-buerkner/brms · GitHub. Paul suspects it might be an issue with Stan’s normal_lccdf, though. If anybody has any insights on that, I’d love to hear them, too…