Difficulties setting good priors for an ex-Gaussian hierarchical model using brms

Dear Stan community,

I have been trying to set priors and do a prior predictive check for an ex-gaussian regression model using brms v2.7.

The ex-Gaussian regression model that I use is a hierarchical model in which RTs are predicted by two within-subject conditions, with the sigma and beta parameters of the ex-Gaussian distribution estimated as constants (not varying by condition).

The model in brms:

>     M1 <- brm(RT ~ VisualSync * SyncTarget + (VisualSync*SyncTarget|participantID),
>               data = StimComp[StimComp$Detected == 1,], family = exgaussian(), prior = prior,
>               chains = 4, iter = 4000, cores = 4, save_all_pars = T, sample_prior = "only")

I have set the following priors:

> prior <- c(prior("normal(0,.1)", class = "b", coef = ""),
>            prior("lognormal(log(0.6), log(1.7))", class = "Intercept", coef = ""),
>            prior("lognormal(log(0.06), log(1.2))", class = "sigma", coef = ""),
>            prior("gamma(2.5,0.5)", class = "beta", coef = ""),
>            prior("lkj(1)", class = "cor"),
>            prior("student_t(3, 0, 0.02)", class = "sd"))

When I then run predictive checks on this prior-only model, the predictive distribution is symmetric around 0, while the distribution of the fitted values looks like an ex-Gaussian: RTs > 0 with a clear skew.
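Roughly, the two distributions I am comparing can be obtained like this (a sketch only; M1 is the prior-only model above, and newer brms versions use ndraws / posterior_epred() instead of nsamples / fitted()):

    # Sketch of the two checks being compared (M1 was fitted with sample_prior = "only").
    # Predictive draws of new RTs, implied by the priors alone:
    pp_check(M1, type = "dens_overlay", nsamples = 100)

    # Fitted (expected) values, i.e. the ex-Gaussian mean, without observation-level noise:
    f <- fitted(M1, summary = TRUE)
    hist(f[, "Estimate"], breaks = 50, main = "Distribution of fitted values")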

So, I assume that the residual error is too large and therefore produces a predictive distribution that allows for negative RTs.

I tried allowing both less and more variance in the prior for the random effects (i.e., class "sd"), but this did not change the predictive distribution much; it remained symmetric around zero.

So I don’t know how to troubleshoot this, as I don’t understand where the large standard errors may be coming from.

I saw that there was quite a lot of uncertainty in the estimate of the beta parameter of the ex-Gaussian. Does the prior for beta allow too many values?

Any tips would be much appreciated!

Kind regards,
Hanne

Sorry, it looks like your question fell through the cracks. Did you manage to resolve this?

If I understand correctly, your biggest worry is that the model allows for negative RTs? If so, then maybe the ex-Gaussian is not the best response family, and you may want to use a distribution that allows only positive values (gamma or lognormal)?
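In brms that would only require changing the family argument; a minimal sketch based on the model above (the priors are omitted on purpose, since they would have to be re-specified for the log scale):

    # Minimal sketch: same formula and data, but a strictly positive response family.
    M1_pos <- brm(
      RT ~ VisualSync * SyncTarget + (VisualSync * SyncTarget | participantID),
      data = StimComp[StimComp$Detected == 1, ],
      family = lognormal(),   # or Gamma(link = "log"), or shifted_lognormal()
      chains = 4, iter = 4000, cores = 4
    )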

Best of luck with your model!

No, I haven’t found a solution yet.

I cannot easily switch model families, because I need to estimate the exponential rate of the ex-Gaussian distribution: there is a long tradition in this research field of looking at that parameter, so I would break too much with previous work if I switched families.

Is there any way to set the priors so that negative RTs are excluded from the predictions?

I am assuming that the negative RTs come from the large standard errors, because the fitted values do look good (no negative RTs and an ex-Gaussian-shaped distribution), but the predicted values (which include the residual error) don’t: they look like a symmetric Gaussian that includes negative RTs.

Any advice is much appreciated!

This sounds a bit weird and my guess is you might have some small bug somewhere in your code. I don’t think I can help without much more information. Could you share the dataset and complete code you use to fit the model and do predictions? If you don’t want to/can’t share the data (which is totally understandable), could you still share at least the complete code, some plot of the predictions and summary of the fitted model?

Thanks for your help! I have now included the data and the script.

SART.csv (2.8 MB) Exp 4 Sustained Attention Revised Analyses.R (9.3 KB)

The fitted values that I obtain from the prior-only model (sampling from the prior distributions only) look like this:

Although there are some negative RTs, the distribution overall looks good (skewed and reasonable RT values).

However, when I look at the posterior predictions, I get the following:

The skew is gone and I get values that don’t make sense. (I zoomed in on the distributions here).

So I was wondering how to change my priors to improve this.

Thanks!

I noticed that the estimated beta did not correspond to the mean of the gamma prior distribution that I had set for beta, which I thought was odd. So I decided to try a different prior distribution for beta and changed the gamma distribution to a lognormal distribution.

So the priors I tried now are as follows:

prior2 <- c(prior("normal(0,.1)", class = "b", coef = ""),
            prior("lognormal(log(0.3), log(1.7))", class = "Intercept", coef = ""),
            prior("lognormal(log(0.06), log(1.2))", class = "Intercept", coef = "", dpar = "sigma"),
            prior("normal(0, 0.01)", class = "b", dpar = "sigma"),
            prior("lognormal(log(0.4), log(1.5))", class = "Intercept", coef = "", dpar = "beta"),
            prior("normal(0, 0.25)", class = "b", dpar = "beta"),
            prior("lkj(1)", class = "cor"),
            prior("lognormal(log(0.02), log(1.2))", class = "sd"))
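One way to check which parameter classes the model actually expects (and whether each prior above is matched to one) is get_prior(); the distributional formula below is an assumption based on the dpar = "sigma" and dpar = "beta" priors:

    # Sketch: list the parameters brms will set priors on, given a distributional
    # formula for sigma and beta (the exact formula here is assumed, not known).
    get_prior(
      bf(RT ~ VisualSync * SyncTarget + (VisualSync * SyncTarget | participantID),
         sigma ~ VisualSync * SyncTarget,
         beta  ~ VisualSync * SyncTarget),
      data   = StimComp[StimComp$Detected == 1, ],
      family = exgaussian()
    )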

The prior predictive distributions have already improved compared to the previous version, but I’m not entirely happy yet:

I also don’t really understand why the gamma prior had this effect. Any more suggestions or tips?

Kind regards,
Hanne

I was an RT researcher in a past life (indeed, it looks like we overlapped considerably in our domains; I worked with Gail Eskes and Ray Klein on attention) and I’d say this is a little strong. While there were some tentative computational theories of RTs tied to the ex-Gaussian, they were pretty weak, and certainly more nuanced models such as the LBA and Diffusion are much more standard as computational models. More commonly the ex-Gaussian was used as a descriptive model in scenarios where folks recognized that RT data contained useful information in the shift, scale, and shape of the raw RTs (cf. the traditional practice of collapsing to a mean), but didn’t want to go to the bother of fitting a full process model like LBA/Diffusion (also, tools to fit the latter were fairly limited until recently, requiring hundreds of trials per condition to achieve stable/accurate results). But even as a descriptive model, there are notable alternatives in the literature, including the log-normal, inverse-normal, and Weibull. You should check out the Weibull; in addition to having more descriptive flexibility than the others I mention (being a 3-parameter distribution, like the ex-Gaussian), there’s already work showing its usefulness in a hierarchical model (which I also strongly recommend you deploy).
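In case it helps, a minimal brms sketch of that direction (note that brms’s built-in weibull() is the two-parameter Weibull; a shifted, three-parameter version would need a custom family):

    # Sketch only: two-parameter Weibull as a descriptive alternative; a shifted
    # (three-parameter) Weibull would require a brms custom family.
    M_weibull <- brm(
      RT ~ VisualSync * SyncTarget + (VisualSync * SyncTarget | participantID),
      data = StimComp[StimComp$Detected == 1, ],
      family = weibull(),   # log link for mu by default
      chains = 4, cores = 4
    )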

BTW here’s a talk I gave once on a dream of full hierarchical process models with modeled contamination.

I am aware that there are better models out there for analyzing RTs. However, in this specific study I built on the literature on a specific task (the sustained attention to response task) that is often used in ADHD research. That literature has found good discrimination between healthy controls and ADHD patients using the ex-Gaussian family and the exponential rate parameter, which is why I wanted to work with the exgaussian family. I know the literature about the limited interpretability of the ex-Gaussian parameters, but my predictions were based on the ADHD literature and called for an ex-Gaussian family. So, I would still like some help finding a way to constrain the priors so that negative RTs are excluded from the prior predictive distribution.

Gotcha, and sorry to hijack your query on troubleshooting your model with the less-than-optimally-helpful suggestion that you try a completely different distribution. I’m still not super familiar with brms, being more of a raw-Stan fan, but I’ll take a look tonight to see if I can pin down what’s going awry.

Ah, I know that one! :) Leth-Steensen 2000, right? I talked about it briefly in a pro-ex-Gaussian talk I did once whose title I’m still proud of (though I was unaware Gould beat me to that one).

Thanks! It was a paper by Tarantino et al. (2013), but good to see that others found similar things :).

Ok, I took a look, and this may be my brms naivety showing, but I think something might be wrong with brms::posterior_predict. @paul.buerkner Could you take a look at this? @HanneHuygelier provided data and well-commented code above, but to simplify things I took out the hierarchy in Exp 4 Sustained Attention Revised Analyses (simpler).R (9.1 KB).

I tried to dig into this a bit deeper so that @paul.buerkner doesn’t have to and I think I understand what is going on:

  • The predictor in brms predicts the mean of the ex-Gaussian, not the mean of the Gaussian component. Taking the Wikipedia notation for the distribution parameters (which is the same as in Stan), the main predictor of the model (let’s call it \bar\mu) is defined as \bar\mu = \mu + \beta, where \lambda = 1/\beta.
  • The model is quite certain about the main predictor \bar\mu, which is what you see as the result of posterior_epred.
  • The model is also quite certain about \sigma, but it is uncertain about \beta, which leads to a large uncertainty about \mu. If \beta is large but \bar\mu small (as you see in the plots from posterior_epred), then \mu = \bar\mu - \beta is hugely negative.
  • With a hugely negative \mu, a small \sigma, and a large \beta, the resulting draw from the exponentially modified Gaussian is basically something_very_negative + rexp(1, rate = 1/large_beta), i.e. an exponential draw with a large mean added to a very negative Gaussian draw, which I think is what generates those extreme tails in your predicted plot (see the small simulation sketch below).
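To make that concrete, here is a tiny R simulation of that mechanism; all numbers are invented for illustration and are not taken from the fitted model:

    # Illustration only: the numbers are made up, not estimates from the model.
    set.seed(1)
    n      <- 1e4
    mu_bar <- 0.6                      # overall mean the model is fairly sure about
    sigma  <- 0.06                     # small Gaussian SD
    beta   <- rlnorm(n, log(2), 0.8)   # very uncertain exponential scale (prior-ish draws)
    mu     <- mu_bar - beta            # Gaussian-component mean implied by mu_bar = mu + beta

    # Ex-Gaussian draws: Gaussian part plus exponential part with mean beta
    rt_pred <- rnorm(n, mu, sigma) + rexp(n, rate = 1 / beta)

    # The mean stays near mu_bar, but the spread (and the negative tail) explodes
    summary(rt_pred)
    hist(rt_pred, breaks = 100, main = "Predictive draws with an uncertain beta")

Even though every draw has expectation mu_bar, the implied \mu sits far below zero whenever \beta is large, so many draws end up negative or extreme.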

To me this looks like the model is behaving as designed, but the design seems weird. Not sure what the standards of the field are, but I guess that @HanneHuygelier expected the main predictor to model \mu not \bar\mu (this can still be achieved via a custom brms family, if that’s what you want).
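For reference, here is a rough, untested sketch of what such a custom family could look like, reusing Stan’s exp_mod_normal density but with mu interpreted as the mean of the Gaussian component (the name exgaussian_mu and everything else here is made up for illustration):

    # Rough, untested sketch: an ex-Gaussian family in which `mu` is the mean of
    # the Gaussian component rather than the mean of the whole distribution.
    library(brms)

    exgaussian_mu <- custom_family(
      "exgaussian_mu",
      dpars = c("mu", "sigma", "beta"),
      links = c("identity", "log", "log"),
      lb    = c(NA, 0, 0),
      type  = "real"
    )

    stan_density <- "
      real exgaussian_mu_lpdf(real y, real mu, real sigma, real beta) {
        // Stan parameterizes the exponential part by its rate, lambda = 1 / beta
        return exp_mod_normal_lpdf(y | mu, sigma, inv(beta));
      }
    "
    stanvars <- stanvar(scode = stan_density, block = "functions")

    # fit <- brm(RT ~ VisualSync * SyncTarget + (VisualSync * SyncTarget | participantID),
    #            data = StimComp[StimComp$Detected == 1, ],
    #            family = exgaussian_mu, stanvars = stanvars)
    #
    # For pp_check() / predict() one would additionally have to define
    # posterior_predict_exgaussian_mu() (and posterior_epred_exgaussian_mu()).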

EDIT: This is exactly what was discussed at Modelling mean of gaussian component in brms version 2.3.0 I obviously could have saved myself some trouble by reading that first :-D.

Ah yes, thanks! That sounds like a logical explanation. I was indeed confused about the meaning of mu, because it makes more sense to me to formulate the effect on the mu of the Gaussian component rather than the effect on the mean of the total ex-Gaussian distribution. I hope that I can figure out how to work with custom families. Otherwise, given these insights, would it make sense to make the prior for \beta more specific (less uncertain), or does that not make any sense?

I have no real experience with analyzing reaction times, but I would guess that the two components (exponential and Gaussian) are there to represent distinct parts of the reaction process and the associated variability, so it makes sense to me to model them separately, without the implicit correlation introduced by using \bar\mu. I think a prior that would constrain \beta enough in this case would likely need to be quite informative (i.e., narrow around some specific value) and thus likely indefensible.

Sounds reasonable indeed! Thanks for the help!

Thanks @martinmodrak for digging into this issue.

Just one minor thing: There is some literature arguing that these two model components don’t represent two sensible distinct cognitive components, hence my choice to model the overall mean as the main parameter for consistency.

I certainly don’t want to override topic experts, but the fact that this distribution is suggested for reaction times, yet you can get extremely negative predictions (although here that is because the model is underdetermined by the data), is IMHO a bit worrying… Maybe there is some space for an even better parametrization, e.g. mean, sd (of the whole sum) and \sigma, or even mean, sd and the proportion of variance attributable to the normal component…
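For what it’s worth, the moments of the ex-Gaussian make such a reparameterization easy to write down (this is just standard algebra, not something brms currently offers): E[Y] = \mu + \beta = \bar\mu and Var[Y] = \sigma^2 + \beta^2 = s^2, so a (mean, sd, mixing proportion) version with p the share of variance coming from the exponential part would set

\beta = s\sqrt{p}, \quad \sigma = s\sqrt{1 - p}, \quad \mu = \bar\mu - s\sqrt{p}.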

But that’s a bit digressing from the original topic :-)

I agree. The advantage of this distribution is certainly its easy(ish) interpretation, as we can get away with an identity link, at the cost of having no theoretical lower bound at zero. However, most response time data will not come close to zero, so empirically this distribution works well. I am happy to discuss different parameterizations, but I am not sure this will save us from negative predictions under certain data regimes.
