Very different results when using exgaussian vs shifted_lognormal families for RT data in brms

Hi,

I am using brms to estimate the difference in reaction times between two experimental conditions.
I observed that the results are vastly different when I use family = exgaussian(link = "identity") or family = shifted_lognormal(link = "identity"). (Both are suggested for RT data in the brms manual.)
For example, using the exgaussian yields a Bayes factor of >10 in favor of no difference, while the shifted_lognormal yields a Bayes factor of 0.8, leaning toward a difference.
Using the exgaussian results in more divergent-transition warnings, though. However, the posterior predictive checks look better for the exgaussian (the shifted_lognormal predictions seem to be consistently shifted).

Please find example data and a minimal working example in the attached files.

rt_dat.csv (19.6 KB)
min.html (947.3 KB)

Any advice?

Best, Marius


Looking at the conditional effects plots, they seem to have substantial overlap for both models:


Simple contrasts (just using posterior_predict with the newdata argument set to the df where all MotionCongruency rows are either "congruent" or "incongruent") show massive overlap with 0 for both models as well:
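Roughly like this (a sketch; `fit` stands for either of your fitted models, column and level names assumed from your data):

# Posterior predictive contrast: predict everything as congruent vs. everything
# as incongruent, then look at the per-draw average difference.
nd_con   <- transform(rt_dat, MotionCongruency = "congruent")
nd_incon <- transform(rt_dat, MotionCongruency = "incongruent")
pp_diff  <- posterior_predict(fit, newdata = nd_con) -
            posterior_predict(fit, newdata = nd_incon)
hist(rowMeans(pp_diff))  # distribution of the average predicted difference; check overlap with 0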


In both your hypothesis calls your CIs overlap with 0, and your estimation error is the same order of magnitude as your estimated effect (I hope I am reading the hypothesis output right). I have never used Bayes factors, so I can't really comment on those.

My take-away for this would be that there is no relevant difference between the two. Mind you, I have no domain knowledge and might miss something obvious that has to be taken into account, so take that conclusion with a grain of salt.

PS: Thanks for providing data and code when asking the question.


I'm not familiar with the exgaussian distribution, but could it be that in one model you are estimating the difference in arithmetic means while in the other (i.e. the lognormal) you are working with the difference in geometric means? These two types of differences in means between groups need not be the same, e.g. when the (level of unexplained) variance of the outcome within each group is wildly different.

Thanks a lot for your answer.
This is data from our control condition, for which we expect an effect (congruent > incongruent); however, for this particular data set the effect may actually be masked by a ceiling effect.
Concerning the Bayes factors, I guess it boils down to the shape of the posteriors produced by the models, since I compute them by dividing the posterior density by the prior density at the value 0 (Savage-Dickey). And they do indeed look different; for the exgaussian model they are very peaky:

mu_Grating_hypothesis-test_prior-posterior.pdf (10.6 KB)

while less so for the shifted_lognormal:

mu_Grating_hypothesis-test_prior-posterior.pdf (11.4 KB)

To me it seems, considering the rest of the data, that using the exgaussian favors no effect while the shifted_lognormal favors an effect.
This data comes from a preregistered project, for which we want to apply the rule of sampling data until the respective Bayes factors are at least 10 (as suggested by several journals), either in favor of the null or the alternative. So the practical implication is huge… :)
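For completeness, this is essentially how I compute them, via brms::hypothesis, which implements the Savage-Dickey density ratio for point hypotheses (the model, here called brms_mm, has to be fit with sample_prior = TRUE):

# Savage-Dickey density ratio for the point null via brms::hypothesis
h <- hypothesis(brms_mm, "MotionCongruencyincongruent = 0")
print(h)  # Evid.Ratio is the BF in favor of the null (posterior / prior density at 0)
plot(h)   # overlays the marginal prior and posterior behind that ratio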

@LucC This sounds interesting. Would this explain why the posterior predictive checks are slightly shifted towards smaller values for the shifted_lognormal model?

Here is another example suggesting that using shifted_lognormal overestimates effects compared to using exgaussian. The former gives an estimate of -0.21 while the latter gives -0.08. The actual differences grouped by run are [0.177, 0.049, 0.0985, 0.1085, 0.069] (see min.html). Mind that run is not included as a group-level factor here because nruns < 6.
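(The per-run differences are just a grouped summary, something like this sketch with assumed column and level names:)

# Per-run condition means and their differences
m <- with(rt_dat, tapply(RT, list(Run, MotionCongruency), mean))
m[, "incongruent"] - m[, "congruent"]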

rt_dat.csv (8.6 KB)
min.html (1016.0 KB)

So, my conclusion for now is to use the exgaussian version (?).

Note that the way the exgaussian is written can overflow for large values of lambda. There's a solution in this issue: Fix Exponentially modified Gaussian overflows at lambda >= 40 · Issue #2803 · stan-dev/math · GitHub

@spinkney Thanks, I will check it out!

After checking the Wikipedia page on the exgaussian distribution, I'm now confident that my point about geometric vs. arithmetic means stands. You cannot compare your two models based on just the linear predictors, as they calculate two quite different summary metrics. You could, however, still translate the findings from the log-normal model to a difference in arithmetic means, using the identity \mu_{arithmetic} = \exp(\log(\mu_{geometric}) + \sigma_{lognorm}^2 / 2). Here, \log(\mu_{geometric}) is essentially the linear predictor of your log-normal model (before shifting). Also, note that an estimate of the difference in means for the lognormal distribution is a difference on the log scale (i.e. the same as the log of the relative difference in geometric means on the natural scale). For the exgaussian model I'm not sure, but I'm guessing the estimated difference is an absolute difference on the natural scale. So we may just be looking at apples and oranges here.
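To make that concrete, a rough sketch of the translation using the posterior draws of your log-normal model (parameter names are my assumption, for a model with constant sigma; the shift ndt cancels out of the difference):

library(brms)

# Translate the shifted-lognormal fit into a difference in arithmetic means,
# using E[Y] = exp(mu + sigma^2 / 2) + ndt; ndt cancels in the difference.
post <- as_draws_df(fit_lognormal)   # assumes a fitted brms model `fit_lognormal`
mu_con   <- post$b_Intercept         # linear predictor, congruent condition
mu_incon <- post$b_Intercept + post$b_MotionCongruencyincongruent
diff_arith <- exp(mu_incon + post$sigma^2 / 2) - exp(mu_con + post$sigma^2 / 2)
quantile(diff_arith, c(0.025, 0.5, 0.975))  # difference in arithmetic means (seconds)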

I cannot comment on your posterior checks, as I'm not sure what you are referring to.


You can read up the exact parameterizations here: Parameterization of Response Distributions in brms

@LucC I don't understand why this difference leads to such a profound difference in inference. I am not interested in a direct comparison between the two models, but I wonder why, taken separately, they deliver conflicting inference. Or do I miss your point?
Regarding what I said about the posterior checks, I think this was nonsense.

@scholz thank you for the link!

Could you explain to me how the inference is different? My interpretation would be that both models find no difference in your first example and both find a difference in the second example.
If it is about the difference in the size of the effect, this will most likely be due to the parameterizations of the likelihoods, as the mu parameters do not mean the same thing.

@scholz In the second example, yes, both models find a difference. But in the first example my interpretation is different. There, the exgaussian model suggests no difference (and, applying the rule of sampling up to a Bayes factor of 10, I would now stop), while the shifted lognormal model has a tendency towards an effect (Bayes factor of ~0.8), essentially indicating that I have to sample more data.

Something to note is that Bayes factors are sensitive to prior choice, IIRC. So you would probably have to make sure that your priors are equivalent between the models.
Looking at the sample_prior = "only" pp_check() plots below, your priors are not only not equivalent, but they also apparently put weight on values that are many orders of magnitude away from anything reasonable compared to your data.
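(That check is just your model refit without conditioning on the data; a minimal sketch, assuming `brms_fit` is one of your fitted models:)

# Prior predictive check: refit with sample_prior = "only" and compare to the data
prior_fit <- update(brms_fit, sample_prior = "only")
pp_check(prior_fit, ndraws = 100)  # prior draws vs. observed RTs on the outcome scale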

Using a tighter prior (essentially using the estimates of the model as the prior again; don't do this, kids):

priors_tight = c(
    set_prior('normal(-.85, 0.2)', class='Intercept'),
    set_prior('normal(-1, 0.2)', class='Intercept', dpar = "sigma"),
    set_prior('normal(-0.1, .01)', class='b'),
    set_prior('normal(-0.01, .1)', class='b', dpar = "sigma"),
    set_prior('normal(0.3, .2)', class='sd', lb = 0),
    set_prior('normal(0.3, .2)', class='sd', dpar = "sigma", lb = 0)
)

[prior predictive check plot for the tightened priors]

we can get the following Bayes factors for the shifted lognormal model:

                         Hypothesis   Estimate  Est.Error   CI.Lower  CI.Upper   Evid.Ratio    Post.Prob Star
1 (MotionCongruencyincongruent) = 0 -0.1017517 0.01011646 -0.1214539 -0.082501 5.906774e-36 5.906774e-36    *

I tried to do something similar for the exgaussian, but it was still way too wide compared to the data and had a few hundred divergent transitions. The Evid.Ratio reached ~1e+40, though.

My interpretation would be that you might need to take a look at your priors before continuing to use Bayes factors.


@scholz thanks a lot. I have some more questions:

In order to set a prior for the Intercept, you have to modify the formula to RT ~ 0 + Intercept + ..., right?

What is the relation between these prior predictive checks and the priors as plotted by hypothesis (see my post above)? The latter look reasonable for both models.

Are those heavy tails really problematic as long as/if the dominant mass is in a reasonable area?

I had chosen the prior based on the suggestions here: Prior Choice Recommendations · stan-dev/stan Wiki · GitHub
I expect an effect size of something like \pm 0.1 centered at 0. Before, I was actually using truncated normal distributions as priors to create flat distributions within the behaviorally plausible boundaries, until I read that this should not be done if the bounds do not represent true constraints.
Is my understanding correct that the priors I used so far are not suitable because RTs are not normally distributed, or is it more complex (I am a pretty naive user :))?

Looking at the parameterization of the shifted_lognormal model, I found the ndt parameter. Would it be reasonable to estimate the effect of my experimental conditions on this parameter? Something like:

mf_ <- bf(
    RT ~ 0 + Intercept, 
    ndt ~ 0 + MotionCongruency,
    family=shifted_lognormal()
) 

Or should RT also depend on MotionCongruency?

I am happy about any advice on which priors make sense here.

In order to set a prior for the Intercept, you have to modify the formula to RT ~ 0 + Intercept + ..., right?

I don't think so. I used your models as they were. I know that there is some difference between the (implicit) y ~ 1 + ... and y ~ 0 + Intercept + ... coding, but I am not sure.
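For what it's worth, my understanding of how the two codings take priors in brms (a sketch, not verified against your models):

# Implicit intercept: brms centers the design matrix internally; the prior
# attaches via class = "Intercept" and refers to the centered intercept.
f1 <- bf(RT ~ 1 + MotionCongruency)
p1 <- set_prior("normal(-1, 1)", class = "Intercept")

# Explicit intercept: no centering; the intercept is an ordinary coefficient,
# so the prior is set with class = "b", coef = "Intercept".
f2 <- bf(RT ~ 0 + Intercept + MotionCongruency)
p2 <- set_prior("normal(-1, 1)", class = "b", coef = "Intercept")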

What is the relation between these prior predictive checks and the priors as plotted by hypothesis (see my post above)? The latter look reasonable for both models.

So the prior you get from the hypothesis plot is the marginal prior for the parameter. That looks reasonable (although a N(0, 0.05) would represent your belief better). The problem is that your prior (in combination with the default choices for the rest) leads to unreasonable prior predictions for the outcome variable. This wouldn't be a huge problem if you were only looking at your posterior, as you seem to have enough data to overcome even such a wide prior without problems. But Bayes factors are sensitive to priors, so you have to think carefully about all dimensions of the prior. Not sure I can help with that a ton. My approach would be to set all the marginal priors that I explicitly have and then start tightening the default priors for everything else until the distribution on the outcome scale resembles the space where reaction times are reasonable (e.g. up to 10 s or whatever is a reasonable max for your experiment); see the sketch below.
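A rough sketch of that workflow, reusing the formula and data names from your earlier posts (the specific prior values here are placeholders):

# Inspect every default prior the model would use, then override and tighten
get_prior(mf_, data = rt_dat, family = shifted_lognormal())
priors_tighter <- c(
    set_prior("normal(0, 0.05)", class = "b", coef = "MotionCongruencyincongruent"),
    set_prior("normal(0.3, 0.1)", class = "sd", lb = 0)
)
fit_prior <- brm(mf_, data = rt_dat, prior = priors_tighter,
                 sample_prior = "only", family = shifted_lognormal())
pp_check(fit_prior, ndraws = 100)  # iterate until predictions stay in a plausible RT range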

With how close your values are to 0, you could also just start with the usual continuous lower-bounded likelihoods (lognormal/gamma/weibull/beta prime), as they'll be a bit easier to handle. Another option could be to transform your outcome to another scale, like milliseconds, so you don't have to tighten the prior so much.

Is my understanding correct that the priors I used so far are not suitable because RTs are not normally distributed, or is it more complex (I am a pretty naive user :))?

They might be normally distributed on the log scale (the assumption of the lognormal), so that would not be a problem in general. It is more about the explosion of variance that your prior produces through your model, which leads to unreasonable expectations on the outcome scale.

Looking at the parameterization of the shifted_lognormal model, I found the ndt parameter. Would it be reasonable to estimate the effect of my experimental conditions on this parameter? Something like:

The ndt is the shift applied to the lognormal distribution. Depending on what you think your intervention influences, it might be reasonable to assume an effect on the shift itself.
Generally, you could always fit multiple models and compare them via loo_compare(loo(m1), loo(m2)) to see which one has better out-of-sample predictive performance; see the sketch below.
Assuming your model is causally consistent with your experiment, the best-predicting model should also give you the best parameter recovery.
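For example, a sketch reusing your formulas from above (model names are placeholders):

# Compare a shift-effect model against a location-effect model
m_ndt <- brm(bf(RT ~ 0 + Intercept, ndt ~ 0 + MotionCongruency),
             data = rt_dat, family = shifted_lognormal())
m_mu  <- brm(bf(RT ~ 0 + Intercept + MotionCongruency),
             data = rt_dat, family = shifted_lognormal())
loo_compare(loo(m_ndt), loo(m_mu))  # higher elpd = better out-of-sample prediction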

I thought the Bayes factors depend on the marginal prior for the parameter? Isn't the Savage-Dickey density ratio the ratio between this prior and the posterior?

What if I compute the Bayes factor in a different way, specifying a dedicated null model and comparing it against the full model? Does the prior play the same role there?
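In code, I mean something like this (a sketch with my model formulas; both fits need save_pars(all = TRUE) for bridge sampling to recompute the marginal likelihoods):

# Explicit null-vs-full comparison via bridge sampling
fit_full <- brm(RT ~ 0 + Intercept + MotionCongruency + (MotionCongruency || Run),
                data = rt_dat, family = shifted_lognormal(),
                save_pars = save_pars(all = TRUE))
fit_null <- update(fit_full, formula. = RT ~ 0 + Intercept + (MotionCongruency || Run))
bayes_factor(fit_full, fit_null)  # BF10 via brms/bridgesampling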

I have never used Bayes factors, so I can't really comment on those.

Sorry, I have no idea.


I could now achieve reasonable prior and posterior predictions using both models, shifted_lognormal and exgaussian, with reasonable priors for the effect of interest (for a simplified model). For the exgaussian it seems that, as @scholz suggested, you have to have the RTs on the ms scale; using the same model but scaling everything (incl. the priors) back to s yields prior predictions that are far off.
I do get some weight on negative values in the prior prediction of the exgaussian model, though, which is not reflected in the data (is it possible to get rid of this with a hard lower bound?).

HOWEVER, I still end up with contradictory inference when I compute Bayes factors: the shifted_lognormal suggests an effect (BF_{01} = 0.38) while the exgaussian suggests no effect (BF_{01} = 7.74).

Please find all plots and code here:
mwe.html (761.0 KB)

The data for this example are the same:
rt_dat.csv (19.6 KB)

I think I found the reason for the seemingly diverging inference when I compute Bayes factors from the two models. It is the prior for the effect of interest, which, even when kept (nominally) constant, can produce different outcomes for the two models. For example,

library(brms)

# shifted_lognormal model (RT in seconds)
priors_ = c(
    set_prior('normal(log(.4), 0.05)', class='b', coef='Intercept'),
    set_prior('normal(0, 0.05)', class='b', coef='MotionCongruencyincongruent'),
    set_prior('normal(.3, 0.15)', class='sigma'),
    set_prior('normal(0.3, .2)', class='sd')
)

mf_ <- bf(
    RT ~ 0 + Intercept + MotionCongruency + (MotionCongruency || Run)#, sigma ~ MotionCongruency + (MotionCongruency | Run)
) 

hyp_string <- 'MotionCongruencyincongruent = 0'

brms_mm = brm(
    mf_,
    data = rt_dat,
    prior = priors_,
    sample_prior = TRUE,
    family = shifted_lognormal(link = "identity"),
    save_pars = save_pars(all=TRUE),
    control = list(adapt_delta = 0.9, max_treedepth = 15))

produces a BF_{01} = 0.44,

while the same priors, just scaled to ms, in the exgaussian model:

mf_ <- bf(
    RT ~ 0 + Intercept + MotionCongruency + (MotionCongruency || Run)
) 

priors_ = c(
    set_prior('normal(400, 50)', class='b', coef='Intercept'),
    set_prior('normal(0, 50)', class='b', coef='MotionCongruencyincongruent'),
    set_prior('normal(300, 150)', class='sigma'),
    set_prior('normal(300, 200)', class='sd')
)

hyp_string <- 'MotionCongruencyincongruent = 0'

rt_dat_ms <- rt_dat
rt_dat_ms$RT <- rt_dat_ms$RT * 1000

brms_mm = brm(
    mf_,
    data = rt_dat_ms,
    prior = priors_,
    sample_prior = TRUE,
    family = exgaussian(link = "identity"),
    save_pars = save_pars(all=TRUE),
    control = list(adapt_delta = 0.9, max_treedepth = 15)
)

produces a BF_{01} = 5.58.

Both versions, I think, are reasonable regarding my expectations, or am I missing something?
It seems that for both models I can adjust the prior so that it mimics the respective other model. For example, setting the prior of the shifted_lognormal model to set_prior('normal(0, 1.5)', class='b', coef='MotionCongruencyincongruent') yields BF_{01} = 7.58 (however, then the prior predictions are far off again).

Am I misinterpreting the sigma (scale) values of the priors?
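One check I could run on whether the two effect priors actually imply comparable effect sizes (a sketch, assuming a baseline RT of roughly 0.4 s / 400 ms):

set.seed(1)
b_log <- rnorm(1e5, 0, 0.05)          # shifted_lognormal: effect on the log scale
eff_s <- 0.4 * (exp(b_log) - 1)       # implied absolute effect in seconds at a 0.4 s baseline
b_ms  <- rnorm(1e5, 0, 50)            # exgaussian: effect in milliseconds
eff_ms_as_s <- b_ms / 1000            # converted to seconds for comparison
quantile(eff_s, c(.025, .975))        # roughly +/- 0.04 s
quantile(eff_ms_as_s, c(.025, .975))  # roughly +/- 0.10 s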