Shifted log normal distribution with varying shifts by participant

Hi,

I am trying to fit a mixed-effects model with a shifted lognormal distribution, but I want a different shift for every participant. I am thinking about it sort of like a random intercept, but I am not sure that is the right way to think about it. Any ideas on how I might go about specifying such a model?

Thank you,
Grusha.


You can use the brms distributional framework for this purpose, as explained in the brms_distreg vignette.

For instance, you could use

formula = bf(y ~ x + (1 | person), ndt ~ (1 | person))

where ndt (non-decision time) is the shift parameter of the shifted_lognormal family.
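Putting this together, a minimal sketch could look like the following (y, x, person, and d are placeholder names for the response, a predictor, the grouping variable, and the data frame):

library(brms)

# Sketch only: replace y, x, person, and d with your own names.
fit <- brm(
  bf(y ~ x + (1 | person),   # person-specific intercepts for the location of log(y - shift)
     ndt ~ (1 | person)),    # person-specific shift (non-decision time)
  data   = d,
  family = shifted_lognormal()
)

Note that as soon as ndt is given its own formula, brms applies a log link to it (link_ndt = "log"), so the person-level effects act on log(ndt) and any priors you set on ndt coefficients are on that log scale.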


I have a couple of follow-up questions about this (some technical and some conceptual) that are slightly unrelated to each other. I am not sure if this is the right place to post these questions; please let me know if you think I should make separate questions and I will!

Here is the formula I want to use:

formula1 <- bf(rt ~ sentence_type * scale(crit_num) + scale(log(sent_num)) +
                 (1 + sentence_type * c.(crit_num) | participant) +
                 (1 + sentence_type * c.(crit_num) | sent_id),
               ndt ~ (1 | participant),
               family = shifted_lognormal())

  1. In order to set a prior on the ndt parameter, would I do something like set_prior("normal(0,1)", class = "ndt")?

  2. How is the random intercept for ndt related to the random intercept for participant? For example, you might imagine that a participant with a larger intercept (i.e. greater mean RT) might also have a larger shift parameter (i.e. a higher floor). Would such a dependency be captured? Or will it not be, since the random effects don’t estimate means for individual participants but rather just a distribution of means?

  3. Since I am assuming a lognormal distribution, do I need to specify the priors on the RT scale or the log(RT) scale? E.g. if I wanted a prior that captures the knowledge that, on average, RTs were normally distributed with mean 400 and sd 300, would I set the prior for the intercept to normal(400, 300) or to normal(2.6, 2.4)?

  4. The prior I mentioned in the previous question is not actually the kind of prior knowledge that is relevant. Ideally I would want to assume that RTs were lognormally distributed. Is there a way to specify that?

  5. If I specify the priors for a few of the parameters (fixed or random) but not the others, then will the parameters for which I didn’t specify anything assume the prior described here: https://github.com/paul-buerkner/brms/issues/131 ?

  1. Yes, but I don’t know whether that prior is reasonable for your data. To see which priors you can set (and their defaults), use get_prior (see the sketch after this list).
  2. Yes, you can use brms’s ID syntax for this and replace (1 | participant) with (1 | ID | participant), where you can replace ID by whatever label you like. See the second brms paper, published in the R Journal, for details.
  3. The lognormal distribution has its parameters on the log scale; that is, the priors are in fact on the scale of log(RT).
  4. Use lognormal instead of normal if you want a lognormal prior. I would argue that is not sensible here, since, as per (3), you are modeling the mean of log(RT) rather than mean(RT).
  5. See the output of get_prior for default priors.
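
To make (1), (2), and (5) concrete, here is a minimal sketch. The formula is simplified to the participant terms only, my_data is a placeholder, and the prior values are illustrations rather than recommendations:

library(brms)

# The middle |p| term is the ID syntax from (2): it tells brms to estimate the
# correlation between the participant effects on the mean and on ndt.
f <- bf(rt ~ sentence_type + (1 | p | participant),
        ndt ~ (1 | p | participant),
        family = shifted_lognormal())

# (5): list every parameter that can receive a prior, together with its default.
get_prior(f, data = my_data)

# (1) and (3): priors on ndt are set via dpar = "ndt". Because ndt gets a log
# link once it is predicted by a formula, these priors live on the natural-log
# scale (so a 400 ms shift corresponds to log(400) ~ 6, not log10(400) ~ 2.6).
priors <- c(
  set_prior("normal(6, 0.5)", class = "Intercept"),                 # location of log(rt - shift)
  set_prior("normal(5, 1)",   class = "Intercept", dpar = "ndt"),   # log of the typical shift
  set_prior("normal(0, 0.5)", class = "sd", group = "participant")  # participant SDs (mean part)
)

fit <- brm(f, data = my_data, prior = priors)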

Thank you! This is very helpful. I have two more clarification questions to make sure I understand how the lognormal distribution works. Let's say I have the following formula, where X_1 is a continuous predictor, and I specify a lognormal data distribution:

RT = Beta_0 + Beta_1*X_1 + error

In this case, if Beta_1 = 0.1 according to the MAP estimate, we can interpret that with a 1 unit increase in X_1, there is a 0.1 increase in log(RT), or a 1.1 ms increase in RTs, right? Also, if my (weak) prior beliefs were that X_1 leads to an increase between 0.5 ms and 10 ms, I would specify a prior like norm(1.4, 0.4), right?

Another alternative would have been to use the following formula instead and specify a normal (Gaussian) data distribution.
log(RT) = Beta_0 + Beta_1*X_1 + error

In this case, if Beta_1 = 0.1, we would have a similar interpretation as earlier. However, the model assumptions are different, because in this case we are assuming that the predictors are multiplicative on the raw scale, right? So is the only difference between using a lognormal distribution and modeling log(RTs) whether the predictors and the error term are additive or multiplicative? Or are there other factors I need to keep in mind? (I have always modeled log(RTs) in LMER models and this is the first time I am shifting to Bayesian models with other data distributions, so I just want to be doubly sure.)

brm(rt ~ x, family = lognormal()) and brm(log(rt) ~ x, family = gaussian()) are equivalent by definition of the lognormal distribution. However, I prefer the former approach as it yields predictions etc. directly on the scale of interest, which is the RTs themselves.
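
As a small illustration of the back-transformation (this assumes a fitted object fit from something like brm(rt ~ x, family = lognormal()), so the slope draws are named b_x; the names are placeholders):

# A slope of 0.1 on the log scale acts multiplicatively on the RT scale:
exp(0.1)  # ~ 1.105, i.e. roughly a 10.5% increase in RT per unit of x,
          # rather than a fixed number of milliseconds

# The same back-transformation applies to the posterior draws of the slope.
# as_draws_df() comes from the posterior package and is re-exported by recent brms.
draws <- as_draws_df(fit)
quantile(exp(draws$b_x), probs = c(0.025, 0.5, 0.975))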


Hi,

Is it possible to apply truncation to the response in this distributional framework?

Such as :

formula = bf(y | trunc(lb = 0, ub = 1) ~ x + (1 | person), ndt ~ (1 | person))

What happens if you run the model?

Based on a prior predictive check using pp_check(brms_fit), it does not look like it truncates properly under certain circumstances, which I will describe below.

First, here is the code used to make the model and run the prior predictive check:

fit0 <- brm(
  formula = bf(reaction_time | trunc(ub = 500) ~ 1,
               ndt ~ 1 + bigram_ideal_first_surprisal + block_num),
  data = experiment_df,
  family = shifted_lognormal(),
  prior = c(
    set_prior("normal(-2,2)", class = "Intercept"),
    set_prior("normal(5.5,0.01)", class = "Intercept", dpar = "ndt"),
    set_prior("normal(0,0.01)", class = "b", coef = "bigram_ideal_first_surprisal", dpar = "ndt"),
    set_prior("normal(-0.05,0.001)", class = "b", coef = "block_num", dpar = "ndt")
  ),
  sample_prior = "only"
)

pp_check(fit0)

Interestingly, when I make the priors on the inputs to ndt very small, I get a clear truncation at 500 in the pp_check() output, but when the priors on the inputs to ndt are large, the truncation fails.

Perhaps this suggests that truncation fails when the parameter ndt is too large? Is this a valid way to check if truncation is working?
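
A more direct check than eyeballing the plot could be to count the prior predictive draws that land above the bound (a sketch reusing fit0 from above):

# Prior predictive draws; if the truncation at 500 were respected,
# no draw should exceed the upper bound.
yrep <- posterior_predict(fit0)
mean(yrep > 500)  # proportion of draws above the bound

# For reference, ndt is modeled on the log scale here, so an ndt intercept
# around 5.5 already implies a shift of exp(5.5), i.e. about 245 ms, and
# larger draws push the whole predicted distribution further up.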

I think I do see the point now. I think I will have to deactivate truncation for this model for the time being.

With this same model, I plotted the posterior predictions after fitting to the data and the truncation was respected, so it seems this only happens when ndt is very large, which could happen under my priors but did not under the posterior.

Still, with the shifted parameterization, truncation may not work as expected. Would you mind opening an issue on GitHub?

Will do, thanks!