Understanding shifted lognormal parameters and priors

  • brms Version: 2.10.0

Hi,

This is my first foray into non-normal modelling, so please bear with me if I’m asking obvious questions. I have searched the forum and looked through various documentation, but I remain unclear on a couple of things.

I have data which can be considered long reaction times (between about 20 and 100 seconds). The shape of each participant’s data follows more or less a shifted lognormal distribution. The design of the experiment is such that there are two within-participant factors with two levels each, resulting in four conditions: A1B1, A2B1, A1B2, A2B2.

I would like to test the effect of condition on the shift parameter and the mean of the shifted lognormal. I have defined the model as follows:


formula <- bf(RT ~ 0 + intercept + A + B + (1 | ID),
   ndt ~  0 + intercept + A + B + (1 | ID))

priors <- c(set_prior("normal(0,20)", class = "b", coef = "intercept"),
            set_prior("normal(0,20)", class = "b", coef = "groupunfair"),
            set_prior("normal(0,20)", class = "b", coef = "partnerunfair"),
            set_prior("cauchy(0,1)", class = "sd"),
            set_prior("normal(50,20)", class = "b", coef = "intercept", dpar = "ndt"),
            set_prior("normal(0,50)", class = "b", coef = "groupunfair", dpar = "ndt"),
            set_prior("normal(0,50)", class = "b", coef = "partnerunfair", dpar = "ndt")
            )

model <- brm(formula, family = shifted_lognormal(),
              prior = priors,
              data  = dat, warmup = 2000,
              iter  = 10000, chains = 3, 
              control = list(adapt_delta = 0.97),
              cores = 4, inits = "0",
              sample_prior = TRUE,
              save_all_pars = TRUE)

The model seemed to converge, with well-mixed chains etc., and the posterior predictive check fits the data surprisingly well. However, I have some questions about defining the priors and understanding the output.

  1. As you can see, I defined wide priors on the RT scale. However, in this post, Paul seems to suggest that priors of the lognormal should be defined on the log scale. When I tried to do that, I received an error message stating that no samples could be drawn (annoyingly I seem to have deleted the code where I defined these priors - sorry, bad practice, I know).

  2. Here is the output of the model (I’ve altered the predictor names to match the definitions above for ease):

Family: shifted_lognormal 
  Links: mu = identity; sigma = identity; ndt = log 
Formula: LT ~ 0 + intercept + A + B + (1 | ID) 
         ndt ~ 0 + intercept + A + B + (1 | ID)
   Data: dat (Number of observations: 1296) 
Samples: 3 chains, each with iter = 10000; warmup = 2000; thin = 1;
         total post-warmup samples = 24000

Group-Level Effects: 
~ID (Number of levels: 25) 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)         0.67      0.10     0.50     0.90 1.00     4570     8184
sd(ndt_Intercept)     2.67      0.74     1.63     4.48 1.00     5916     8951

Population-Level Effects: 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
intercept         3.38      0.14     3.10     3.66 1.00     3057     6018
A2               -0.05      0.06    -0.16     0.06 1.00    11896    15970
B2               -0.16      0.05    -0.26    -0.07 1.00    14393    15952
ndt_intercept     0.13      0.74    -1.58     1.34 1.00     4574     7373
ndt_A2           -0.31      0.07    -0.44    -0.16 1.00    12563    13073
ndt_B2            0.47      0.11     0.25     0.66 1.00    10483    13871

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.63      0.02     0.60     0.67 1.00    14286    17161

My understanding was that the estimates should be on the scale of the outcome variable (seconds), yet clearly here this is not the case (the mean RT in the data is 34s). Am I right in assuming the estimates here are in log(RT)? If so, this makes sense for the estimate of the intercept (exp(3.38) = 29.37), but I’m confused about the rest of the estimates. When I run a mixed model in lme4::lmer, the estimates are about a 3 and 9 second difference for the different parameters. I know this is a different kettle of fish completely, given that it’s a different model and approach, but these tiny effects are baffling me a bit.

I fully expect that it is to do with a complete non-understanding of shifted lognormal parameters or how to define the model/priors correctly, so would really appreciate a hand in understanding what’s going on.

Thanks in advance,
Anthony


Hey Anthony! Welcome to the Stan forum! Sorry it took us so long to respond.

Disclaimer: I rarely work with brms, and have never used the shifted log-normal (is ndt the “shift” parameter?). Generally, the estimates of a Log-Normal model are on the log scale.

I think this means that you should treat the priors as if they’re for log-scale coefficients. You kind of already do this (allowing for huge negative intercepts, for example), but the scales are pretty far off. For example, your prior on the intercept (let’s ignore everything else) would imply a prior predictive distribution that looks like hist(exp(rnorm(1e5, 0, 20))).
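
Just to make that concrete (this is only my own illustration, not something taken from your model), you could compare what that prior implies for exp(intercept) with a prior centred closer to your observed RTs:

# Implied scale of exp(intercept) under a normal(0, 20) prior on the log scale
quantile(exp(rnorm(1e5, 0, 20)), probs = c(0.05, 0.5, 0.95))
# versus, e.g., a normal(3.4, 1) prior (exp(3.4) is roughly 30 s, close to your mean RT)
quantile(exp(rnorm(1e5, 3.4, 1)), probs = c(0.05, 0.5, 0.95))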

Yes, you are right in assuming the estimates are in log(RT).

My advice would be to run posterior predictive checks to see where the predictions actually fall. A Log-Normal model can be hard to interpret from its coefficients alone, and this is especially true for the “interaction” with the shift.
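
For example (just a sketch, assuming your fitted object is called model):

# Overlay densities of replicated data sets against the observed RTs
pp_check(model, nsamples = 100)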

Hope this helps…
Cheers! :)
Max

Hi Max,

Thanks for your reply! I hope you don’t mind if I reply with some follow up questions/comments.

Yes you are right, ndt is the shift parameter. Your message prompted me to look deeper into the brms families documentation and code, and this is what I found:

shifted_lognormal <- function(link = "identity", link_sigma = "log",
                              link_ndt = "log") {
  slink <- substitute(link)
  .brmsfamily("shifted_lognormal", link = link, slink = slink,
              link_sigma = link_sigma, link_ndt = link_ndt)
}

So my understanding is that the location prior should be defined on the RT scale, and sigma and ndt priors should be defined on the log(RT) scale. Would you agree with this assessment? If that is the case, then should my prior on the ndt intercept actually be "normal(log(50), log(20))"?
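
To check my own reasoning, I guess I could simulate what such a prior would imply for the shift in seconds (just a quick sketch of mine, assuming the log link means the linear predictor gets exponentiated):

# Implied shift (in seconds) under a normal(log(50), log(20)) prior with a log link
quantile(exp(rnorm(1e5, log(50), log(20))), probs = c(0.05, 0.5, 0.95))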

If so, does that mean the estimates are returned on the same scales as the priors? In that case I am really confused by the returned parameter estimates, because the intercept would then be 3.38 seconds and the ndt_intercept exp(0.13) ≈ 1.1 seconds. Added together these give about 4.5 seconds for the mean RT, which is far from the actual mean of the data.

When I run a posterior predictive check with pp_check() on the model, I get the output below. This seems very reasonable to me (ignore the spike at the end; there is a maximum possible time of 105 seconds), but it doesn’t seem to match up with the parameter estimates, unless I’m totally missing something.

I had the thought of defining the ndt link as “identity” in the call:

brm(formula, family = shifted_lognormal(link_ndt = "identity")...)

My thinking was that the priors for both the location and shift parameters would then be on the right scale and the estimates would be returned on the RT scale. The model summary is shown below. Now there is a large negative ndt_intercept, which baffles me a bit, considering how positive the prior is, and how positive the data are!

Family: shifted_lognormal 
Links: mu = identity; sigma = identity; ndt = identity 
Formula: LT ~ 0 + intercept + group + partner + (1 | p | ID) 
               ndt ~ 0 + intercept + partner + group + (1 | p | ID)
Data: dat (Number of observations: 1296) 
Samples: 3 chains, each with iter = 10000; warmup = 2000; thin = 1;
         total post-warmup samples = 24000

Group-Level Effects: 
~ID (Number of levels: 25) 
                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                  0.70      0.10     0.53     0.93 1.00     6592    10589
sd(ndt_Intercept)              27.24     5.55    18.36    39.95 1.00     6843    11459
cor(Intercept,ndt_Intercept)  -0.84      0.07    -0.94    -0.66 1.00     9640    13448

Population-Level Effects: 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
intercept           3.96      0.15     3.65     4.25 1.00     4774     9136
A2                  0.11      0.03     0.05     0.17 1.00    20313    17169
B2                 -0.15      0.03    -0.22    -0.09 1.00    17921    17427
ndt_intercept      -23.00      6.87   -37.21   -10.10 1.00     6760    11382
ndt_A2              0.10      0.69    -1.26     1.48 1.00    22880    19127
ndt_B2             -1.99      0.69    -3.31    -0.60 1.00    24596    18416

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.27      0.02     0.22     0.31 1.00    14256    15616

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

And yet the posterior predictive check still has RTs in the positive range:

Any further thoughts you (or anyone else) may have on this would be greatly appreciated, or a nudge in the right direction for where I should be looking to better understand these things.

How would you interpret the coefficients then? Would you just look at the HDIs to assess whether there is an effect and then not try to actually interpret the coefficients?

Thanks again,
Anthony

Hey!

Looking at the actual Stan code generated by make_stancode shows that the shifted Log-Normal is parameterized like lognormal_lpdf(Y - ndt | mu, sigma), so, ignoring Jacobians for a moment, this means \log(Y - \text{ndt}) \sim \text{Normal}(\mu, \sigma). With an identity link for mu, the coefficients are on the log scale! I know this is a bit confusing; at least for me it was in the beginning. Having a log link for ndt implies (again, ignoring Jacobian corrections) \log(Y - \exp(\text{ndt})) \sim \text{Normal}(\mu, \sigma), so the ndt coefficients are on the log scale as well.

You could work out the interpretation of each coefficient, but I’d say the coefficients are not the interesting part of the model; this is especially true for such non-linear models, or any non-trivial model really. By the way, exponentiating the coefficients won’t give you the conditional mean of Y: \mu is the mean of \log(Y - \text{ndt}), and \exp(\mu) is actually an estimate of the (conditional) median of Y - \text{ndt}… and the outcome has ndt in it…

So I’d suggest looking at which group comparisons are meaningful for your analysis, then generating posterior predictions for these outcomes and comparing those. This is a bit in the spirit of marginal effects (at representative values), which are widely used in frequentist stats as well.
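
For example, something along these lines might work (a rough sketch, untested; I’m assuming your fitted object is called model and that the factors A and B have levels A1/A2 and B1/B2 as in your first post):

# Posterior predictions for each cell of the design, ignoring the
# participant-level effects (re_formula = NA)
newdat <- expand.grid(A = c("A1", "A2"), B = c("B1", "B2"))
pp <- posterior_predict(model, newdata = newdat, re_formula = NA)
colnames(pp) <- paste0(newdat$A, newdat$B)

# Predicted RT distribution per condition (in seconds)...
apply(pp, 2, quantile, probs = c(0.05, 0.5, 0.95))
# ...and, e.g., the posterior of the A contrast within B1
quantile(pp[, "A2B1"] - pp[, "A1B1"], probs = c(0.025, 0.5, 0.975))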

brms is great for that, too! You could try adding truncation to the model… something like trunc(ub = 105), I think. So

bf(RT | trunc(ub = 105) ~ 0 + intercept + A + B + (1 | ID),
   ndt ~  0 + intercept + A + B + (1 | ID))

but I’m not very sure of this one, because I don’t regularly work with brms (although it’s great).

Cheers!
Max


Hi Max,

This makes so much sense. Thanks for the clear explanation, I really appreciate it.

Cheers,
Anthony


A post was split to a new topic: Weakly informative priors for Shifted LogNormal

Hi,

I have read this thread and the linked thread about the shifted lognormal model, for which I am also doing a prior predictive check. However, after reading both threads I still don’t understand how the model works.

I have now simplified my model and excluded random effects to just focus on getting the priors for mu, sigma and ndt reasonable, but eventually I am planning to include random effects.

I have now set the following priors:

prior <- c(prior("normal(5.7, 0.4)", class = "Intercept", coef = ""),
           prior("normal(1, 0.30)", class = "b", coef = ""),
           prior("normal(1.6, 0.1)", class = "Intercept", coef = "", dpar = "sigma"),
           prior("normal(1, 0.10)", class = "b", dpar = "sigma"),
           prior("normal(0.80, 0.10)", class = "ndt", coef = ""))

I based these priors on a post in another thread, which explained that you need to exponentiate the intercept values to get them on the RT scale, and that for sigma and ndt you need to exponentiate the exponent of the values to get them on the RT scale.
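
To make the question concrete, this is roughly the by-hand simulation I have in mind (my own sketch; I am exponentiating sigma because of its log link, but I am not sure whether ndt, as a family parameter without its own formula, should be exponentiated too):

# Rough by-hand prior simulation of the implied RT scale
n     <- 1e5
mu    <- rnorm(n, 5.7, 0.4)        # Intercept prior (log-RT scale)
sigma <- exp(rnorm(n, 1.6, 0.1))   # sigma is modelled with a log link, so exp()
ndt   <- rnorm(n, 0.80, 0.10)      # shift; unclear to me whether this needs exp()
rt    <- ndt + rlnorm(n, meanlog = mu, sdlog = sigma)
quantile(rt, probs = c(0.05, 0.5, 0.95))  # implied prior predictive RTs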

But the parameter estimates that are returned do not fully match the means of these prior distributions.

I run this code to obtain the prior parameter estimates:

ModelFit <- brm(bf(RT ~ BlockType, sigma ~ BlockType),
                iter = 2000, chains = 4, cores = 4,
                data = Subset, family = shifted_lognormal(),
                sample_prior = "only", prior = prior)

The point estimate of the intercept is 4.95 and the point estimate for the intercept of sigma is 0.87. The point estimate of ndt is 0.80.

So the point estimate of the intercept is different from the mean of the normal prior that I set for the intercept. The point estimate of the intercept of sigma is also different from the mean of the prior that I set. However, the point estimate of ndt is exactly the same as the mean of the prior.

I defined my priors on the log scale and thought that the estimates were also on the log scale, so why is there still a difference between the prior means and the estimates for the intercept and sigma?

As a result, my prior predictive distribution places more density over smaller RTs than what I wanted to achieve with my priors.

So how do the parameters relate to the priors?

Hey @Max_Mantei

Are you sure that truncation works in distributional models? I have a similar model to this, and based on my prior predictive distributions, truncation does not work for distributional models.

Hi @Ross_Kempner! Sorry for the delayed response. I think truncation should also work for distributional models. I’m not sure if prior predictive checks are implemented in brms, but I’m pretty sure it’d be possible in Stan. Check out the relevant section in the user manual.
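
That said, if you want to try it from within brms, something like the following might work (again just a sketch, untested; proper priors are needed for a prior-only run, so my_priors here is only a placeholder for priors you would set yourself):

# Prior-predictive run of a truncated distributional model; if trunc() is
# respected, the predicted draws should not exceed the upper bound of 105
fit_prior <- brm(bf(RT | trunc(ub = 105) ~ A + B, sigma ~ A + B),
                 data = dat, family = shifted_lognormal(),
                 prior = my_priors, sample_prior = "only")
pp <- posterior_predict(fit_prior)
max(pp)  # should stay at or below 105 if truncation is applied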

Cheers,
Max

Sorry that I didn’t see your post earlier. I’m actually not sure how to answer. I think it might be helpful (albeit tedious) to look at the generated Stan code and try to reverse-engineer it in R.

Another point you might have to consider is that looking at point estimates/summaries of expectations can be misleading when taking logs/exp-ing due to Jensen’s inequality. (But maybe that’s not the issue here.)

Sorry, but I don’t think I can provide more help on this right now. Maybe someone else will jump in (now that the thread is bumped to the top again).

Cheers,
Max