Understanding shifted lognormal parameters and priors

  • brms Version: 2.10.0

Hi,

This is my first foray into non-normal modelling, so please bear with me if I’m asking obvious questions. I have searched the forum and looked at various pieces of documentation, but I remain unclear on a couple of things.

I have data which can be considered long reaction times (between about 20 and 100 seconds). The shape of each participant’s data follows more or less a shifted lognormal distribution. The design of the experiment is such that there are two within-participant factors with two levels each, resulting in four conditions: A1B1, A2B1, A1B2, A2B2.

I would like to test the effect of condition on the shift parameter and mean of the shifted lognormal. I have defined the model as follows:


formula <- bf(RT ~ 0 + intercept + A + B + (1 | ID),
   ndt ~  0 + intercept + A + B + (1 | ID))

priors <- c(set_prior("normal(0,20)", class = "b", coef = "intercept"),
            set_prior("normal(0,20)", class = "b", coef = "A2"),
            set_prior("normal(0,20)", class = "b", coef = "B2"),
            set_prior("cauchy(0,1)", class = "sd"),
            set_prior("normal(50,20)", class = "b", coef = "intercept", dpar = "ndt"),
            set_prior("normal(0,50)", class = "b", coef = "A2", dpar = "ndt"),
            set_prior("normal(0,50)", class = "b", coef = "B2", dpar = "ndt")
            )

model <- brm(formula, family = shifted_lognormal(),
              prior = priors,
              data  = dat, warmup = 2000,
              iter  = 10000, chains = 3, 
              control = list(adapt_delta = 0.97),
              cores = 4, inits = "0",
              sample_prior = TRUE,
              save_all_pars = TRUE)

The model seemed to converge, with well-mixed chains etc., and the posterior predictive check fits the data surprisingly well. However, I have some questions about defining the priors and understanding the output.

  1. As you can see, I defined wide priors on the RT scale. However, in this post, Paul seems to suggest that priors of the lognormal should be defined on the log scale. When I tried to do that, I received an error message stating that no samples could be drawn (annoyingly I seem to have deleted the code where I defined these priors - sorry, bad practice, I know).

  2. Here is the output of the model (I’ve altered the predictor names to match the definitions above for ease):

Family: shifted_lognormal 
  Links: mu = identity; sigma = identity; ndt = log 
Formula: RT ~ 0 + intercept + A + B + (1 | ID) 
         ndt ~ 0 + intercept + A + B + (1 | ID)
   Data: dat (Number of observations: 1296) 
Samples: 3 chains, each with iter = 10000; warmup = 2000; thin = 1;
         total post-warmup samples = 24000

Group-Level Effects: 
~ID (Number of levels: 25) 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)         0.67      0.10     0.50     0.90 1.00     4570     8184
sd(ndt_Intercept)     2.67      0.74     1.63     4.48 1.00     5916     8951

Population-Level Effects: 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
intercept         3.38      0.14     3.10     3.66 1.00     3057     6018
A2               -0.05      0.06    -0.16     0.06 1.00    11896    15970
B2               -0.16      0.05    -0.26    -0.07 1.00    14393    15952
ndt_intercept     0.13      0.74    -1.58     1.34 1.00     4574     7373
ndt_A2           -0.31      0.07    -0.44    -0.16 1.00    12563    13073
ndt_B2            0.47      0.11     0.25     0.66 1.00    10483    13871

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.63      0.02     0.60     0.67 1.00    14286    17161

My understanding was that the estimates should be on the scale of the outcome variable (seconds), yet clearly here this is not the case (the mean RT in the data is 34s). Am I right in assuming the estimates here are in log(RT)? If so, this makes sense for the estimate of the intercept (exp(3.38) = 29.37), but I’m confused about the rest of the estimates. When I run a mixed model in lme4::lmer, the estimates are about a 3 and 9 second difference for the different parameters. I know this is a different kettle of fish completely, given that it’s a different model and approach, but these tiny effects are baffling me a bit.

I fully expect that it is to do with a complete non-understanding of shifted lognormal parameters or how to define the model/priors correctly, so would really appreciate a hand in understanding what’s going on.

Thanks in advance,
Anthony

Hey Anthony! Welcome to the Stan forum! Sorry it took us so long to respond.

Disclaimer: I rarely work with brms, and have never used the shifted log-normal (is ndt the “shift” parameter?). Generally, the estimates of the Log-Normal are on the log scale.

I think this means that you should treat the priors as if they’re for log-scale coefficients. You kind of already do this (allowing for huge negative intercepts, for example), but the scales are pretty far off. For example, your prior on the intercept (let’s ignore everything else) would imply a prior predictive distribution that looks like hist(exp(rnorm(1e5, 0, 20))).
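
Just to illustrate (the numbers here are made up, not a recommendation), you could compare what two priors imply on the seconds scale:

# Implied RTs under a normal(0, 20) prior on the log scale,
# versus a much tighter normal(log(30), 1) prior
wide  <- exp(rnorm(1e5, 0, 20))
tight <- exp(rnorm(1e5, log(30), 1))

quantile(wide,  c(.05, .5, .95))   # spans many orders of magnitude
quantile(tight, c(.05, .5, .95))   # roughly 6 to 160 seconds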

Yes, I’d say your estimates are on the log(RT) scale.

My advice would be to run posterior predictive checks to see where the predictions actually fall. A Log-Normal model can be hard to interpret from its coefficients alone, and this is especially true for the “interaction” with the shift.
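
For example, something along these lines (assuming your factor really is called A in the data – I’m not sure all of these arguments are available in brms 2.10.0):

# Overall posterior predictive check
pp_check(model, nsamples = 100)

# Check a summary statistic separately for each level of A
pp_check(model, type = "stat_grouped", group = "A", stat = "median", nsamples = 500)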

Hope this helps…
Cheers! :)
Max

Hi Max,

Thanks for your reply! I hope you don’t mind if I reply with some follow up questions/comments.

Yes you are right, ndt is the shift parameter. Your message prompted me to look deeper into the brms families documentation and code, and this is what I found:

shifted_lognormal <- function(link = "identity", link_sigma = "log",
                              link_ndt = "log") {
  slink <- substitute(link)
  .brmsfamily("shifted_lognormal", link = link, slink = slink,
              link_sigma = link_sigma, link_ndt = link_ndt)
}

So my understanding is that the location prior should be defined on the RT scale, and sigma and ndt priors should be defined on the log(RT) scale. Would you agree with this assessment? If that is the case, then should my prior on the ndt intercept actually be "normal(log(50), log(20))"?
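
As an aside, this is how I have been listing the parameters the model expects priors for (same formula and data as above), to check which class and dpar each one belongs to:

# List the parameters brms expects priors for, with their classes,
# coefficient names, and dpars (e.g. sigma, ndt)
get_prior(bf(RT ~ 0 + intercept + A + B + (1 | ID),
             ndt ~ 0 + intercept + A + B + (1 | ID)),
          data = dat, family = shifted_lognormal())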

If that’s the case, does that mean the estimates are returned on the same scales as the priors? If so, then I am really confused by the returned parameter estimates, because now the intercept would actually be 3.38 seconds and the ndt_intercept is exp(0.13). Added together, these give about 4.5 seconds for the mean RT, which is far from the actual mean of the data.

When I run a posterior predictive check with pp_check() on the model, I get the output below. This seems very reasonable to me (ignore the spike at the end, there is a max possible time of 105 seconds), but it doesn’t seem to match up with the parameter estimates, unless I’m totally missing something.

I had the thought of defining the ndt link as “identity” in the call:

brm(formula, family = shifted_lognormal(link_ndt = "identity")...)

My thinking was that then the priors for both the location and shift parameters would be on the right scale and the estimates would be returned on the RT scale. The model summary is shown below. Now there is a large negative ndt_intercept, which baffles me a bit, considering how positive the prior is and how positive the data are!

Family: shifted_lognormal 
Links: mu = identity; sigma = identity; ndt = identity 
Formula: RT ~ 0 + intercept + A + B + (1 | p | ID) 
         ndt ~ 0 + intercept + B + A + (1 | p | ID)
Data: dat (Number of observations: 1296) 
Samples: 3 chains, each with iter = 10000; warmup = 2000; thin = 1;
         total post-warmup samples = 24000

Group-Level Effects: 
~ID (Number of levels: 25) 
                              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)                  0.70      0.10     0.53     0.93 1.00     6592    10589
sd(ndt_Intercept)              27.24     5.55    18.36    39.95 1.00     6843    11459
cor(Intercept,ndt_Intercept)  -0.84      0.07    -0.94    -0.66 1.00     9640    13448

Population-Level Effects: 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
intercept           3.96      0.15     3.65     4.25 1.00     4774     9136
A2                  0.11      0.03     0.05     0.17 1.00    20313    17169
B2                 -0.15      0.03    -0.22    -0.09 1.00    17921    17427
ndt_intercept      -23.00      6.87   -37.21   -10.10 1.00     6760    11382
ndt_A2              0.10      0.69    -1.26     1.48 1.00    22880    19127
ndt_B2             -1.99      0.69    -3.31    -0.60 1.00    24596    18416

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.27      0.02     0.22     0.31 1.00    14256    15616

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

And yet the posterior predictive check still has RTs in the positive range:

Any further thoughts you (or anyone else) may have on this would be greatly appreciated, or a nudge in the right direction for where I should be looking to better understand these things.

How would you interpret the coefficients then? Would you just look at the HDIs to assess whether there is an effect and then not try to actually interpret the coefficients?

Thanks again,
Anthony

Hey!

Looking at the actual Stan code generated by make_stancode shows that the shifted Log-Normal is parameterized like lognormal_lpdf(Y - ndt | mu, sigma), so, ignoring Jacobians for a moment, this means \log(Y - \text{ndt}) \sim \text{Normal}(\mu, \sigma). With an identity link for mu, the coefficients are on the log scale! I know this is a bit confusing – at least for me it was in the beginning. Having a log link for ndt implies (again, ignoring Jacobian corrections) \log(Y - \exp(\text{ndt})) \sim \text{Normal}(\mu, \sigma), so the ndt coefficients are then on the log scale as well.

You could work out the interpretation for each coefficient, but I’d say the coefficients are not the interesting part of the model – this is especially true for such non-linear models, or any non-trivial model really. Btw, exp-ing the coefficients won’t give you the conditional mean of Y: \mu is the mean of its logarithm, and \exp(\mu) is actually an estimate of the (conditional) median of the outcome… and the outcome has ndt in it… So I’d suggest looking at which group comparisons are meaningful for your analysis and then generating posterior predictions for those conditions and comparing them. This is a bit in the spirit of marginal effects (at representative values), which are widely used in frequentist stats as well.
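
Just as a rough sketch of what I mean (the factor names and levels here are my guesses from your description, and I can’t promise every argument name is the same in brms 2.10.0 – if I remember correctly, fitted() returns expected RTs on the seconds scale, including the shift):

# Expected RTs per condition on the seconds scale, at the population level
# (group-level effects are excluded via re_formula = NA)
nd  <- expand.grid(A = c("A1", "A2"), B = c("B1", "B2"))
eRT <- fitted(model, newdata = nd, re_formula = NA, summary = FALSE)

# eRT has one column per row of nd, so e.g. the posterior distribution of
# the A effect (within B1), in seconds, is
quantile(eRT[, 2] - eRT[, 1], c(.025, .5, .975))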

brms is great for that, too! And regarding the hard maximum of 105 seconds you mentioned: you could try adding truncation to the model… something like trunc(ub = 105), I think. So

bf(RT | trunc(ub = 105) ~ 0 + intercept + A + B + (1 | ID),
   ndt ~  0 + intercept + A + B + (1 | ID))

but I’m not very sure of this one, because I don’t regularly work with brms (although it’s great).

Cheers!
Max

Hi Max,

This makes so much sense. Thanks for the clear explanation, I really appreciate it.

Cheers,
Anthony
