Logistic brms model: custom prior and interpreting results

I am trying out brms for the first time, coming from an lme4 background in R. I have two questions about my model, which will be indicated in brackets []. The data are from a free recall task, where participants listen to lists of 10 words and then have to recall as many as possible. The response variable is whether or not the participant recalled the word (1 or 0).

For this model, I am predicting recall from the word’s position in the presented list: people should remember the early and late words better than the middle ones, so I model both position and position squared. So-called “random” effects include the word’s frequency (how common it is), its phonological density (how many other words there are that sound like it), and the participant’s ID.

From a previous experiment using the same word set, I know that people can recall around 57% of the words presented to them, with a standard deviation of 19%. I’d like to include this as a prior for the intercept, but since it is a logistic model, I am not sure whether I should take the logit of the prior when I define it, or whether brms is doing that in the background [QUESTION 1]. Currently, I have this model:

mod.logitBayesRecall0 <- brm(formula = recalled ~ 1 +
                                (1 | logFreq) + (1 | propDensity) + 
                                (1 | ID) + poly(position, 2),  
                            family = bernoulli(link = "logit"),
                            prior = c(set_prior("normal(0.57, 0.19)", class = "Intercept")),
                            warmup = 500, 
                            iter = 2000, 
                            chains = 2, 
                            inits = "0",
                            seed = 123,
                            file = "./modelcache/mod.logitBayesRecall0")

The model runs and seems to converge well:

Family: bernoulli 
  Links: mu = logit 
Formula: recalled ~ 1 + (1 | logFreq) + (1 | propDensity) + (1 | ID) + poly(position, 2) 
   Data: wordData (Number of observations: 18000) 
Samples: 2 chains, each with iter = 2000; warmup = 500; thin = 1;
         total post-warmup samples = 3000

Group-Level Effects: 
~logFreq (Number of levels: 223) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.61      0.04     0.54     0.69 1.00      866     1449

~ID (Number of levels: 75) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.73      0.06     0.62     0.87 1.00      479      859

~propDensity (Number of levels: 50) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.43      0.07     0.30     0.58 1.00      820     1394

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept          0.52      0.10     0.32     0.71 1.00      484     1212
polyposition21    43.72      5.75    31.99    54.84 1.00     1218     1692
polyposition22    54.58      5.13    44.26    64.58 1.00     1635     2152

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

However, I am not sure how to interpret the estimates, especially for position [QUESTION 2]. I guess I would need to apply an anti-logit/expit function, i.e.
antilogit <- function(x) {1 / (1 + exp(-x))}

… but when I do this for polyposition21 and polyposition22 I get values that round to 1, and I am not sure how to plug the numbers in and get reasonable predictions analytically (e.g. for predicting recall at position 5).

Other checks: the conditional_effects() function shows a reasonable plot reflecting the pattern I expect.

pp_check() gives a plot that I am less sure about.

Is this model specified correctly, especially with the custom prior? How do I interpret the coefficients? Thanks in advance.

  • Operating System: macOS Catalina
  • brms Version:

Edit: I have looked into the poly() function and I realize now that it is orthogonalizing the polynomial term, which is why the coefficients aren’t so easily interpreted. Still, my question about the interpretation of coefficients remains: must we use anti-logit on the estimate, or is it more complicated than that?


The prior should be on the logit scale. One way to investigate what brms actually does is to check the generated Stan code via make_stancode.
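For example, a minimal sketch (the 0.28 is qlogis(0.57), R’s logit of the previous experiment’s mean; the sd of 1 is just a placeholder, since the 0.19 from the proportion scale also needs transforming; wordData is the data frame from the question):

```r
library(brms)

# qlogis() is R's logit, plogis() its inverse:
qlogis(0.57)  # log(0.57 / 0.43), about 0.28

# State the prior directly on the logit scale:
prior0 <- set_prior("normal(0.28, 1)", class = "Intercept")

# Inspect the Stan code brms would generate, without fitting:
make_stancode(recalled ~ 1 + (1 | ID) + poly(position, 2),
              data = wordData,
              family = bernoulli(link = "logit"),
              prior = prior0)
```

In the generated Stan code you can see the prior applied directly to the intercept parameter, with no transformation in between.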

Anti-logit is not really enough: while the predictors are independent on the logit scale, predictions on the response scale depend on the combination of all predictors. What conditional_effects does is take all other predictors at their mean (continuous) or reference (factor) values and plug that into posterior_predict.

I.e., you can look at the raw coefficients and interpret them as additive changes in log-odds, or exponentiate them and interpret them as multiplicative changes in odds, but you can judge their effect on probability only after plugging in values for all the other predictors.
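A hedged sketch of both readings, reusing the model object from the question (posterior_epred() is the newer name for what fitted(..., summary = FALSE) computed in older brms versions):

```r
# A coefficient of, say, 0.52 on the logit scale is a multiplicative
# change in odds:
exp(0.52)  # about 1.68, i.e. roughly a 68% increase in the odds

# For an actual probability, plug in values and let brms combine all
# terms; re_formula = NA drops the group-level effects:
newd <- data.frame(position = 5)
pred <- posterior_epred(mod.logitBayesRecall0, newdata = newd,
                        re_formula = NA)
mean(pred)                       # posterior mean recall probability
quantile(pred, c(0.025, 0.975))  # 95% credible interval
```

Note that brms stores the original poly() basis, so predicting at a single new position should reuse the orthogonalization from the fitted data.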

You may also be interested in s() terms, which might be a better choice than just a quadratic polynomial (though that obviously depends on your theoretical understanding of what is happening).
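For instance, a hypothetical reformulation (same response and grouping terms, just swapping the polynomial for a penalized spline):

```r
recalled ~ s(position) + (1 | logFreq) + (1 | propDensity) + (1 | ID)
```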

That’s probably because you are using a density check for a binary variable. I would explore type = "bars" and type = "bars_grouped" (with suitable groupings).
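E.g., assuming the model object from the question:

```r
# Discrete posterior predictive checks suited to a 0/1 outcome:
pp_check(mod.logitBayesRecall0, type = "bars")
pp_check(mod.logitBayesRecall0, type = "bars_grouped", group = "ID")
```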

Best of luck with your model!


Thank you! This is very helpful. Converting the mean seems easy with the logit function, but the sd seems a bit trickier: I guess it will depend on the mean, and it will not be symmetric once converted to the logit scale?


Re: the suggestion to use s(): at the next level of complexity, I will need to add the effect of position by ID, since different individuals have their own primacy/recency curves. From playing around with brms, it seems that splines cannot be used in a grouping term (e.g. s(position) | ID). However, it did get me wondering whether I should model primacy and recency as separate effects, given that they stem from separate cognitive processes… I’m just not sure how to do that yet.


Here one needs to be precise: the sd of what? In almost all cases, what you want to do is compute posterior samples of a quantity and then summarise as a last step. So, e.g., if you want to look at the uncertainty in the linear predictor after transforming to the response scale, but prior to drawing from the Bernoulli distribution (i.e. the expected average proportion of positive responses if you fixed all covariates and repeated the experiment infinitely many times), you take the samples, compute the linear predictor, and transform it with the inverse logit. Now you have a couple thousand samples that you can summarise in any way you like: mean, sd, median, 83.5% credible interval, …
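A minimal base-R sketch of that recipe, applied to the proportion-scale prior from the original question (qlogis() is the logit; the exact numbers are illustrative):

```r
set.seed(123)
# Draw from the proportion-scale prior and keep only valid proportions:
p <- rnorm(1e5, mean = 0.57, sd = 0.19)
p <- p[p > 0 & p < 1]
# Transform each draw first, summarise as the last step:
l <- qlogis(p)
c(mean = mean(l), sd = sd(l))
# hist(l) would show the skew: a prior that is symmetric on the
# proportion scale is no longer symmetric after the logit transform
```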

Mike Betancourt has much more than you wanted to know on this and makes this intuition precise at Rumble in the Ensemble

That is correct - I discussed this at Does brms have issues with (perfect) linear dependencies between (smooth) covariates? - #2 by martinmodrak (feel free to ask for any clarifications here, if it is hard to understand)
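For completeness, one commonly suggested workaround (my sketch, not necessarily what the linked post recommends) is mgcv’s factor-smooth basis, which gives each participant their own deviation from a global curve; ID must be a factor for this to work:

```r
recalled ~ s(position) + s(position, ID, bs = "fs", m = 1) +
  (1 | logFreq) + (1 | propDensity)
```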

I would guess that you would need a richer theory to be able to distinguish the two. Unless you have that, it seems unlikely (to me, and I am just guessing and don’t understand your field) that you’ll be able to do better than a flexible spline (or other curve) to model this. If all your assumptions amount to something like “monotonic decrease from both sides”, it seems likely you won’t be able to distinguish details of the individual curves. If, on the other hand, you have a richer mathematical theory, it might be easier to express it in pure Stan, as it gets cumbersome in brms.

Best of luck!
