Indexing approach to predictors

Newbie to brms here. I am trying to implement a model in which the intercept of a binomial regression depends on the value of an index variable for the same observation. In the rethinking package it looks like this:

m11.4 <- ulam(
    alist(
        pulled_left ~ dbinom(1, p),
        logit(p) <- a[actor] + b[treatment],
        a[actor] ~ dnorm(0, 1.5),
        b[treatment] ~ dnorm(0, 0.5)
    ),
    data = dat_list, chains = 4, log_lik = TRUE
)

This trans-compiles into the Stan code below:

data{
    int pulled_left[504];
    int treatment[504];
    int actor[504];
}
parameters{
    vector[7] a;
    vector[4] b;
}
model{
    vector[504] p;
    b ~ normal( 0 , 0.5 );
    a ~ normal( 0 , 1.5 );
    for ( i in 1:504 ) {
        p[i] = a[actor[i]] + b[treatment[i]];
        p[i] = inv_logit(p[i]);
    }
    pulled_left ~ binomial( 1 , p );
}
generated quantities{
    vector[504] log_lik;
    vector[504] p;
    for ( i in 1:504 ) {
        p[i] = a[actor[i]] + b[treatment[i]];
        p[i] = inv_logit(p[i]);
    }
    for ( i in 1:504 ) log_lik[i] = binomial_lpmf( pulled_left[i] | 1 , p[i] );
}

Just wondering, how should I do the same in brms? Which of the code options below would work?

Option 1:

m11.4_brms <- brm(data = dat_list, family = binomial,
            pulled_left | trials(1) ~ 0 + (1|treatment) + (1|actor),
            prior = c(
                prior(normal(0, 1.5), class = sd, coef = "Intercept", group = "actor"),
                prior(normal(0, 0.5), class = sd, coef = "Intercept", group = "treatment")
            )
            )

Option 2:

m11.4_brms2 <- brm(data = dat_list, family = binomial,
            pulled_left | trials(1) ~ 0 + factor(treatment) + factor(actor),
            prior = prior(normal(0, 1.5), class = b)
            )

Option 1 yields similar results to the rethinking model, but is it a multi-level model?

Option 2’s results are different and the intercepts are hard to interpret; however, it feels closer in spirit to the original?

Could someone help illustrate the syntax for index variables in this context?

Thanks!


Option 1 seems correct to me (and is also a multilevel model).

Many thanks for developing this excellent package! We owe you a great deal for the convenience and elegance it provides. Special thanks for taking the time to answer questions from a newbie.

When I used get_prior on the model from option 1,

get_prior( data = dat_list, family = binomial,
          pulled_left | trials(1) ~ 0 + (1|treatment) + (1|actor))

I got 5 priors I could specify. Apart from the 2 I had already specified, there was one more from each of the treatment and actor groups, whose coef field is empty. What would those mean?

There was also another prior I could specify that is also of class sd; it didn’t belong to any group and had a default prior of student_t(3, 0, 10). What would this prior mean?

?set_prior provides all the details on how brms uses the information in brmsprior objects.

@Paul_Dong, have you checked out the translations of Rethinking to brms?

@torkar Indeed, that is where I came from. The factor notation actually came from its section 12.1, Multilevel tadpoles, where Solomon Kurz illustrates the difference between a multilevel model and a simpler one.

After studying the Stan code that m11.4_brms above generated, I still couldn’t quite grasp the additional parameters; they seem to be some sort of scaling parameters, and they didn’t show up when I ran ranef on the fitted model.

I re-post my code here:

library(magrittr)
library(rethinking)
data(chimpanzees)
d <- chimpanzees
d$treatment <- 1 + d$prosoc_left + 2 * d$condition

dat_list <- list(
    pulled_left = d$pulled_left,
    # prosoc_left = d$prosoc_left, # see if this is tolerable
    treatment = as.integer(d$treatment),
    actor = d$actor
)

library(brms)

get_prior( data = dat_list, family = binomial,
          pulled_left | trials(1) ~ 0 + (1|treatment) + (1|actor))

get_prior returned 5 priors I can specify, but I don’t know what the priors in rows #1, #2, and #4 represent.

I don’t think they show up in the ranef output, but posterior_samples returned sd_actor__Intercept and sd_treatment__Intercept; I’m not sure I understand their presence either.

Is there anywhere I can read about them, with examples and equations showing where they enter the model?

Thanks!

Hi @Paul_Dong,

Doesn’t the help page @paul.buerkner is referring to clarify this?

Each group-level effect of each grouping factor has a standard deviation named sd_<group>_<coef>. Consider, for instance, the formula y ~ x1 + x2 + (1 + x1 | g). We see that the intercept, as well as x1, are group-level effects nested in the grouping factor g. The corresponding standard deviation parameters are named as sd_g_Intercept and sd_g_x1 respectively. These parameters are restricted to be non-negative and, by default, have a half student-t prior with 3 degrees of freedom and a scale parameter that depends on the standard deviation of the response after applying the link function.
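For instance, those standard deviation parameters can be given priors by class and group, with the coef field simply left empty (a hypothetical sketch mirroring the actor/treatment model above, not a definitive specification):

```r
library(brms)

# Hedged sketch: priors on the group-level standard deviations
# (sd_actor__Intercept and sd_treatment__Intercept in the posterior),
# set by class and group rather than by coef.
priors <- c(
  prior(normal(0, 1.5), class = sd, group = "actor"),
  prior(normal(0, 0.5), class = sd, group = "treatment")
)
```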

Thank you @paul.buerkner and @torkar. After learning some more from the course, I want to further explain what I was trying to achieve. I think I have the statistical parts of the model covered; it is just the modelling syntax that is difficult for me now.

First of all, I want to build a single-level model (as opposed to a multilevel one); the model specification in math would look like this:

\begin{eqnarray}
\text{pulled\_left}_i & \sim & \text{Binomial}(1, p_i) \\
\text{logit}(p_i) & = & a_{actor[i]} + b_{treatment[i]} \\
a_{actor[i]} & \sim & \text{Normal}(0, 1.5) \\
b_{treatment[i]} & \sim & \text{Normal}(0, 0.5)
\end{eqnarray}

I would categorize this as a single-level model, and I want to observe first-hand the effects of pooling and shrinkage once I switch to a multilevel model.

m11.4_brms2 <- brm(data = dat_list, family = binomial,
            pulled_left | trials(1) ~ 0 + factor(treatment) + factor(actor),
            prior = c(
            prior(normal(0, 1.5), class = b, coef = "factoractor2"),
            prior(normal(0, 1.5), class = b, coef = "factoractor3"),
            prior(normal(0, 1.5), class = b, coef = "factoractor4"),
            prior(normal(0, 1.5), class = b, coef = "factoractor5"),
            prior(normal(0, 1.5), class = b, coef = "factoractor6"),
            prior(normal(0, 1.5), class = b, coef = "factoractor7"),
            prior(normal(0, 0.5), class = b, coef = "factortreatment1"),
            prior(normal(0, 0.5), class = b, coef = "factortreatment2"),
            prior(normal(0, 0.5), class = b, coef = "factortreatment3"),
            prior(normal(0, 0.5), class = b, coef = "factortreatment4")
            ))

First, allow me to apologize for the messy code, as I do not know any other way to specify priors for each level. I tried using a glob like coef = "factortreatment*" and a regex like coef = "^factortreatment"; neither worked. As a and b in the model have different priors, simply using class = b cannot distinguish the two. If there’s a better way, please teach me.

Secondly, I don’t know how to specify a prior for coef = "factoractor1". By putting 0 + in the formula, I think I’ve told the model to construct dummy variables for all levels of actor, yet if I try, it raises an error. Is there a way for me to set factoractor1's prior?

I think the easiest solution would be to use

formula = bf(
  pulled_left | trials(1) ~ a + b, 
  a ~ 0 + actor, 
  b ~ 0 + treatment,
  nl = TRUE
)
prior = prior(normal(0, 1.5), nlpar = "a") + 
  prior(normal(0, 0.5), nlpar = "b")
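Putting the pieces together, a full call might look like this (a sketch, not a definitive implementation; it assumes actor and treatment are coded as factors so that 0 + actor and 0 + treatment yield one coefficient per level, and chains = 4 is just an example setting):

```r
library(brms)

# Hedged sketch of the complete model using the non-linear syntax above.
# dat_list is assumed to hold pulled_left, actor, and treatment,
# with actor and treatment converted to factors beforehand.
m11.4_nl <- brm(
  data = dat_list, family = binomial,
  formula = bf(
    pulled_left | trials(1) ~ a + b,
    a ~ 0 + actor,
    b ~ 0 + treatment,
    nl = TRUE
  ),
  prior = prior(normal(0, 1.5), nlpar = "a") +
    prior(normal(0, 0.5), nlpar = "b"),
  chains = 4
)
```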

@paul.buerkner Many thanks! Works like a charm. I am learning more from your excellent vignette on non-linear models.

I like brms for its succinct formula notation, but in this case a more explicit specification of the parameters actually helped a lot.


This is excellent. After watching McElreath’s new lectures and looking through the draft of his 2nd edition, I was stumped on @Paul_Dong’s question, too. Non-linear syntax it is!

This is great, thank you! I’ve been using this approach for a bit, and while it does give the desired index approach for categorical predictors, it seems to substantially increase the sampling time. Any thoughts on how to increase efficiency within this framework?


brms internally centers the predictors (around 0) and adjusts the intercept accordingly (you don’t see this in the results, as it back-transforms the intercept afterwards). If you center your predictors yourself, you may be able to increase sampling efficiency as well.

Oh, I’ve wanted to ask you this, @paul.buerkner: since brms centers the predictors, why do you recommend that one also center the predictors oneself?

brms only centres predictors in models with an intercept and not in models using the non-linear syntax as in the example above.
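For instance, manual centring is a one-liner (a hypothetical sketch reusing the height predictor from the non-linear example below; variable names are assumptions):

```r
# Hedged sketch: centre a predictor by hand before fitting, since brms
# does not centre predictors in non-linear formulas.
d$height_c <- d$height - mean(d$height)
# ...then use height_c in place of height in the model formula.
```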


Thanks a lot! Crystal clear.

This post has been so helpful for me. I have a follow-up question regarding the use of the non-linear model syntax (for me it is a nice halfway point between brms and the mathsy formulas in the rethinking package). I am trying to specify the model weight = a_{gender} + b_{gender} * height, so the intercept and coefficient are indexed by gender. I have the following code:

bf(
  weight ~ a + b * height,
  a ~ 0 + (1 | gender),
  b ~ 0 + (1 | gender),
  nl = TRUE
)

…but I am not sure about it, and I am especially unsure of the difference between 0 + (1|gender) and 1 + (1|gender) in this context. Am I on the right track?

As a side note, is it strange to want to use this non-linear syntax everywhere?

Excluding an intercept (i.e. 0 + ...) in a formula where you have a “random coefficient” implies that the random coefficient has a mean of zero. In your case it means that b \sim \text{Normal}(0, \sigma^2) instead of b \sim \text{Normal}(\beta, \sigma^2) (the latter being equivalent to 1 + (1|gender)).
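The contrast can be written out side by side (a hypothetical sketch reusing the weight/height/gender model from the question above):

```r
library(brms)

# a ~ 0 + (1 | gender): gender-specific intercepts vary around 0
bf_zero <- bf(
  weight ~ a + b * height,
  a ~ 0 + (1 | gender),
  b ~ 0 + (1 | gender),
  nl = TRUE
)

# a ~ 1 + (1 | gender): gender-specific intercepts vary around
# a common, estimated mean
bf_mean <- bf(
  weight ~ a + b * height,
  a ~ 1 + (1 | gender),
  b ~ 1 + (1 | gender),
  nl = TRUE
)
```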

Another way to specify your model with standard brms syntax is weight ~ height + (1 + height || gender). The double vertical bars imply that the group-level effects are modelled as uncorrelated.

Thanks Staffan, that is clearer. After playing around a bit, I think I actually needed 0 + gender for both instead - I’m guessing the non-linear nesting effectively groups the parameter a by gender that way, and the (1|gender) notation is not needed then.

Is it recommended to center the response variable as well, or just the predictors? And how would one do this for a non-Gaussian response variable (e.g. counts)?