# Translating cross-classified and multiple membership model estimated in brms into equation

Hi all, I would like to report two models I have estimated in brms and am hoping someone here can help me with the equation. I have given it a go but don’t know anyone I could ask directly, so I am hoping for the help of the community. I appreciate that this is quite long though…

Cross-classified model with students (L1) nested in the cross-classification of teachers (L2) and classrooms (L2)

Brms model is:

Y ~ 0 + Intercept + V1_cgm + V2_cgm + V3_cwc + V4_cwc + V5_cwc + V6_cwc +V7_cwc + V8_cgm + (0 + Intercept + V1_cgm + V2_cgm + V3_cwc + V4_cwc + V5_cwc + V6_cwc + V7_cwc |p| T1_ID) + (0 + Intercept + V1_cgm + V2_cgm + V3_cwc + V4_cwc + V5_cwc + V6_cwc + V7_cwc |q| C1_ID), family = skew_normal())

Where

V1-V7 are student-level variables (L1), modelled as random effects
V8 is a teacher/classroom level variable (L2)
T1_ID is the teacher unit
C1_ID is the classroom unit
cgm = grand-mean centered
cwc = centered within cluster

This is what I have got:

The formula at Level 1 is

and, at Level 2

2) Cross-classified multiple membership model with students (L1) nested in the cross-classification of teachers (L2) and classrooms (L2), whereby students can belong to multiple teachers and classrooms

Y ~ 0 + Intercept + V1_cgm + V2_cgm + V3_cgm + (0 + Intercept + V1_cgm + V2_cgm |p| mm(T1_ID, T2_ID, T3_ID, T4_ID, weights = cbind(W_T1, W_T2, W_T3, W_T4))) + (0 + Intercept + V1_cgm + V2_cgm |q| mm(C1_ID, C2_ID, C3_ID, C4_ID, weights = cbind(W_C1, W_C2, W_C3, W_C4))), family = skew_normal())

Where
V1 and V2 are student-level variables (L1), modelled as random effects
V3 is a teacher/classroom level variable (L2)
T1_ID is the teacher unit
C1_ID is the classroom unit
cgm = grand-mean centered

And the equation I have got so far is

The formula at Level 1 is

and, at Level 2

where,

If someone takes the time to look through these, many thanks in advance!

PS. So sorry for the formatting - I gave my best

1 Like

This looks roughly correct (but I didn’t check all the details), although I personally prefer syntax with “sampling statements” (e.g. as in equtiomatic documentation lme4::lmer Models) to having symbols for residuals, but that’s an aesthetic preference In the formula. I’ll however note that your family is skew_normal, while the formula declares the residual as just normal.

Best of luck with writing!

Thank you very much for having a look at this! And thanks for the link and pointing out the mistake. Is this the correct way of specifying residuals as having a skew-normal distribution?

Thanks

1 Like

I don’t think there is a “correct” way, there are many different conventions and the important part is that people can understand what you did (to that effect, many people would IMHO actually prefer just to see the formula of the model as you used in brms than a math formula). I think skew-normal does not have a well-accepted short notation, so I would rather spell it out as \sim \mathrm{SkewNormal}(0, \omega^2, \alpha). I am however not 100% sure if y \sim \mathrm{SkewNormal}(\mu, \omega^2, \alpha) (which is what brms does is euqivalent to

y = \mu + e \\ e \sim \mathrm{SkewNormal}(0, \omega^2, \alpha)

I just don’t understand skew normal well enough.

In brms, I am using a parameterization suggested by @Stephen_Martin. Perhaps he as some suggestions how to write it down.

2 Likes

Thank you. Really appreciate your help!

If I recall correctly (it’s been a few years) - I recommended a reparameterized skew normal, in which the ‘location’ parameter is the expected value. If that’s the case, then yes - I think one could say e ~ SN(0, w, a), so long as it’s clear that the ‘0’ is the mean, and not the xi parameter.

Edit: I don’t recall if we further reparameterized so that the ‘scale’ parameter is the sqrt(variance), or if we left the scale to be omega.

1 Like

Thanks! I think I remember now that we made the scale parameter the SD as well but kept the skewness parameter of the original parameterization rather than the one typically used for mean-SD parameterization, forming a kind-of hypride between two common parameterizations. I think you had run some experiments with that and found it to be more sampling efficient, but I really don’t know for certain anymore.

1 Like

I think you’re right, yes. Basically, it’s the definition from the Stan manual and wiki page, but reparameterized to be on mu and sd; but the alpha-skewness-parameter (real on -inf to inf) remains the same. The alternative is parameterizing in terms of the skewness coefficient directly, but it had some weird limit of like, -.998 to .998, and was much less efficient than keeping the parameter on -inf to inf.