Dear stan community,
I am trying to fit a multinomial multilevel model with 6 discrete categories and 2 levels of variance using the brms-package with family = categorical( link = “logit”).
First a few words about the data and my general aim:
We conducted an ambulatory assessment study and captured data from 119 participants over the course of 10 days. In each situation each participant could choose from 6 discrete categories (musical styles). In addition several variables measuring the situation (e.g. current mood) were measured. The data is not balanced, meaning that the amount of situations reported per participant varies. We also measured variables characterizing the participants (e.g. personality).
Our goal is to predict the outcome variable (musical styles, 6 categories) and we are interested in associations of variables on different variance levels (2 levels) with the outcome variable.
In detail, we are interested in:
- Associations between variables on level one (within situation) such as current mood and the outcome
- Associations between variables on level two (between subject) such as personality and the outcome
- Spliting the variance (ICC) of the two levels to see which percentage of the variance is attributable to each level.
What I did so far:
I started by fitting an intercept-only model using family = categorical(link = “logit”).
m1 <- brm(Style.Self.Code ~ 1 + (1|PersID), data = DF_brms, family = categorical(link = "logit"))
Brms fits the model to the data without any problems but I have some problems understanding the summary output that I could not find any information in the vignette about:
Family: categorical
Links: muOtherculturesLatin = identity; muPop = identity; muRockMetal = identity; muTechnoEDM = identity; muVolksmusikSchlager = identity
Formula: Style.Self.Code ~ 1 + (1 | PersID)
Data: DF_brms (Number of observations: 1561)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Group-Level Effects:
~PersID (Number of levels: 104)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(muOtherculturesLatin_Intercept) 1.89 0.31 1.35 2.55 1163 1.01
sd(muPop_Intercept) 1.60 0.24 1.18 2.11 1148 1.00
sd(muRockMetal_Intercept) 1.77 0.28 1.29 2.40 1010 1.00
sd(muTechnoEDM_Intercept) 1.92 0.26 1.46 2.49 1091 1.01
sd(muVolksmusikSchlager_Intercept) 1.43 0.47 0.53 2.42 593 1.01
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
muOtherculturesLatin_Intercept 0.04 0.31 -0.62 0.58 1211 1.00
muPop_Intercept 1.00 0.24 0.52 1.46 1511 1.00
muRockMetal_Intercept 0.84 0.26 0.31 1.33 1113 1.00
muTechnoEDM_Intercept 1.31 0.26 0.78 1.80 1205 1.00
muVolksmusikSchlager_Intercept -1.54 0.44 -2.54 -0.82 936 1.00
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
- Why the identity function is shown in the second row starting with Links? Shouldn’t it be the logit function linked here?
- How to interpret the intercept coefficients? Are these exponentiated odds-ratios?
If I look at the predicted values using predict(m1)
I have the following question:
P(Y = Blues & Jazz) P(Y = Other cultures & Latin) P(Y = Pop) P(Y = Rock & Metal) P(Y = Techno & EDM) P(Y = Volksmusik & Schlager)
[1,] 0.05950 0.01350 0.09875 0.53800 0.27950 0.01075
[2,] 0.05575 0.01325 0.09475 0.54500 0.28175 0.00950
[3,] 0.05150 0.01725 0.09700 0.54075 0.28225 0.01125
[4,] 0.05875 0.01125 0.10550 0.55000 0.26700 0.00750
[5,] 0.05825 0.01525 0.10275 0.53300 0.28000 0.01075
[6,] 0.05475 0.01200 0.10100 0.53400 0.28875 0.00950
[7,] 0.05175 0.01175 0.09800 0.54550 0.28350 0.00950
[8,] 0.05700 0.01425 0.10375 0.52100 0.29350 0.01050
[9,] 0.04750 0.01325 0.10400 0.55450 0.27000 0.01075
[10,] 0.05875 0.01150 0.09950 0.54125 0.27650 0.01250
...
- Why do the predicted probabilities vary between situations (i.e., from line to line)? The first 41 lines belong to one person (level-2 unit). As far as I understand it, there should be one probability for each person and category.
My last question addresses the variance split:
- Is it possible to calculate an intraclass-correlation coefficient for such a model?
I will be very happy about any suggestions regarding one of these four questions.