Hi folks,
I am seeking your advice on the following question. I have a dataset of scientific publications and their citation counts. For each publication I know which authors wrote it, which working group each author belongs to, and which university each working group belongs to. I want to take this structure into account in a regression model. With the model I hope to estimate to what degree the variance in the citation counts can be attributed to the different levels (authors, working groups, universities). For that reason I want to estimate random effects for all units at these levels. I am not primarily interested in the individual unit estimates as such, only in their variability within each level.
The difficulty is that, besides this institutional hierarchy, there is also a relationship between authors and papers, and I was unable to figure out how to specify both structures correctly at the same time in a brms model. Authors typically contribute to several papers, and papers are written by several authors. The actual dataset looks something like this fictitious, simplified example:
PAPER AUTHOR_CNT AUTHOR WORKING_GROUP UNIVERSITY CITATIONS
P_A 3 AU_1 U_1_WG_1 U_1 20
P_A 3 AU_2 U_1_WG_1 U_1 20
P_A 3 AU_3 U_2_WG_1 U_2 20
P_B 5 AU_1 U_1_WG_1 U_1 44
P_B 5 AU_4 U_1_WG_2 U_1 44
P_B 5 AU_5 U_2_WG_1 U_2 44
P_B 5 AU_3 U_2_WG_1 U_2 44
P_B 5 AU_6 U_1_WG_1 U_1 44
P_C 6 AU_7 U_2_WG_1 U_2 5
P_C 6 AU_8 U_2_WG_2 U_2 5
P_C 6 AU_9 U_2_WG_2 U_2 5
P_C 6 AU_2 U_1_WG_1 U_1 5
P_C 6 AU_3 U_2_WG_1 U_2 5
P_C 6 AU_1 U_1_WG_1 U_1 5
P_D 7 AU_2 U_1_WG_1 U_1 11
P_D 7 AU_4 U_1_WG_2 U_1 11
P_D 7 AU_5 U_2_WG_1 U_2 11
P_D 7 AU_3 U_2_WG_1 U_2 11
P_D 7 AU_7 U_2_WG_1 U_2 11
P_D 7 AU_8 U_2_WG_2 U_2 11
P_D 7 AU_9 U_2_WG_2 U_2 11
Each row is for one author in a paper, so papers have as many rows as they have authors.
The institutional hierarchy for this example looks like this:
UNIVERSITY WORKING_GROUP AUTHOR
U_1 U_1_WG_1 AU_1
U_1 U_1_WG_1 AU_2
U_1 U_1_WG_1 AU_6
U_1 U_1_WG_2 AU_4
U_2 U_2_WG_1 AU_3
U_2 U_2_WG_1 AU_5
U_2 U_2_WG_1 AU_7
U_2 U_2_WG_2 AU_8
U_2 U_2_WG_2 AU_9
That is, there are several universities, each with several working groups, each of which in turn has several authors. Authors collaborate freely, across working groups and universities, to write papers.
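As a quick consistency check of the two tables, here is a minimal base-R sketch. It reads the toy long-format data, confirms that AUTHOR_CNT matches the number of rows per paper, and derives the institutional hierarchy from the rows, checking that each author belongs to exactly one working group and each working group to exactly one university. The object names (papers, inst, wg) are only illustrative.

```r
# Toy data from the post: one row per author of each paper (long format).
papers <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
PAPER AUTHOR_CNT AUTHOR WORKING_GROUP UNIVERSITY CITATIONS
P_A 3 AU_1 U_1_WG_1 U_1 20
P_A 3 AU_2 U_1_WG_1 U_1 20
P_A 3 AU_3 U_2_WG_1 U_2 20
P_B 5 AU_1 U_1_WG_1 U_1 44
P_B 5 AU_4 U_1_WG_2 U_1 44
P_B 5 AU_5 U_2_WG_1 U_2 44
P_B 5 AU_3 U_2_WG_1 U_2 44
P_B 5 AU_6 U_1_WG_1 U_1 44
P_C 6 AU_7 U_2_WG_1 U_2 5
P_C 6 AU_8 U_2_WG_2 U_2 5
P_C 6 AU_9 U_2_WG_2 U_2 5
P_C 6 AU_2 U_1_WG_1 U_1 5
P_C 6 AU_3 U_2_WG_1 U_2 5
P_C 6 AU_1 U_1_WG_1 U_1 5
P_D 7 AU_2 U_1_WG_1 U_1 11
P_D 7 AU_4 U_1_WG_2 U_1 11
P_D 7 AU_5 U_2_WG_1 U_2 11
P_D 7 AU_3 U_2_WG_1 U_2 11
P_D 7 AU_7 U_2_WG_1 U_2 11
P_D 7 AU_8 U_2_WG_2 U_2 11
P_D 7 AU_9 U_2_WG_2 U_2 11
")

# Check 1: AUTHOR_CNT equals the number of rows (authors) per paper.
rows_per_paper <- table(papers$PAPER)
cnt_per_paper  <- tapply(papers$AUTHOR_CNT, papers$PAPER, unique)
stopifnot(all(rows_per_paper[names(cnt_per_paper)] == cnt_per_paper))

# Check 2: the institutional structure is strictly nested.
inst <- unique(papers[, c("UNIVERSITY", "WORKING_GROUP", "AUTHOR")])
stopifnot(!anyDuplicated(inst$AUTHOR))        # one working group per author
wg <- unique(inst[, c("UNIVERSITY", "WORKING_GROUP")])
stopifnot(!anyDuplicated(wg$WORKING_GROUP))   # one university per working group

# Rebuild the hierarchy table, sorted by university, working group, author.
inst <- inst[order(inst$UNIVERSITY, inst$WORKING_GROUP, inst$AUTHOR), ]
rownames(inst) <- NULL
print(inst)
```

Because the data are already whitespace-separated, read.table() can take them verbatim, which makes the example easy to paste into a reply.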
My first thought was simply to include PAPER as another level below AUTHOR:
CITATIONS | weights(1/AUTHOR_CNT) ~ 1 + (1 | UNIVERSITY/WORKING_GROUP/AUTHOR/PAPER)
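But it occurred to me that this would lose the information that observations belonging to the same PAPER are dependent: nested within AUTHOR, each PAPER becomes a separate unit for every author-paper combination, i.e. one unit per row.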
I then came up with this crossed structure:
CITATIONS | weights(1/AUTHOR_CNT) ~ 1 + (1 | UNIVERSITY/WORKING_GROUP/AUTHOR) + (1 | PAPER/AUTHOR)
I suspect this is also incorrect, because part of the nested structure seems to be lost, but I'm not sure about this.
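To make the two formulas concrete: in brms (as in lme4), (1 | A/B) is shorthand for (1 | A) + (1 | A:B). The base-R sketch below counts how many levels each implied grouping term has in the toy data. The helper n_levels is hypothetical and not part of brms. A term with as many levels as there are rows acts as an observation-level effect.

```r
# Toy data from the post: one row per author of each paper (long format).
papers <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
PAPER AUTHOR_CNT AUTHOR WORKING_GROUP UNIVERSITY CITATIONS
P_A 3 AU_1 U_1_WG_1 U_1 20
P_A 3 AU_2 U_1_WG_1 U_1 20
P_A 3 AU_3 U_2_WG_1 U_2 20
P_B 5 AU_1 U_1_WG_1 U_1 44
P_B 5 AU_4 U_1_WG_2 U_1 44
P_B 5 AU_5 U_2_WG_1 U_2 44
P_B 5 AU_3 U_2_WG_1 U_2 44
P_B 5 AU_6 U_1_WG_1 U_1 44
P_C 6 AU_7 U_2_WG_1 U_2 5
P_C 6 AU_8 U_2_WG_2 U_2 5
P_C 6 AU_9 U_2_WG_2 U_2 5
P_C 6 AU_2 U_1_WG_1 U_1 5
P_C 6 AU_3 U_2_WG_1 U_2 5
P_C 6 AU_1 U_1_WG_1 U_1 5
P_D 7 AU_2 U_1_WG_1 U_1 11
P_D 7 AU_4 U_1_WG_2 U_1 11
P_D 7 AU_5 U_2_WG_1 U_2 11
P_D 7 AU_3 U_2_WG_1 U_2 11
P_D 7 AU_7 U_2_WG_1 U_2 11
P_D 7 AU_8 U_2_WG_2 U_2 11
P_D 7 AU_9 U_2_WG_2 U_2 11
")

# Hypothetical helper: distinct levels of a grouping term built from one or
# more columns (a:b in a formula corresponds to interaction(a, b)).
n_levels <- function(...) nlevels(interaction(..., drop = TRUE))

u <- papers$UNIVERSITY; g <- papers$WORKING_GROUP
a <- papers$AUTHOR;     p <- papers$PAPER

# Terms implied by (1 | UNIVERSITY/WORKING_GROUP/AUTHOR/PAPER)
nested_terms <- c(
  "UNIVERSITY"                            = n_levels(u),
  "UNIVERSITY:WORKING_GROUP"              = n_levels(u, g),
  "UNIVERSITY:WORKING_GROUP:AUTHOR"       = n_levels(u, g, a),
  "UNIVERSITY:WORKING_GROUP:AUTHOR:PAPER" = n_levels(u, g, a, p)
)
# Additional terms implied by (1 | PAPER/AUTHOR) in the crossed version
crossed_terms <- c(
  "PAPER"        = n_levels(p),
  "PAPER:AUTHOR" = n_levels(p, a)
)
print(nested_terms)
print(crossed_terms)
cat("rows:", nrow(papers), "\n")
```

In the toy data the terms of the first formula have 2, 4, 9 and 21 levels, and PAPER and PAPER:AUTHOR have 4 and 21, against 21 rows. So in the first formula no term groups the rows of one paper together, while in the crossed version PAPER does, and PAPER:AUTHOR is again one level per row.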
I considered using a multiple-membership structure, but from what I understand, this would mean creating a group variable for each PAPER and assigning membership weights for each author to each of those variables. That seems excessive and difficult to arrange with the current structure of the dataset.
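To see what a multiple-membership layout would involve at the author level, here is a minimal base-R sketch. It reshapes the toy data to one row per paper, with one author column per slot (A1 to A7) and a matching weight column (W1 to W7), and then builds a formula using the brms mm() term. The column names are hypothetical, only the author level is covered, and padding unused slots with the paper's first author and weight 0 is an assumption about how to handle papers with fewer authors than the maximum. The model itself is not fitted here.

```r
# Toy data from the post: one row per author of each paper (long format).
papers <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
PAPER AUTHOR_CNT AUTHOR WORKING_GROUP UNIVERSITY CITATIONS
P_A 3 AU_1 U_1_WG_1 U_1 20
P_A 3 AU_2 U_1_WG_1 U_1 20
P_A 3 AU_3 U_2_WG_1 U_2 20
P_B 5 AU_1 U_1_WG_1 U_1 44
P_B 5 AU_4 U_1_WG_2 U_1 44
P_B 5 AU_5 U_2_WG_1 U_2 44
P_B 5 AU_3 U_2_WG_1 U_2 44
P_B 5 AU_6 U_1_WG_1 U_1 44
P_C 6 AU_7 U_2_WG_1 U_2 5
P_C 6 AU_8 U_2_WG_2 U_2 5
P_C 6 AU_9 U_2_WG_2 U_2 5
P_C 6 AU_2 U_1_WG_1 U_1 5
P_C 6 AU_3 U_2_WG_1 U_2 5
P_C 6 AU_1 U_1_WG_1 U_1 5
P_D 7 AU_2 U_1_WG_1 U_1 11
P_D 7 AU_4 U_1_WG_2 U_1 11
P_D 7 AU_5 U_2_WG_1 U_2 11
P_D 7 AU_3 U_2_WG_1 U_2 11
P_D 7 AU_7 U_2_WG_1 U_2 11
P_D 7 AU_8 U_2_WG_2 U_2 11
P_D 7 AU_9 U_2_WG_2 U_2 11
")

max_auth <- max(papers$AUTHOR_CNT)  # number of author slots (7 here)
slots <- seq_len(max_auth)

# One row per paper: A1..A7 hold author IDs, W1..W7 the membership weights.
wide <- do.call(rbind, lapply(split(papers, papers$PAPER), function(d) {
  n <- nrow(d)
  pad <- max_auth - n
  # Assumption: unused slots repeat the first author with weight 0.
  authors <- c(d$AUTHOR, rep(d$AUTHOR[1], pad))
  weights <- c(rep(1, n), rep(0, pad))
  row <- data.frame(PAPER = d$PAPER[1], CITATIONS = d$CITATIONS[1],
                    stringsAsFactors = FALSE)
  row[paste0("A", slots)] <- as.list(authors)
  row[paste0("W", slots)] <- as.list(weights)
  row
}))
rownames(wide) <- NULL

# Build (but do not evaluate) the multiple-membership formula.
f <- as.formula(sprintf(
  "CITATIONS ~ 1 + (1 | mm(%s, weights = cbind(%s)))",
  paste0("A", slots, collapse = ", "),
  paste0("W", slots, collapse = ", ")
))
print(wide[, c("PAPER", "CITATIONS", "A1", "A4", "W4")])
print(f)
# Not run here; the family is left open:
# fit <- brms::brm(f, data = wide, ...)
```

By default mm() rescales the weights in each row to sum to one, so equal weights of 1 give each author a share of 1/AUTHOR_CNT, which is close in spirit to the weights(1/AUTHOR_CNT) term in the formulas above. How to combine this with the working-group and university levels is a separate question that the sketch does not address.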
I would be very grateful for any advice or pointers to similar examples.
Best,
Paul.
- Operating System: Windows 10
- brms Version: 2.8.0