Question about embedded group level effects

Matias_Guzman_Naranjo · November 3, 2019, 1:01pm

I have a dataset where I am looking at features found in different languages, something like:

Language    Feature.A  Feature.B
language-1  val.a.1    val.b.1
language-2  val.a.2    ...
language-3  ...        ...

Additionally, I have family information for each language. However, the family information is not well structured; it is more a series of nested groups of different sizes. For example, for Spanish I have the information:

[Indo-European, Italic,  Romance, Western Romance, Ibero-Romance, West-Iberian, Castilian languages]

And for Guarani I have:

[Tupian, Tupi–Guarani,  Guarani (I),  Guarani]

Here the leftmost label corresponds to the most general family group, and the rightmost label corresponds to the most specific family grouping.
Different languages have different numbers of groupings, and there is no obvious way to determine which subgroups of one family should correspond to which subgroups of a different family.

The model I have in mind should go something like this:

 Feature.A ~ Feature.B + (1|Family)

However, I do not see any reasonable way of including all the family information I have. I could of course take the largest and smallest groupings and include those, but this seems like an arbitrary choice that would miss part of the structure of the data.

Is there a better way of doing this? Can the group-level effect be defined in such a way that it includes all the hierarchical information available?

Thanks!

martinmodrak · November 5, 2019, 11:59am

Sorry, can’t respond now, but maybe @Max_Mantei is not busy and can answer?

Matias_Guzman_Naranjo · November 5, 2019, 12:01pm

Looking a bit more into it, it seems like I could use something like a phylogenetic model with brms, as in this vignette: https://cran.r-project.org/web/packages/brms/vignettes/brms_phylogenetics.html, provided that I induce a phylogenetic tree from the family information.

(If I may tag you) @paul.buerkner, would this do what I want?

paul.buerkner · November 5, 2019, 12:07pm

If you can construct a phylogenetic tree and then extract the induced correlation matrix, this could be what you want, yes.
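
One way to get from the grouping lists to such a matrix can be sketched in base R as below. This is only a minimal sketch under stated assumptions, not necessarily what the vignette does (there the matrix comes from `ape::vcv.phylo()` applied to a tree object): every group label is treated as one branch of length 1, each language gets its own tip branch of length 1, and languages that share no top-level group are treated as uncorrelated. `lang.x` and `lang.y` are hypothetical languages added for illustration, and `shared_depth` is a hypothetical helper name.

```r
# Lineages ordered from the most general group to the most specific one.
# Spanish and Guarani are taken from the question; lang.x and lang.y are
# hypothetical languages that share only part of those lineages.
lineages <- list(
  Spanish = c("Indo-European", "Italic", "Romance", "Western Romance",
              "Ibero-Romance", "West-Iberian", "Castilian languages"),
  lang.x  = c("Indo-European", "Italic", "Romance", "Western Romance",
              "Ibero-Romance"),
  Guarani = c("Tupian", "Tupi–Guarani", "Guarani (I)", "Guarani"),
  lang.y  = c("Tupian", "Tupi–Guarani")
)

# Number of leading group labels that two lineages have in common.
shared_depth <- function(a, b) {
  n <- min(length(a), length(b))
  if (n == 0) return(0)
  same <- a[seq_len(n)] == b[seq_len(n)]
  if (all(same)) n else which(!same)[1] - 1
}

langs <- names(lineages)
A <- matrix(0, nrow = length(langs), ncol = length(langs),
            dimnames = list(langs, langs))

for (i in langs) {
  for (j in langs) {
    A[i, j] <- if (i == j) {
      # Variance: all group branches plus the language's own tip branch
      # (assumption: unit branch lengths).
      length(lineages[[i]]) + 1
    } else {
      # Covariance: length of the path shared from the top-level group down.
      shared_depth(lineages[[i]], lineages[[j]])
    }
  }
}

# Rescale to a correlation matrix.
A_cor <- cov2cor(A)
round(A_cor, 2)
```

Because lineages of different lengths are compared only by their shared leading labels, this does not require deciding which subgroup of one family corresponds to which subgroup of another. The unit branch lengths are an arbitrary choice, though, and real branch lengths would change the correlations.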

Matias_Guzman_Naranjo · November 5, 2019, 12:14pm

Thanks! Follow-up question: in my data each observation is a language, and each language belongs to a micro-family. I can build the phylogenetic tree all the way down to each language and then fit:

Feature.A ~ Feature.B + (1|language), cov_ranef = list(language = A)

As in your first example. Or I could build the phylogenetic tree just up to the smallest micro-families and then fit:

Feature.A ~ Feature.B + (1|family.2) + (1|family), cov_ranef = list(family = A)

As in your second example.

Is there any reason to prefer one over the other?
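
For reference, a sketch of how the two options might be written in more recent brms releases, where, as far as I know, the covariance matrix is passed through `gr(..., cov = )` and `data2` rather than `cov_ranef`. The object names `d`, `A_language` and `A_family` are hypothetical: `d` stands for the data frame, and each matrix is assumed to have row and column names matching the levels of the corresponding grouping variable.

```r
# Option 1: tree built down to the individual languages.
f_language <- Feature.A ~ Feature.B + (1 | gr(language, cov = A_language))

# Option 2: tree built down to the smallest micro-families only.
f_family <- Feature.A ~ Feature.B + (1 | family.2) + (1 | gr(family, cov = A_family))

# Not run here (needs brms, a Stan toolchain and the actual data):
# fit_language <- brms::brm(f_language, data = d,
#                           data2 = list(A_language = A_language))
# fit_family   <- brms::brm(f_family, data = d,
#                           data2 = list(A_family = A_family))
```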

paul.buerkner · November 5, 2019, 12:16pm

It depends on whether you are interested in the second level, I would say.

Max_Mantei · November 5, 2019, 1:17pm

Nothing much to add here. I would have suggested something like Feature.A ~ Feature.B + (1|family.2) + (1|family) without the phylogenetic tree (which I know nothing about). My hunch would be that it doesn’t make much of a difference if there’s no variation in Feature.A and Feature.B across “sub”-families. The phylogenetic tree is most likely the better way to extract the structure of the data and incorporate it in the model.
