I have a dataset of features observed in different languages, structured something like this:

| Language   | Feature.A | Feature.B |
|------------|-----------|-----------|
| language-1 | val.a.1   | val.b.1   |
| language-2 | val.a.2   | ...       |
| language-3 | ...       | ...       |

Additionally, I have family information for each language. However, this information is not well structured: for each language it is a sequence of nested groups of decreasing size. For example, for Spanish I have:
[Indo-European, Italic, Romance, Western Romance, Ibero-Romance, West-Iberian, Castilian languages]
And for Guarani I have:
[Tupian, Tupi–Guarani, Guarani (I), Guarani]
Here the leftmost label corresponds to the most general family grouping, and the rightmost label to the most specific one.
Different languages have different numbers of groupings, and there is no obvious way to determine which subgroups in one family correspond to which subgroups in another.
The model I have in mind is roughly:
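To make the structure concrete, here is a minimal Python sketch, assuming the family information is stored as one list per language, ordered from most general to most specific. The `lineages` dictionary and the `nested_labels` helper are hypothetical, written only for illustration. It shows that the two lineages differ in depth, and that each level can be labelled together with all of its ancestors, so that the nesting is written out explicitly without having to match levels across families.

```python
# Hypothetical storage of the family information from the question:
# each language maps to its lineage, ordered from most general to most specific.
lineages = {
    "Spanish": ["Indo-European", "Italic", "Romance", "Western Romance",
                "Ibero-Romance", "West-Iberian", "Castilian languages"],
    "Guarani": ["Tupian", "Tupi–Guarani", "Guarani (I)", "Guarani"],
}


def nested_labels(lineage):
    """Return one label per level, each prefixed by all of its ancestors.

    Prefixing keeps identically named subgroups in different families
    distinct, so no level of one family is implicitly matched to a level
    of another family.
    """
    return [" / ".join(lineage[: k + 1]) for k in range(len(lineage))]


for language, lineage in lineages.items():
    # The depth differs between languages (7 levels vs. 4 here).
    print(language, "depth:", len(lineage))
    for label in nested_labels(lineage):
        print("   ", label)
```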
Feature.A ~ Feature.B + (1|Family)
However, I do not see any reasonable way of including all the family information I have. I could of course include only the largest and the smallest groupings, but this seems like an arbitrary choice that would miss part of the structure of the data.
Is there a better way of doing this? Can the group-level effect be defined so that it includes all of the available hierarchical information?
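For reference, the fallback of keeping only the largest and the smallest grouping amounts to something like the following Python sketch. The `lineages` dictionary and the `coarse_and_fine` helper are hypothetical, written only to make visible how many intermediate levels that choice discards for each language.

```python
# Hypothetical lineages, ordered from most general to most specific.
lineages = {
    "Spanish": ["Indo-European", "Italic", "Romance", "Western Romance",
                "Ibero-Romance", "West-Iberian", "Castilian languages"],
    "Guarani": ["Tupian", "Tupi–Guarani", "Guarani (I)", "Guarani"],
}


def coarse_and_fine(lineage):
    """Keep only the most general and the most specific grouping."""
    return lineage[0], lineage[-1]


for language, lineage in lineages.items():
    top, bottom = coarse_and_fine(lineage)
    dropped = lineage[1:-1]  # every intermediate level is thrown away
    # Spanish loses 5 intermediate levels, Guarani loses 2.
    print(f"{language}: family={top}, subgroup={bottom}, "
          f"ignored levels={len(dropped)}")
```

The number of discarded levels differs between languages, which is part of why this choice feels arbitrary.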
Thanks!