Categorical family: group-levels not present in each response category - is this ok?

Thanks for the reply @martinmodrak! I am also linking the related post "Multilevel ordinal model - large values in threshold and group level sd estimates?" here, because both questions stem from the same problem: "complete separation" in the group-level predictor.

I’ll explain the specific problem in a bit more detail and ask the community for ways of dealing with it, recognizing that there may not be a good solution.

The df dataset above has X as a continuous predictor and Y as a categorical response, plus a grouping variable, group. In reality, group represents different species, so the real data look like this:

   measurement category species
         <dbl> <chr>      <int>
 1     -1.88   a              1
 2     -1.20   a              1
 3      0.608  a              1
 4     -1.21   a              2
 5     -0.0216 a              2
 6      0.824  a              2
 7      1.14   b              3
 8      0.232  b              3
 9      1.30   b              3
10      1.16   b              4

Each species belongs to only a single response category, but I have multiple measurements per species: 96 measurements in total from 24 species. Unlike in this fake data set, though, sampling is uneven across species (mean = 3.5, range = 1-5 measurements per species).
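To make the structure concrete, here is a small Python sketch (plain Python, not brms; the helper names are hypothetical) that uses the ten example rows above to check that every species maps to exactly one response category, which is the complete separation described earlier, and to count measurements per species.

```python
from collections import defaultdict

# Toy rows copied from the printed example: (measurement, category, species).
ROWS = [
    (-1.88, "a", 1), (-1.20, "a", 1), (0.608, "a", 1),
    (-1.21, "a", 2), (-0.0216, "a", 2), (0.824, "a", 2),
    (1.14, "b", 3), (0.232, "b", 3), (1.30, "b", 3),
    (1.16, "b", 4),
]


def categories_per_species(rows):
    """Map each species to the set of response categories it appears in."""
    cats = defaultdict(set)
    for _, category, species in rows:
        cats[species].add(category)
    return dict(cats)


def is_completely_separated(rows):
    """True if every species occurs in exactly one response category,
    i.e. the species grouping alone perfectly predicts the response."""
    return all(len(c) == 1 for c in categories_per_species(rows).values())


def sample_sizes(rows):
    """Number of measurements recorded for each species."""
    counts = defaultdict(int)
    for _, _, species in rows:
        counts[species] += 1
    return dict(counts)


if __name__ == "__main__":
    print(categories_per_species(ROWS))
    print(is_completely_separated(ROWS))  # True for the toy data
    print(sample_sizes(ROWS))
```

If even one species were observed in two categories, is_completely_separated would return False; with the real data it should return True for all 24 species.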

My goal is to incorporate the intraspecific error in the model rather than use the mean of each species as a predictor. However, because each species belongs to only one response category, I can't use a regular category ~ measurement + (1|species) structure. Here are a few possible solutions:

  1. Ignore the species grouping variable and run the model on all the data. Because sampling is uneven, species with more measurements would contribute more to the fit than poorly sampled ones, which could bias the results.

  2. Estimate a standard error for each species and use a measurement error model such as category ~ me(measurement, se(measurement)). The problem with this is that it reduces my dataset from n = 96 to n = 24.

  3. Some form of non-linear model, perhaps, where I jointly estimate a distribution of mean values for each species and use these estimates as predictors in the focal model category ~ measurement. This is conceptually similar to what I asked about in "a different way to remove certain categorical combinations from the model?", but different because there is no direct group effect. However, I am not very experienced with the non-linear framework and am not sure whether this is possible or advisable.
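The data preparation behind options 1 and 2 can be sketched in plain Python. This is only an illustration under stated assumptions: for option 1 it assumes per-observation weights of 1 / n_species (a common way to give every species equal total influence, not something proposed in the post), and for option 2 it collapses the data to one row per species with a mean and standard error (sample SD / sqrt(n)). The function names are hypothetical.

```python
import statistics
from collections import defaultdict

# Toy rows copied from the printed example: (measurement, category, species).
ROWS = [
    (-1.88, "a", 1), (-1.20, "a", 1), (0.608, "a", 1),
    (-1.21, "a", 2), (-0.0216, "a", 2), (0.824, "a", 2),
    (1.14, "b", 3), (0.232, "b", 3), (1.30, "b", 3),
    (1.16, "b", 4),
]


def equal_species_weights(rows):
    """Option 1 variant (assumption): weight each row by 1 / n_species so that
    every species contributes a total weight of 1 regardless of sampling effort."""
    counts = defaultdict(int)
    for _, _, species in rows:
        counts[species] += 1
    return [1.0 / counts[species] for _, _, species in rows]


def species_summaries(rows):
    """Option 2: one record per species with n, mean, standard error and its
    single category. SE is None for species with one measurement, because the
    sample SD is undefined for n = 1."""
    values = defaultdict(list)
    category = {}
    for x, cat, species in rows:
        values[species].append(x)
        category[species] = cat
    out = {}
    for species, xs in sorted(values.items()):
        n = len(xs)
        se = statistics.stdev(xs) / n ** 0.5 if n > 1 else None
        out[species] = {"n": n, "mean": statistics.mean(xs),
                        "se": se, "category": category[species]}
    return out


if __name__ == "__main__":
    print(equal_species_weights(ROWS))
    for species, summary in species_summaries(ROWS).items():
        print(species, summary)
```

In brms, weights like these could be supplied through the weights() addition term, and the per-species means and standard errors correspond to the two arguments of me(). Note that with a range of 1-5 measurements per species, any species sampled only once has no estimable standard error, so option 2 would need some assumption for those species.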

In this post I have been discussing the categorical response family, but I believe the same principles apply to the ordinal families, and to the bernoulli family for that matter.

Does anyone have suggestions for a good way to proceed with this data structure? I feel this can't be a unique problem. Thank you in advance.
