Bernoulli/categorical model where responses don't vary within levels of the grouping variable

Hi all,

I am new to brms and Bayesian methods, and am wondering if I could get any insights into how to specify this model (and whether a Bayesian approach would help).

I am testing whether members of female_group (“A”, “N”, “L”) are more likely to reproduce with members of particular male_group levels (“A”, “N”, “L”), and I want to account for variation in the preferences of each individual female and male: (1 | female_id) + (1 | male_id). Each female_id reproduced multiple times, sometimes with a different male_id.

To simplify, I collapsed the response from 3 possible categories (male group) into 2 (conspecific vs. heterospecific) to run it as a Bernoulli (binomial) GLMM, i.e.

male_group ~ female_species + male_species + (1|female_id) + (1|male_id).

However, when I do this I get unusual results, including a very large variance and standard deviation for the female_id random effect, and the full model has an AIC about 60 units worse than the reduced model. After searching around, I believe I have the same issue as a user here:

  • that is, within levels of the female_id grouping variable the responses are usually either all 0 or all 1, so the estimates on the logit scale are pushed toward infinity. Ben Bolker then suggests collapsing the random effects and running the model as a GLM, but if I do this I lose the male_id random effect.
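To make the "all 0 or all 1" problem concrete, here is a minimal sketch of how the lack of within-female variation could be counted. The data frame `toy` and the column name `conspecific` are hypothetical stand-ins (the real data are not reproduced here); only `female_id` comes from the description above.

```r
# Hypothetical toy data standing in for the real data set;
# `conspecific` is an assumed name for the collapsed 0/1 response.
toy <- data.frame(
  female_id   = c("f1", "f1", "f1", "f2", "f2", "f3", "f3", "f3"),
  conspecific = c(1, 1, 1, 0, 0, 1, 0, 1)
)

# Proportion of 1s for each female
prop_by_female <- tapply(toy$conspecific, toy$female_id, mean)

# A female shows no variation if her responses are all 0 or all 1
no_variation <- prop_by_female %in% c(0, 1)
names(no_variation) <- names(prop_by_female)

# How many females have no within-individual variation, and what share
table(no_variation)
mean(no_variation)
```

If most levels of female_id come out as TRUE here, the random intercepts have very little information to separate "strong individual preference" from "group-level effect", which is consistent with the inflated variance estimates described above.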

Is there any way I can fit a model that still includes these factors, or is the issue inherent in the data/modelling structure?

I was attracted to brms because it can fit a model with a categorical response, rather than collapsing to a binomial as I have done so far. But I assume the problem in the binomial GLMM would still persist here, even given appropriate priors etc.?

Can you post some example data? I’m having trouble following from your description.

Hi, attached is the data. It is quite sparse within levels of the grouping variables, which is probably part of the problem:

brms_dat.csv (6.8 KB)

I tried running a categorical model with brms like so:

library(brms)

# Categorical (multinomial logit) model with crossed random intercepts
fit2 <- brm(
  male_group ~ female_group + (1 | female_id) + (1 | male_id_1),
  family = categorical(link = "logit"),
  data = brms_sym,
  chains = 4, cores = 4
)

I received quite a few warnings:

Warning messages:
1: There were 688 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them. 
2: Examine the pairs() plot to diagnose sampling problems
 
3: The largest R-hat is 1.06, indicating chains have not mixed.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#r-hat 
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess 
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess 

This was the summary:

> summary(fit2)
 Family: categorical 
  Links: muL = logit; muN = logit 
Formula: male_group ~ female_group + (1 | female_id) + (1 | male_id_1) 
   Data: brms_sym (Number of observations: 191) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Group-Level Effects: 
~female_id (Number of levels: 27) 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(muL_Intercept)     2.06      2.01     0.07     7.03 1.01      388      260
sd(muN_Intercept)     2.41      2.94     0.05     9.11 1.00     1099     1047

~male_id_1 (Number of levels: 17) 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(muL_Intercept)    85.28    195.70     8.98   285.34 1.03      119      124
sd(muN_Intercept)    47.49     57.11     5.52   234.48 1.04      115      104

Population-Level Effects: 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
muL_Intercept       -23.54     34.23  -130.04    34.23 1.06       62       25
muN_Intercept       -19.75     23.84   -81.05    10.16 1.01      425      397
muL_female_groupL    74.76     83.42   -25.17   332.09 1.06       58       29
muL_female_groupN    -0.53     30.23   -69.05    66.41 1.04       77       35
muN_female_groupL    41.91     53.86   -32.03   175.52 1.04      117      248
muN_female_groupN    17.88     31.44   -42.92   111.18 1.03      122       58

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
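To get a feel for what coefficients of this size mean, the population-level estimates can be pushed through the categorical (softmax) link by hand. This is only a rough sketch: it plugs in the posterior means from the summary above, sets both random intercepts to zero, and ignores the very wide intervals, so it is not a proper posterior prediction. The helper name `softmax_ref` is made up for illustration; "A" is the reference category because the summary only reports muL and muN.

```r
# Posterior means copied from the summary above (logit scale, reference = A)
b <- c(
  muL_Intercept     = -23.54,
  muN_Intercept     = -19.75,
  muL_female_groupL =  74.76,
  muN_female_groupL =  41.91
)

# Softmax over the linear predictors, with the reference category fixed at 0
softmax_ref <- function(eta) {
  eta <- c(A = 0, eta)
  z <- exp(eta - max(eta))  # subtract the max for numerical stability
  z / sum(z)
}

# Implied probabilities of male_group for a female from group A (the baseline)
p_female_A <- softmax_ref(c(
  L = b[["muL_Intercept"]],
  N = b[["muN_Intercept"]]
))

# Implied probabilities of male_group for a female from group L
p_female_L <- softmax_ref(c(
  L = b[["muL_Intercept"]] + b[["muL_female_groupL"]],
  N = b[["muN_Intercept"]] + b[["muN_female_groupL"]]
))

round(rbind(female_A = p_female_A, female_L = p_female_L), 4)
```

With linear predictors in the tens, the implied probabilities sit essentially at 0 or 1, which is the same "estimates running off toward infinity" behaviour as in the binomial model rather than a sign that the categorical model has fixed it.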

I am just reading through the documentation for brms, so I’m not really sure how to interpret these warnings and output.
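For reference, this is roughly what I understand "appropriate priors" could look like for this model. It is a sketch only: the particular prior choices (normal(0, 2.5) on the logit scale, exponential(1) on the group-level SDs) and the adapt_delta value are assumptions for illustration, not tuned to these data, and I can't say from the output above whether they would remove the divergences. The function name `fit_with_priors` is made up.

```r
library(brms)

# Assumed, illustrative priors on the logit scale; not tuned to these data.
# In a categorical model each non-reference category has its own
# distributional parameter (here muL and muN), so priors are set per dpar.
priors <- c(
  prior(normal(0, 2.5), class = "Intercept", dpar = "muL"),
  prior(normal(0, 2.5), class = "Intercept", dpar = "muN"),
  prior(normal(0, 2.5), class = "b", dpar = "muL"),
  prior(normal(0, 2.5), class = "b", dpar = "muN"),
  prior(exponential(1), class = "sd", dpar = "muL"),
  prior(exponential(1), class = "sd", dpar = "muN")
)

# Wrapped in a function so that nothing is compiled or sampled until called
fit_with_priors <- function(data) {
  brm(
    male_group ~ female_group + (1 | female_id) + (1 | male_id_1),
    family = categorical(link = "logit"),
    prior = priors,
    control = list(adapt_delta = 0.95),  # smaller step size during sampling
    chains = 4, cores = 4,
    data = data
  )
}
```

It would be called as `fit3 <- fit_with_priors(brms_sym)`. Running `get_prior()` with the same formula, family and data lists the priors brms would otherwise use, which is a useful check that the class and dpar names above match the model.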

Thanks,
Mike