Log-linear models for testing whether the distribution of A × B depends on C (or not)

I am having trouble setting up the correct models to compare for this fairly standard problem. I have a three-way contingency table of

Rating: integers 1–5
Locale: 5 different locales
Side: base or test

(I have counts for each cell combination, hence the contingency table.)

Goal: I want to check whether the distribution of counts over Rating × Locale depends on Side.

I would imagine a Poisson log-linear model could be applied here, using R formula syntax.

The model for no dependence on Side:

M1: n ~ side + locale*rating

and the model allowing dependence (the saturated model):

M2: n ~ locale*rating*side

And then, to infer whether Side matters, compare M1 and M2.

When I tried this with the following code


nlbl_model1 <- glm(n ~ sides + locale_id * rating,
                   family = poisson,
                   data = ygragg)

nlbl_model2 <- glm(n ~ locale_id * rating * sides,
                   family = poisson,
                   data = ygragg)

# note: the test argument is case-sensitive; 'lrt' errors, "LRT" works
anova(nlbl_model1, nlbl_model2, test = "LRT")

the latter model is a significantly better fit, i.e. the Rating × Locale distribution is associated with Side.

I tried setting this up in brms

nlbl_model1_b <- brm(n ~ sides + locale_id * rating,
                     family = poisson(),
                     chains = 3, cores = 3,
                     backend = "cmdstanr", threads = threading(4),
                     data = ygragg)

nlbl_model2_b <- brm(n ~ locale_id * rating * sides,
                     family = poisson(),
                     chains = 3, cores = 3,
                     backend = "cmdstanr", threads = threading(4),
                     data = ygragg)

The coefficient estimates are the same. When I ran loo(nlbl_model1_b, nlbl_model2_b), I got

Warning messages:
1: Found 36 observations with a pareto_k > 0.7 in model 'nlbl_model1_b'. We recommend to set 'moment_match = TRUE' in order to perform moment matching for problematic observations.  
2: Found 55 observations with a pareto_k > 0.7 in model 'nlbl_model2_b'. We recommend to set 'moment_match = TRUE' in order to perform moment matching for problematic observations.

and the comparison

         	elpd_diff se_diff
nlbl_model1_b	0.0   	0.0
nlbl_model2_b -145.2  	41.4

which would imply that the simpler model (Side makes no difference) is preferred.

But from inspection of the underlying data, Side ought to make a difference.

What am I doing wrong in the Bayesian approach? (Or, for that matter, are M1 and M2 set up correctly for the hypothesis in mind?)

thanks much in advance
Saptarshi

  1. Once I get a handle on this, I would like to make locale_id a random effect (as there will eventually be many locales).
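
For reference, a sketch of what that future random-effects version might look like in brms (hypothetical: the varying-slope structure is just one option, and `ygragg`, `sides`, `rating` are as above):

```r
# Sketch: locale_id as a grouping factor, with the rating and side
# effects (and their interaction) allowed to vary by locale
nlbl_model_re <- brm(
  n ~ sides * rating + (1 + sides * rating | locale_id),
  family = poisson(),
  data = ygragg
)
```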

What happens if you follow the advice from loo() and rerun it with moment_match = TRUE? Did you get any convergence warnings when running either model?
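
Concretely, something like (a sketch, using your fitted objects from above):

```r
# Rerun the comparison with moment matching for the flagged observations
loo(nlbl_model1_b, nlbl_model2_b, moment_match = TRUE)
```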

I did, and got warnings such as

Warning messages:
1: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
 
2: Found 8 observations with a pareto_k > 0.7 in model 'nlbl_model2_b'. We recommend to set 'reloo = TRUE' in order to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 8 times to compute the ELPDs for the problematic observations directly. 

The full output from loo is

Output of model 'nlbl_model1_b':

Computed from 3000 by 70 log-likelihood matrix.

         Estimate    SE
elpd_loo  -1252.3 133.1
p_loo       328.8  43.1
looic      2504.6 266.3
------
MCSE of elpd_loo is 0.5.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.2]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.

Output of model 'nlbl_model2_b':

Computed from 3000 by 70 log-likelihood matrix.

         Estimate    SE
elpd_loo  -1516.4 152.9
p_loo       680.7  84.2
looic      3032.9 305.8
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.3, 1.2]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     62    88.6%   26      
   (0.7, 1]   (bad)       1     1.4%   <NA>    
   (1, Inf)   (very bad)  7    10.0%   <NA>    
See help('pareto-k-diagnostic') for details.

Model comparisons:
              elpd_diff se_diff
nlbl_model1_b    0.0       0.0 
nlbl_model2_b -264.1      66.1 

As for fitting the models, no issues at all:

 Family: poisson 
  Links: mu = log 
Formula: n ~ sides + locale_id * rating 
   Data: ygragg (Number of observations: 70) 
  Draws: 3 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 3000

Regression Coefficients:
                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                 3.10      0.07     2.95     3.24 1.00      912     1285
sidestest                -0.01      0.01    -0.03     0.02 1.00     4391     2407
locale_idde_DE           -1.51      0.13    -1.76    -1.25 1.00     1463     1798
...
locale_idja_JP:rating     0.14      0.03     0.09     0.19 1.00     1318     1696

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

and

> nlbl_model2_b
 Family: poisson 
  Links: mu = log 
Formula: n ~ locale_id * rating * sides 
   Data: ygragg (Number of observations: 70) 
  Draws: 3 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 3000

Regression Coefficients:
                                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                           3.22      0.10     3.02     3.41 1.01      736     1392
locale_idde_DE                     -1.31      0.17    -1.63    -0.96 1.00     1299     1596
locale_ides_ES                     -1.10      0.16    -1.41    -0.79 1.00     1058     1810
...
locale_idja_JP                     -0.69      0.16    -1.00    -0.39 1.00     1240     1716
rating                              0.68      0.02     0.64     0.73 1.01      718     1274
sidestest                          -0.26      0.15    -0.55     0.03 1.01      709      877
locale_idde_DE:rating               0.28      0.04     0.20     0.36 1.00     1205     1528
locale_ides_ES:rating               0.26      0.04     0.18     0.33 1.00     1007     1846
...
locale_idja_JP:sidestest           -0.06      0.23    -0.52     0.38 1.01     1118     1416
rating:sidestest                    0.06      0.03    -0.01     0.13 1.01      692     1121
locale_idde_DE:rating:sidestest     0.10      0.06    -0.01     0.21 1.01     1095     1836
locale_ides_ES:rating:sidestest     0.05      0.05    -0.05     0.16 1.00      974     1221
...
locale_idja_JP:rating:sidestest     0.00      0.05    -0.10     0.11 1.01     1061     1428

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
> 

FWIW, I did rerun with chains = 4, cores = 4, control = list(adapt_delta = .99, max_treedepth = 12); the Rhats were resoundingly 1.000. Running loo() still produced the same moment-matching warning, and rerunning with moment_match = TRUE still gave the same Pareto k diagnostic warnings.

I think loo() sometimes complains when the model is a really bad fit for the data. Could you post the output of calling pp_check() on both of the fitted models? You may need a gamma-Poisson mixture (negative binomial) to model your count data if the dispersion around the means doesn't match the Poisson distribution.
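
Something along these lines (a sketch; `ndraws` assumes a recent brms version, and `negbinomial()` is the brms family for the gamma-Poisson mixture):

```r
# Posterior predictive checks for both fitted models
pp_check(nlbl_model1_b, ndraws = 100)
pp_check(nlbl_model2_b, ndraws = 100)

# If the counts are overdispersed, a negative binomial fit may help
nlbl_model2_nb <- brm(
  n ~ locale_id * rating * sides,
  family = negbinomial(),
  data = ygragg
)
```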

Also, if Rating is ordinal, it might be better to include it as a monotonic predictor in brms by wrapping it in mo(). Here is the vignette on using them.
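
For example (a sketch; assumes `rating` is stored as an integer or ordered factor so mo() treats it as ordinal):

```r
# Monotonic effect of the ordinal rating instead of a linear slope
nlbl_model2_mo <- brm(
  n ~ locale_id * mo(rating) * sides,
  family = poisson(),
  data = ygragg
)
```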