I am having trouble set up the correct models to compare for this fairly standard problem. I have a 3 way contingency table of
Rating
: integers between 1…5
Locale
: 5 different locales
Side
: base or test
(i have counts for each cell combination and hence the contingency table)
Goal: i want to check if the distribution of counts between Rating X Locale
depends on Side
.
I would image a poisson log linear model could be applied here, using R formula syntax
The model for no dependence on side
M1: n ~ side + locale*rating
and the model for dependence (the saturated model)
M2: n ~ locale*rating*side
And then to infer side is significant or not, compare M1 and M2
When i tried this with the following code
nlbl_model1 <- glm(n ~ sides + locale_id*rating,
family = poisson,
data = ygragg)
nlbl_model2 <- glm(n ~ locale_id*rating*sides,
family = poisson,
data = ygragg)
anova(nlbl_model1,nlbl_model2,test='lrt')
The latter model is considered a better fit i.e. Rating X Locale
distribution is associated with sides
I tried setting this up in brms
nlbl_model1_b = brm( n ~ sides + locale_id*rating ,
family='poisson',
chains = 3, cores = 3,
backend = "cmdstanr", threads = threading(4),data=ygragg
)
nlbl_model2_b = brm( n ~ locale_id*rating*sides,
family='poisson',
chains = 3, cores = 3,
backend = "cmdstanr", threads = threading(4),data=ygragg
)
The coefficient estimates are the same. When i ran loo(nlbl_model1_b,nlbl_model2_b)
Warning messages:
1: Found 36 observations with a pareto_k > 0.7 in model 'nlbl_model1_b'. We recommend to set 'moment_match = TRUE' in order to perform moment matching for problematic observations.
2: Found 55 observations with a pareto_k > 0.7 in model 'nlbl_model2_b'. We recommend to set 'moment_match = TRUE' in order to perform moment matching for problematic observations.
And that
elpd_diff se_diff
nlbl_model1_b 0.0 0.0
nlbl_model2_b -145.2 41.4
Which would imply that the simpler model (sides
does not make a difference) is preferred.
From the data and inspection of the underlying data, sides
ought to make a difference.
What am i doing wrong from the bayesian approach? (or for that matter is M1 and M2 setup correctly for the hypothesis in mind?)
thanks much in advance
Saptarshi
- once I get a handle on this, I would like to make
locale_id
as a random effect (as locales will become several).