Multinomial to Poisson transformation

My question is primarily a math question about the Multinomial to Poisson transformation,
referring to a post about how to use the multinomial distribution in brms.

See: https://github.com/paul-buerkner/brms/issues/338#issuecomment-369760462
The math is based on:
https://stats.stackexchange.com/questions/105282/logistic-multinomial-regression-as-two-multiple-poisson-regressions

\log \frac{\mu_{ij}}{\mu_{iJ}} = \alpha_j + X_i \beta_j

is transformed to:

\log\, \mu_{ij} = \eta + \theta_i + \alpha_j^* + X_i\beta_j^*
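
To spell out the step between the two displays (my own working, following the linked Cross Validated answer): write the reference category the same way and take the log-ratio, so that \eta and \theta_i cancel:

\log\, \mu_{iJ} = \eta + \theta_i + \alpha_J^* + X_i\beta_J^*

\log \frac{\mu_{ij}}{\mu_{iJ}} = (\alpha_j^* - \alpha_J^*) + X_i (\beta_j^* - \beta_J^*) = \alpha_j + X_i \beta_j

so \alpha_j = \alpha_j^* - \alpha_J^* and \beta_j = \beta_j^* - \beta_J^*.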

and the brms fit by Paul is:

fit2 <- brm(y ~ key + key:x + (1 | obs), df_long, family = poisson())

Since \alpha_j = \alpha^*_j - \alpha^*_J,
it follows that \alpha_J = \alpha^*_J - \alpha^*_J = 0 (the reference category).

\eta is the intercept in fit2, and (1 | obs) corresponds to \theta_i.
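
For context, a minimal sketch of how a df_long of 300 rows might be simulated (the simulation settings and true coefficient values are my assumptions, not Paul's original code; only the column names y, key, x, obs come from the model formula above):

set.seed(1)
N <- 100                      # observations, each with 3 category counts -> 300 rows
x <- rnorm(N)
# assumed true coefficients on the multinomial-logit scale (category y1 = reference)
eta2 <- 1.2 - 1.1 * x
eta3 <- 1.7 - 1.6 * x
p <- cbind(1, exp(eta2), exp(eta3))
p <- p / rowSums(p)
size <- 20                    # same number of trials for every observation
Y <- t(sapply(seq_len(N), function(i) rmultinom(1, size, p[i, ])))
df_long <- data.frame(
  obs = factor(rep(seq_len(N), times = 3)),
  key = rep(c("y1", "y2", "y3"), each = N),
  x   = rep(x, times = 3),
  y   = c(Y)                  # columns of Y stacked: y1 counts, then y2, then y3
)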

Is this model identifiable? Why do we need \theta_i?

The brms fit:

 Family: poisson 
  Links: mu = log 
Formula: y ~ key + key:x + (1 | obs) 
   Data: df_long (Number of observations: 300) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; 
         total post-warmup samples = 4000
    ICs: LOO = NA; WAIC = NA; R2 = NA
 
Group-Level Effects: 
~obs (Number of levels: 100) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     0.02      0.02     0.00     0.06       2184 1.00

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept     1.57      0.06     1.44     1.69       2509 1.00
keyy2         1.18      0.07     1.04     1.33       2655 1.00
keyy3         1.66      0.07     1.52     1.79       2653 1.00
keyy1:x       0.77      0.08     0.62     0.93       2886 1.00
keyy2:x      -0.36      0.06    -0.46    -0.25       4000 1.00
keyy3:x      -0.85      0.05    -0.95    -0.75       4000 1.00

Considering glm:
fit3 <- glm(y ~ key + key:x, data=df_long, family = poisson)
summary(fit3)

Call:
glm(formula = y ~ key + key:x, family = poisson, data = df_long)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.11253  -0.65366   0.01812   0.60466   2.06867  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.56862    0.06455  24.302  < 2e-16 ***
keyy2        1.17994    0.07380  15.988  < 2e-16 ***
keyy3        1.65664    0.07044  23.519  < 2e-16 ***
keyy1:x      0.77511    0.07801   9.936  < 2e-16 ***
keyy2:x     -0.35613    0.05575  -6.387 1.69e-10 ***
keyy3:x     -0.85128    0.05155 -16.515  < 2e-16 ***

and the multinom:

multinom(formula = Y ~ x, data = df)

Coefficients:
  (Intercept)         x
2    1.179938 -1.131244
3    1.656643 -1.626395

Std. Errors:
  (Intercept)          x
2  0.07380438 0.09588863
3  0.07043854 0.09350489

These fits match; the slope arithmetic:

> -0.35613 - 0.77511
[1] -1.13124
> -0.85128 - 0.77511
[1] -1.62639
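
In other words, each multinomial slope is the difference between the corresponding Poisson interaction and the reference-category interaction; a quick check against fit3 (coefficient names as in the glm summary above):

cf <- coef(fit3)
# multinomial slopes: subtract the reference category's interaction
cf["keyy2:x"] - cf["keyy1:x"]   # about -1.131
cf["keyy3:x"] - cf["keyy1:x"]   # about -1.626
# multinomial intercepts are the key main effects directly
cf[c("keyy2", "keyy3")]         # about 1.180 and 1.657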

There seems to be no need for \theta_i, so why does brms need the (1 | obs) term in the formula?
This is what I don't understand. Is it a Stan thing?

As you see above, the model is definitely identifiable. Don't forget we set a prior on \theta_i. In the above model sd(obs) is super small. What happens if you simulate data so that it is larger?

Thanks Paul, Stan seems to be quite tolerant either way.
The idea of \theta_i is to fit the normalizing constant for each observation across the categories,
and later in the linked post it is said to cancel out.
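
To make that cancellation explicit (again my own working): conditioning the Poisson counts on the row total gives multinomial probabilities in which \eta and \theta_i drop out:

\pi_{ij} = \frac{\mu_{ij}}{\sum_k \mu_{ik}} = \frac{\exp(\eta + \theta_i + \alpha_j^* + X_i\beta_j^*)}{\sum_k \exp(\eta + \theta_i + \alpha_k^* + X_i\beta_k^*)} = \frac{\exp(\alpha_j^* + X_i\beta_j^*)}{\sum_k \exp(\alpha_k^* + X_i\beta_k^*)}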

If I fit the model with \theta_i and without it and compare the coefficients, the
difference is about 1e-10. So do I need \theta_i? If not, I would save a lot
of parameters in Stan.

Added later: this special-case example may be pathological.
However, \theta_i could at least be reduced to N - 1 dimensions if it cannot be dropped entirely.

You mean for your particular example or in general?

For your particular example you may be right that dropping theta is OK,
as the results of the two models are very similar. However, in order to know that, we had to fit the model including theta anyway.

In general, I have my doubts that you can always drop that. I haven’t investigated this in detail but I would think that as soon as the number of trials differs across observations, thetas may have a considerably non-zero SD and thus the results may differ. So unless you can show me a thorough mathematical analysis or simulation study, I would be hesitant to conclude theta could just be dropped (in general).
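
A hedged sketch of the kind of check this suggests (not from the original thread; it reuses N, x, and p from the data sketch above and only varies the number of trials per observation):

size_i <- sample(c(5, 50, 500), N, replace = TRUE)   # row totals now differ a lot
Y2 <- t(sapply(seq_len(N), function(i) rmultinom(1, size_i[i], p[i, ])))
df_long2 <- data.frame(
  obs = factor(rep(seq_len(N), times = 3)),
  key = rep(c("y1", "y2", "y3"), each = N),
  x   = rep(x, times = 3),
  y   = c(Y2)
)
fit_with    <- brm(y ~ key + key:x + (1 | obs), df_long2, family = poisson())
fit_without <- brm(y ~ key + key:x,             df_long2, family = poisson())
# expect sd(Intercept) for obs to be far from zero now, and the key:x
# estimates of fit_without to drift away from those of fit_with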

Please also note that equality of point estimates (say, the mean) doesn't imply equality of posterior distributions. For instance, you will also need the SE (and CIs etc.) to be comparable (which is of course not necessarily sufficient to establish equality of the whole posterior distribution).
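
For a quick side-by-side comparison that goes beyond the point estimates, one could look at the full population-level summaries of the two hypothetical fits from the sketch above:

fixef(fit_with)     # estimate, est. error, and 95% intervals
fixef(fit_without)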
