# Can a multinomial regression be condensed?

I have a toy model here from McElreath’s book; here it is:

``````data{
int N;// number of individuals
int K;// number of possible careers
int career[N];// outcome
vector[K] career_income;
}
parameters{
vector[K-1] a;// intercepts
real<lower=0> b;// association of income with choice
}
model{
vector[K] p;
vector[K] s;
a ~ normal( 0, 1);
b ~ normal( 0, 0.5);
s[1] = a[1] + b*career_income[1];
s[2] = a[2] + b*career_income[2];
s[3] = 0;// pivot
p = softmax( s);
career ~ categorical( p);
}
``````

I am hoping to build a more complex model, with K=100+. Is there a way to condense the notation of such a model, rather than having to write a linear model for each of the K-1 categories?

I was imagining something like

``````s[index_over_K_outcomes] = a[index_over_K_outcomes] + b*career_income[index_over_K_outcomes];
``````

For that matter, perhaps there is an indexing trick that would allow the same linear model to be ran on multiple outcomes. For instance, regressing each K categories as a Poisson on the same linear model.

Many thanks!

1 Like

This should work:

``````data{
int N;// number of individuals
int K;// number of possible careers
array[N] int career;// outcome
vector[K-1] career_income;
}
parameters{
vector[K-1] a;// intercepts
real<lower=0> b;// association of income with choice
}
model{
vector[K] p;
vector[K] s = rep_vector(0, K);
s[1:(K-1)] = a + b*career_income;
a ~ normal( 0, 1);
b ~ normal( 0, 0.5);
career ~ categorical_logit(s);
}
``````

Just ensure that you standardize your career_income variable and that the last value is 0.

1 Like

As there are no individual-level predictors, the parameter `s` is the same for all individuals. What matters is how many individuals have chosen each career (the observed counts per possible career are a sufficient statistic). So in addition to the improvement suggested by @Ven_Popov, you can factorize the categorical likelihood into K components, one for each career:

``````data {
int N; // number of individuals
int K; // number of possible careers
vector[K-1] career_income; // how come one career has no income?
array[N] int career; // outcome
}
transformed data {
array[K] int counts = rep_array(0, K);
for (i in 1:N) {
counts[career[i]] += 1;
}
}
parameters {
vector[K-1] a; // intercepts
real<lower=0> b; // association of income with choice
}
model {
vector[K] s = rep_vector(0, K);
s[1:(K-1)] = a + b * career_income;
a ~ normal(0, 1);
b ~ normal(0, 0.5);

for (k in 1:K) {
target += counts[k] * categorical_logit_lpmf(k | s);
}
}
``````

I think the sampling would be more efficient if the number of individuals N is large compared to the number of careers K.

You can also use the multinomial likelihood: `counts ~ multinomial(softmax(s))` without any loop. (The multinomial pmf includes the multinomial coefficient which is a constant in the parameter `s`.) There is no `multinomial_logit` equivalent though.

1 Like

I though I could work through this on a more complicated model, but I’m still struggling. Here’s my model as currently is:

``````data{
int n_pos;
int n_halves;
int n_offTeams;
int n_defTeams;
int n_matchups;
int n_offCoaches;
int n_defCoaches;
int n_posTypes;
int<lower=1, upper=n_halves> half[n_pos];
int<lower=1, upper=n_posTypes> posType[n_pos];
int<lower=1, upper=n_offCoaches> offCoach[n_pos];
int<lower=1, upper=n_defCoaches> defCoach[n_pos];
int<lower=1, upper=n_offTeams> offTeam[n_pos];
int<lower=1, upper=n_defTeams> defTeam[n_pos];
int<lower=1, upper=n_matchups> matchup[n_pos];
vector[n_pos] weight;
vector[n_pos] offFouls;
vector[n_pos] defFouls;
vector[n_pos] offRecord;
vector[n_pos] defRecord;
vector[n_pos] offWinStreak;
vector[n_pos] offLoseStreak;
vector[n_pos] defWinStreak;
vector[n_pos] defLoseStreak;
}
parameters{
vector[n_posTypes-1] a;
vector[n_offCoaches] a_offCoach;
vector[n_defCoaches] a_defCoach;
vector[n_offTeams] a_offTeam;
vector[n_defTeams] a_defTeam;
vector[n_matchups] a_matchup;
vector[n_halves] b_offFouls;
vector[n_halves] b_defFouls;
real b_offRecord;
real b_defRecord;
real b_offWinStreak;
real b_defWinStreak;
real b_offLoseStreak;
real b_defLoseStreak;
}
model{
vector[n_posTypes] p;
vector[n_posTypes] s = rep_vector(0, n_posTypes);

a ~ normal(0,1);
a_offCoach ~ normal(0,1);
a_defCoach ~ normal(0,1);
a_offTeam ~ normal(0,1);
a_defTeam ~ normal(0,1);
a_matchup ~ normal(0,1);
b_offFouls ~ normal(0,1);
b_defFouls ~ normal(0,1);
b_offRecord ~ normal(0,1);
b_defRecord ~ normal(0,1);
b_offWinStreak ~ normal(0,1);
b_defWinStreak ~ normal(0,1);
b_offLoseStreak ~ normal(0,1);
b_defLoseStreak ~ normal(0,1);

s[1:(n_posTypes - 1)] = weight .* (a +
a_offCoach[offCoach] + // line 63
a_defCoach[defCoach] +
a_offTeam[offTeam] +
a_defTeam[defTeam] +
a_matchup[matchup] +
b_offFouls[half] .* offFouls +
b_defFouls[half] .* defFouls +
b_offRecord * offRecord +
b_defRecord * defRecord +
b_offWinStreak * offWinStreak +
b_defWinStreak * defWinStreak +
b_offLoseStreak * offLoseStreak +
b_defLoseStreak * defLoseStreak);

p = softmax( s);
posType ~ categorical(p);
}
``````

and here’s the `str()` of the data:

`````` \$ n_pos        : int 3919
\$ n_halves     : num 2
\$ n_offTeams   : int 132
\$ n_defTeams   : int 143
\$ n_matchups   : int 734
\$ n_offCoaches : int 17
\$ n_defCoaches : int 17
\$ n_posTypes   : int 25
\$ half         : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
\$ posType      : Factor w/ 25 levels "NoDefFoul_NoOffFoul_NoOffTechFlag_0DefPts_0OffPts",..: 1 1 1 3 1 3 1 1 2 5 ...
\$ offCoach     : Factor w/ 17 levels "AmirJohnson",..: 7 10 7 10 7 10 7 7 7 10 ...
\$ defCoach     : Factor w/ 17 levels "AmirJohnson",..: 10 7 10 7 10 7 10 10 10 7 ...
\$ offTeam      : Factor w/ 132 levels "AkilMitchellTJClineGlenRiceJr",..: 102 84 102 84 102 84 102 102 102 84 ...
\$ defTeam      : Factor w/ 143 levels "AkilMitchellTJClineGlenRiceJr",..: 91 109 91 109 91 109 91 91 91 112 ...
\$ matchup      : Factor w/ 734 levels "OFFAkilMitchellTJClineGlenRiceJr_VS_DEFChrisJohnsonJohathonSimmonsCharlesGarcia",..: 628 556 628 556 628 556 628 628 628 557 ...
\$ weight       : num [1:3919] 1 1 1 1 1 ...
\$ offFouls     : num [1:3919] -1.79 -1.79 -1.79 -1.79 -1.79 ...
\$ defFouls     : num [1:3919] -1.79 -1.79 -1.79 -1.79 -1.79 ...
\$ offRecord    : num [1:3919] 0 0 0 0 0 0 0 0 0 0 ...
\$ defRecord    : num [1:3919] 0 0 0 0 0 0 0 0 0 0 ...
\$ offWinStreak : num [1:3919] 0 0 0 0 0 0 0 0 0 0 ...
\$ offLoseStreak: num [1:3919] 0 0 0 0 0 0 0 0 0 0 ...
\$ defWinStreak : num [1:3919] 0 0 0 0 0 0 0 0 0 0 ...
\$ defLoseStreak: num [1:3919] 0 0 0 0 0 0 0 0 0 0 ...
``````

Running as is, I get the following error:

``````Chain 1 Unrecoverable error evaluating the log probability at the initial value.
Chain 1 Exception: add: m1 ((24, 1)3919, 1) must match in size (in [...], line 63, column 2 to line 76, column 70)
``````

I think there is something wrong with the dimenions of my parameters, but changing them to something like `matrix[n_posType, n_offCoach] offCoach` hasn’t helped. Relatedly, if the problem is the dimensions of the parameters, I’m not sure using the `.*` operator will be appropriate once corrected.

Any help greatly appreciated. Many thanks!

Hello. Yes, there is dimension mismatch: on the right side, `a_offCoach[offCoach]` is the same length as offCoach, so length = n_pos = 3919 while on the left side `s[1:(n_posTypes - 1)]` has length = n_posTypes - 1 = 25 - 1.

Can you explain the various pieces of the data, the parameters and the formula for the log odds `s`?

Great!

Generally, I am trying to model the possible outcomes of a possession in the course of a sports match.
There are `n_posTypes` possible outcomes.

There are 2 types of predictor variables here: categorical and quantitative.

Among categorical predictor variables is the coach of the offensive team (`offCoach`), of which there are `n_offCoaches` many. Similarly, other categorical predictors are the defensive coach (`defCoach`), offensive team (`offTeam`), defensive team (`defTeam`), etc–each with `n_...` levels. With these types of predictors, I’m hoping to estimate an intercept for each level of each variable–pooled across all levels.

As for the quantitative predictors, like offensive record (`offRecord`), which I want slope estimates; these estimates are not pooled across any levels of anything. There are also `b_offFouls` and `b_defFouls`, which are slopes but for which I want separate (but pooled) estimates for the 2 halves of the match.

Finally, the curious `weight` at the beginning: this is variable comprised of team scores during a given possession; essentially, it’s meant to interact with everything.

Is this enough information?

1 Like

Yes, the information helps though I admit I’m not sure how this model is going to work out.

One challenge is that it seems you have a regression for each possession play, not a regression for the nPosTypes possible outcomes.

First consider what happens if there is a single observation. On the right side of the formula for `s`, we have `a` (vector parameter with length n_posTypes-1) plus a bunch of scalar parameters that add up to say `theta`. With multiple observations, the vector `a` is the same but the scalar `theta` varies across possessions. So this means that `s` varies across possessions as well.

So before you start to optimize your model, perhaps it will help to write the likelihood more explicitly:

``````for (n in 1:n_pos) {
real theta = a_offCoach[offCoach[n]] + ...;
s[1:(n_posTypes - 1)] = weight[n] * (a + theta);
posType[n] ~ categorical_logit(s);
}
``````