Can a multinomial regression be condensed?

I have a toy model from McElreath's book; here it is:

data{
  int N; // number of individuals
  int K; // number of possible careers
  int career[N]; // outcome
  vector[K] career_income;
}
parameters{
  vector[K-1] a; // intercepts
  real<lower=0> b; // association of income with choice
}
model{
  vector[K] p;
  vector[K] s;
  a ~ normal(0, 1);
  b ~ normal(0, 0.5);
  s[1] = a[1] + b*career_income[1];
  s[2] = a[2] + b*career_income[2];
  s[3] = 0; // pivot
  p = softmax(s);
  career ~ categorical(p);
}

I am hoping to build a more complex model, with K=100+. Is there a way to condense the notation of such a model, rather than having to write a linear model for each of the K-1 categories?

I was imagining something like

s[index_over_K_outcomes] = a[index_over_K_outcomes] + b*career_income[index_over_K_outcomes];

For that matter, perhaps there is an indexing trick that would allow the same linear model to be run on multiple outcomes. For instance, regressing each of the K categories as a Poisson on the same linear model.

Many thanks!


This should work:

data{
  int N; // number of individuals
  int K; // number of possible careers
  array[N] int career; // outcome
  vector[K-1] career_income;
}
parameters{
  vector[K-1] a; // intercepts
  real<lower=0> b; // association of income with choice
}
model{
  vector[K] s = rep_vector(0, K);
  s[1:(K-1)] = a + b*career_income;
  a ~ normal(0, 1);
  b ~ normal(0, 0.5);
  career ~ categorical_logit(s);
}

Just ensure that you standardize your career_income variable and that the last value is 0.


As there are no individual-level predictors, the linear predictor s is the same for all individuals. What matters is how many individuals have chosen each career (the observed counts per possible career are a sufficient statistic). So, in addition to the improvement suggested by @Ven_Popov, you can factorize the categorical likelihood into K components, one for each career:

data {
  int N; // number of individuals
  int K; // number of possible careers
  vector[K-1] career_income; // how come one career has no income?
  array[N] int career; // outcome
}
transformed data {
  array[K] int counts = rep_array(0, K);
  for (i in 1:N) {
    counts[career[i]] += 1;
  }
}
parameters {
  vector[K-1] a; // intercepts
  real<lower=0> b; // association of income with choice
}
model {
  vector[K] s = rep_vector(0, K);
  s[1:(K-1)] = a + b * career_income;
  a ~ normal(0, 1);
  b ~ normal(0, 0.5);

  for (k in 1:K) {
    target += counts[k] * categorical_logit_lpmf(k | s);
  }
}

I think the sampling would be more efficient if the number of individuals N is large compared to the number of careers K.

You can also use the multinomial likelihood, counts ~ multinomial(softmax(s)), without any loop. (The multinomial pmf includes the multinomial coefficient, which does not depend on s.) There is no multinomial_logit equivalent though.
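For concreteness, here is a minimal sketch of that version of the model block, reusing the counts array built in the transformed data block above and the same a and b:

model {
  vector[K] s = rep_vector(0, K);
  s[1:(K-1)] = a + b * career_income;
  a ~ normal(0, 1);
  b ~ normal(0, 0.5);
  counts ~ multinomial(softmax(s)); // counts from the transformed data block above
}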


There is:
https://mc-stan.org/docs/functions-reference/multivariate_discrete_distributions.html#multinomial-distribution-logit-parameterization
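With that parameterization the softmax call can be dropped; for example, the last line of the sketch above becomes:

  counts ~ multinomial_logit(s); // takes the log-odds vector directly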

I thought I could work through this on a more complicated model, but I'm still struggling. Here's my model as it currently stands:

data{
  int n_pos;
  int n_halves;
  int n_offTeams;
  int n_defTeams;
  int n_matchups;
  int n_offCoaches;
  int n_defCoaches;
  int n_posTypes;
  int<lower=1, upper=n_halves> half[n_pos];
  int<lower=1, upper=n_posTypes> posType[n_pos];
  int<lower=1, upper=n_offCoaches> offCoach[n_pos];
  int<lower=1, upper=n_defCoaches> defCoach[n_pos];
  int<lower=1, upper=n_offTeams> offTeam[n_pos];
  int<lower=1, upper=n_defTeams> defTeam[n_pos];
  int<lower=1, upper=n_matchups> matchup[n_pos];
  vector[n_pos] weight;
  vector[n_pos] offFouls;
  vector[n_pos] defFouls;
  vector[n_pos] offRecord;
  vector[n_pos] defRecord;
  vector[n_pos] offWinStreak;
  vector[n_pos] offLoseStreak;
  vector[n_pos] defWinStreak;
  vector[n_pos] defLoseStreak;
}
parameters{
  vector[n_posTypes-1] a;
  vector[n_offCoaches] a_offCoach;
  vector[n_defCoaches] a_defCoach;
  vector[n_offTeams] a_offTeam;
  vector[n_defTeams] a_defTeam;
  vector[n_matchups] a_matchup;
  vector[n_halves] b_offFouls;
  vector[n_halves] b_defFouls;
  real b_offRecord;
  real b_defRecord;
  real b_offWinStreak;
  real b_defWinStreak;
  real b_offLoseStreak;
  real b_defLoseStreak;
}
model{
  vector[n_posTypes] p;
  vector[n_posTypes] s = rep_vector(0, n_posTypes);
  
  a ~ normal(0,1);
  a_offCoach ~ normal(0,1);
  a_defCoach ~ normal(0,1);
  a_offTeam ~ normal(0,1);
  a_defTeam ~ normal(0,1);
  a_matchup ~ normal(0,1);
  b_offFouls ~ normal(0,1);
  b_defFouls ~ normal(0,1);
  b_offRecord ~ normal(0,1);
  b_defRecord ~ normal(0,1);
  b_offWinStreak ~ normal(0,1);
  b_defWinStreak ~ normal(0,1);
  b_offLoseStreak ~ normal(0,1);
  b_defLoseStreak ~ normal(0,1);

  s[1:(n_posTypes - 1)] = weight .* (a + 
                                     a_offCoach[offCoach] + // line 63
                                     a_defCoach[defCoach] +
                                     a_offTeam[offTeam] +
                                     a_defTeam[defTeam] +
                                     a_matchup[matchup] +
                                     b_offFouls[half] .* offFouls +
                                     b_defFouls[half] .* defFouls +
                                     b_offRecord * offRecord +
                                     b_defRecord * defRecord +
                                     b_offWinStreak * offWinStreak +
                                     b_defWinStreak * defWinStreak +
                                     b_offLoseStreak * offLoseStreak +
                                     b_defLoseStreak * defLoseStreak);
  
  p = softmax( s);  
  posType ~ categorical(p);
}

and here’s the str() of the data:

 $ n_pos        : int 3919
 $ n_halves     : num 2
 $ n_offTeams   : int 132
 $ n_defTeams   : int 143
 $ n_matchups   : int 734
 $ n_offCoaches : int 17
 $ n_defCoaches : int 17
 $ n_posTypes   : int 25
 $ half         : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ posType      : Factor w/ 25 levels "NoDefFoul_NoOffFoul_NoOffTechFlag_0DefPts_0OffPts",..: 1 1 1 3 1 3 1 1 2 5 ...
 $ offCoach     : Factor w/ 17 levels "AmirJohnson",..: 7 10 7 10 7 10 7 7 7 10 ...
 $ defCoach     : Factor w/ 17 levels "AmirJohnson",..: 10 7 10 7 10 7 10 10 10 7 ...
 $ offTeam      : Factor w/ 132 levels "AkilMitchellTJClineGlenRiceJr",..: 102 84 102 84 102 84 102 102 102 84 ...
 $ defTeam      : Factor w/ 143 levels "AkilMitchellTJClineGlenRiceJr",..: 91 109 91 109 91 109 91 91 91 112 ...
 $ matchup      : Factor w/ 734 levels "OFFAkilMitchellTJClineGlenRiceJr_VS_DEFChrisJohnsonJohathonSimmonsCharlesGarcia",..: 628 556 628 556 628 556 628 628 628 557 ...
 $ weight       : num [1:3919] 1 1 1 1 1 ...
 $ offFouls     : num [1:3919] -1.79 -1.79 -1.79 -1.79 -1.79 ...
 $ defFouls     : num [1:3919] -1.79 -1.79 -1.79 -1.79 -1.79 ...
 $ offRecord    : num [1:3919] 0 0 0 0 0 0 0 0 0 0 ...
 $ defRecord    : num [1:3919] 0 0 0 0 0 0 0 0 0 0 ...
 $ offWinStreak : num [1:3919] 0 0 0 0 0 0 0 0 0 0 ...
 $ offLoseStreak: num [1:3919] 0 0 0 0 0 0 0 0 0 0 ...
 $ defWinStreak : num [1:3919] 0 0 0 0 0 0 0 0 0 0 ...
 $ defLoseStreak: num [1:3919] 0 0 0 0 0 0 0 0 0 0 ...

Running as is, I get the following error:

Chain 1 Unrecoverable error evaluating the log probability at the initial value. 
Chain 1 Exception: add: m1 ((24, 1)3919, 1) must match in size (in [...], line 63, column 2 to line 76, column 70) 

I think there is something wrong with the dimensions of my parameters, but changing them to something like matrix[n_posType, n_offCoach] offCoach hasn't helped. Relatedly, if the problem is the dimensions of the parameters, I'm not sure using the .* operator will be appropriate once corrected.

Any help greatly appreciated. Many thanks!

Hello. Yes, there is a dimension mismatch: on the right-hand side, a_offCoach[offCoach] has the same length as offCoach, so length = n_pos = 3919, while on the left-hand side s[1:(n_posTypes - 1)] has length n_posTypes - 1 = 24.

Can you explain the various pieces of the data, the parameters and the formula for the log odds s?

Great!

Generally, I am trying to model the possible outcomes of a possession in the course of a sports match.
There are n_posTypes possible outcomes.

There are 2 types of predictor variables here: categorical and quantitative.

Among the categorical predictor variables is the coach of the offensive team (offCoach), of which there are n_offCoaches. Similarly, other categorical predictors are the defensive coach (defCoach), offensive team (offTeam), defensive team (defTeam), etc., each with n_... levels. With these types of predictors, I'm hoping to estimate an intercept for each level of each variable, pooled across all levels.

As for the quantitative predictors, like offensive record (offRecord), I want slope estimates; these are not pooled across any levels of anything. There are also b_offFouls and b_defFouls, which are slopes, but for which I want separate (though pooled) estimates for the 2 halves of the match.

Finally, the curious weight at the beginning: this is a variable composed of team scores during a given possession; essentially, it's meant to interact with everything.

Is this enough information?


Yes, the information helps though I admit I’m not sure how this model is going to work out.

One challenge is that it seems you have a regression for each possession, not a regression for the n_posTypes possible outcomes.

First consider what happens if there is a single observation. On the right-hand side of the formula for s, we have a (a vector parameter of length n_posTypes - 1) plus a bunch of scalar terms that add up to, say, theta. With multiple observations, the vector a stays the same but the scalar theta varies across possessions, which means s varies across possessions as well.

So before you start to optimize your model, perhaps it will help to write the likelihood more explicitly:

for (n in 1:n_pos) {
    vector[n_posTypes] s = rep_vector(0, n_posTypes); // last entry stays 0 as the pivot
    real theta = a_offCoach[offCoach[n]] + ...; // sum of the remaining scalar terms for possession n
    s[1:(n_posTypes - 1)] = weight[n] * (a + theta);
    posType[n] ~ categorical_logit(s);
}