Multinominal Regression Priors

thowagner · July 20, 2021, 8:35am

Hi,

I already got help by this forum for this problem: Categorical Regression

Luckily, with some help by @martinmodrak and @simonbrauer I got the model running but I’m still working on this. This was the latest version:

gist.github.com

https://gist.github.com/thorstenwagner/4b69a797b8c1508eb2ee07dc9763f9a4

catreg.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 154,
   "metadata": {},
   "outputs": [],
   "source": [
    "library(rethinking)"
   ]

This file has been truncated. show original

Now I was doing some prior predictive analysis and I’m not happy with the priors I’ve choosen:

This is the code to run the prior predictive simulation:

a = np.random.normal(loc=0, scale=1, size=12000)
b = np.random.normal(loc=0, scale=5, size=12000)
a = a.reshape([3,4000])
b = b.reshape([3,4000])
s1 = a[0,:]+ b[0,: ]
s2 = a[1,:]+ b[1,: ]
s3 = a[2,:]+ b[2,: ]
s4= 0
simulations = np.empty(shape=(len(s1),4))
for i in range(len(s1)):
    s = (s1[i],s2[i],s3[i],s4);
    u = special.softmax(s);
    simulations[i,:] = u
%matplotlib notebook
plt.figure()
ax = plt.subplot(111)
plt.violinplot(
            simulations,
            showmeans=False,
            showmedians=True
)
ax.set_title("Prior Predictive Plot")
ax.set_ylabel("Probability")
ax.set_xticklabels(["","Pos.","","Neg.","","Hydr.","","Oth."])
ax.set_ylim(-0.1,1.1)

I look for a mean and standard deviation so that my priors approach more a uniform distribution, but I’m not able to find any. The best I found so far is:

a = np.random.normal(loc=0, scale=1, size=12000)
b = np.random.normal(loc=0, scale=1.5, size=12000)

That gave me this:

Any ideas, how I can get a more uniform distribution?

Best,
Thorsten

simonbrauer · July 20, 2021, 1:31pm

By more uniform, do you mean within groups or across groups? (By “groups”, I mean the variable on the x-axis of your plot.) Put another way, do you mean that you would like each group to have a more uniform distribution between 0 and 1? Or do you mean that you would like your constrained group (4) to be more similar to the other 3?

I’m not sure that it is possible to get a truly-uniform distribution within groups–though maybe someone with more of a background in probability theory will correct me! The uniformity is conditional on the number of groups. From a practical perspective, a uniform distribution for each group would indicate that there is a 50% probability that any group is about 50%, which is not possible with more than 2 groups.

If instead you’re concerned about group 4 looking different from the others, the easiest approach would be to estimate the coefficients for the fourth group rather than constrain it to 0. You will just need to make sure that the priors on a and b identify the model.

thowagner · July 20, 2021, 1:45pm

Hi Simon,

thanks for your answer!

What I meant was that each group has a more uniform distribution between 0 and 1.
My prior knowledge would be that the probability for each group can be anywhere between 0 and 1.
Of course, in a concrete example this is not possible, as they the probabilities has to sum up to one.

But now as I think about it again, it makes sense what you say. As a high probability values for one group, means lower probabilities for the other groups, it makes sense that lower probabilitie values need to be observed more often, and therefore a uniform distribution is impossible.

Regarding group 4:
My understanding of this kind of regression was that I always need to choose a pivot element and set it 0 (like in code 11.59 in the Book “Statistical Rethinking” of Richard McElreath)?

Best,
Thorsten

simonbrauer · July 20, 2021, 3:14pm

The pivot/reference group approach is necessary if you don’t use priors or regularizing constraints, such as with OLS. I think this is typically called a “hard constraint” in the Stan community. In a Bayesian context, you can use priors to estimate all group means simultaneously (a “soft constraint”). The model is identified because the group mean priors put a penalty on arbitrarily large or small values, so the model favors estimates that are closer to the priors. You can see a basic example of this below.

Hard constraint

data{
  int N;
  int M; // Number of groups
  
  int<lower = 1, upper = M> Y[N];
}
parameters{
  real alpha_0;
  vector[M-1] beta_raw;
}
transformed parameters{
  vector[M] alpha_g = rep_vector(alpha_0, M);
  vector[M] theta_g;
  alpha_g[1:(M-1)] += beta_raw;
  theta_g = softmax(alpha_g);
}
model{
  Y ~ categorical(theta_g);
}

Soft constraint

data{
  int N;
  int M; // Number of groups
  
  int<lower = 1, upper = M> Y[N];
}
parameters{
  real alpha_0;
  vector[M] alpha_g;
  real<lower = 0> gamma;
}
transformed parameters{
  vector[M] theta_g = softmax(alpha_g);
}
model{
  alpha_g ~ normal(alpha_0, gamma);
  gamma ~ normal(0, 1);
  Y ~ categorical(theta_g);
}

Topic		Replies	Views
Priors for multinomial logistic regression in brms brms prior-choice , priors , multinomial-response , brms	5	221	February 13, 2025
Multinomial Model - How to set Prior for categorical variables? Modeling techniques	3	1256	February 18, 2020
Setting priors for categorical models brms	2	840	May 7, 2020
Posterior predictive checking for multinomial logit Modeling	3	819	November 10, 2020
Can you use a horseshoe prior with a categorical model? brms techniques , multinomial-response	2	978	June 22, 2020

Multinominal Regression Priors

Related topics