Multinomial Regression Priors

Hi,

I already got help from this forum with this problem: Categorical Regression

Luckily, with some help from @martinmodrak and @simonbrauer I got the model running, but I’m still working on it. This was the latest version:

Now I was doing some prior predictive analysis, and I’m not happy with the priors I’ve chosen:
[Prior predictive violin plot produced by the code below]

This is the code to run the prior predictive simulation:

import numpy as np
from scipy import special
import matplotlib.pyplot as plt

# Prior draws for the two sets of coefficients (a and b), one row per free category
a = np.random.normal(loc=0, scale=1, size=12000).reshape([3, 4000])
b = np.random.normal(loc=0, scale=5, size=12000).reshape([3, 4000])

# Linear scores for the three free categories; the fourth (pivot) is fixed at 0
s1 = a[0, :] + b[0, :]
s2 = a[1, :] + b[1, :]
s3 = a[2, :] + b[2, :]
s4 = 0

# Softmax maps each draw of scores to a vector of category probabilities
simulations = np.empty(shape=(len(s1), 4))
for i in range(len(s1)):
    s = (s1[i], s2[i], s3[i], s4)
    simulations[i, :] = special.softmax(s)

%matplotlib notebook
plt.figure()
ax = plt.subplot(111)
plt.violinplot(
    simulations,
    showmeans=False,
    showmedians=True
)
ax.set_title("Prior Predictive Plot")
ax.set_ylabel("Probability")
ax.set_xticks([1, 2, 3, 4])  # violinplot places the groups at positions 1..4
ax.set_xticklabels(["Pos.", "Neg.", "Hydr.", "Oth."])
ax.set_ylim(-0.1, 1.1)
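
Wrapping the simulation in a small function makes it easier to scan candidate scales. This is just a sketch refactoring the code above; the helper name simulate_priors and the scale grid are mine:

def simulate_priors(a_scale, b_scale, n_draws=4000, n_cats=4):
    # Prior predictive draws of the category probabilities
    a = np.random.normal(loc=0, scale=a_scale, size=(n_cats - 1, n_draws))
    b = np.random.normal(loc=0, scale=b_scale, size=(n_cats - 1, n_draws))
    scores = np.vstack([a + b, np.zeros(n_draws)])  # pivot category fixed at 0
    return special.softmax(scores, axis=0).T        # shape (n_draws, n_cats)

# Compare a few candidate scales at a glance
for b_scale in [0.5, 1.0, 1.5, 5.0]:
    sims = simulate_priors(a_scale=1.0, b_scale=b_scale)
    print(b_scale, sims.mean(axis=0).round(3), sims.std(axis=0).round(3))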

I’m looking for a mean and standard deviation that bring my priors closer to a uniform distribution, but I’m not able to find any. The best I found so far is:

a = np.random.normal(loc=0, scale=1, size=12000)
b = np.random.normal(loc=0, scale=1.5, size=12000)

That gave me this:
[Prior predictive violin plot for these narrower priors]

Any ideas how I can get a more uniform distribution?

Best,
Thorsten

By more uniform, do you mean within groups or across groups? (By “groups”, I mean the variable on the x-axis of your plot.) Put another way, do you mean that you would like each group to have a more uniform distribution between 0 and 1? Or do you mean that you would like your constrained group (4) to be more similar to the other 3?

I’m not sure that it is possible to get a truly uniform distribution within groups, though maybe someone with more of a background in probability theory will correct me! The uniformity is constrained by the number of groups: since the probabilities must sum to 1, a Uniform(0, 1) marginal for every group would imply that each group has a 50% chance of exceeding 50%, yet at most one group can exceed 50% at a time. Equivalently, uniform marginals would each have mean 0.5, and those means can only sum to 1 with 2 groups.
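
You can see this concretely with the “flattest” possible prior over the probability simplex, a Dirichlet with all concentrations equal to 1: each component then has a Beta(1, K-1) marginal rather than a Uniform(0, 1) one. A quick numpy sketch (my own illustration, not code from this thread):

import numpy as np

rng = np.random.default_rng(0)
K = 4

# Uniform over the whole simplex
draws = rng.dirichlet(np.ones(K), size=4000)

# Each marginal is Beta(1, K-1): mean 1/K rather than 1/2
print(draws.mean(axis=0))          # ~[0.25, 0.25, 0.25, 0.25]
print((draws > 0.5).mean(axis=0))  # ~0.5**(K-1) = 0.125, not 0.5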

If instead you’re concerned about group 4 looking different from the others, the easiest approach would be to estimate the coefficients for the fourth group rather than constrain it to 0. You will just need to make sure that the priors on a and b identify the model.

Hi Simon,

thanks for your answer!

What I meant was that each group has a more uniform distribution between 0 and 1.
My prior knowledge would be that the probability for each group can be anywhere between 0 and 1.
Of course, in a concrete example this is not possible, as the probabilities have to sum to one.

But now that I think about it again, what you say makes sense. Since a high probability value for one group means lower probabilities for the other groups, lower probability values have to be observed more often, and therefore a uniform distribution is impossible.

Regarding group 4:
My understanding of this kind of regression was that I always need to choose a pivot element and set it to 0 (as in code 11.59 of Richard McElreath’s book “Statistical Rethinking”)?

Best,
Thorsten

The pivot/reference group approach is necessary if you don’t use priors or regularizing constraints, such as with OLS. I think this is typically called a “hard constraint” in the Stan community. In a Bayesian context, you can use priors to estimate all group means simultaneously (a “soft constraint”). The model is identified because the group mean priors put a penalty on arbitrarily large or small values, so the model favors estimates that are closer to the priors. You can see a basic example of this below.

Hard constraint

data{
  int N; // Number of observations
  int M; // Number of groups
  
  int<lower = 1, upper = M> Y[N];
}
parameters{
  vector[M-1] beta_raw; // free coefficients for the first M-1 groups
}
transformed parameters{
  vector[M] alpha_g;
  vector[M] theta_g;
  alpha_g[1:(M-1)] = beta_raw;
  alpha_g[M] = 0; // pivot group hard-constrained to zero
  theta_g = softmax(alpha_g);
}
model{
  Y ~ categorical(theta_g);
}

Soft constraint

data{
  int N;
  int M; // Number of groups
  
  int<lower = 1, upper = M> Y[N];
}
parameters{
  real alpha_0;
  vector[M] alpha_g;
  real<lower = 0> gamma;
}
transformed parameters{
  vector[M] theta_g = softmax(alpha_g);
}
model{
  alpha_0 ~ normal(0, 1); // anchors the overall level; the softmax ignores a common shift, so alpha_0 needs a prior to be identified
  alpha_g ~ normal(alpha_0, gamma);
  gamma ~ normal(0, 1); // half-normal, given the lower bound on gamma
  Y ~ categorical(theta_g);
}
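
For completeness, here is a sketch of fitting either model from Python with cmdstanpy; the file name soft_constraint.stan and the simulated data are placeholders of my choosing:

import numpy as np
from cmdstanpy import CmdStanModel

# Placeholder data: N observations of a category in 1..M
rng = np.random.default_rng(1)
N, M = 200, 4
Y = rng.integers(low=1, high=M + 1, size=N)

model = CmdStanModel(stan_file="soft_constraint.stan")  # assumed file name
fit = model.sample(data={"N": N, "M": M, "Y": Y.tolist()})
print(fit.summary())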