Multinomial with non-integer data

Suppose we wanted to implement this in Stan:

But multinomial_lpmf(int[] y, vector theta) doesn’t allow non-integer data. What do you recommend?


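vector[D] theta[N];   // unnormalized log-probabilities (logits), one vector per observation
vector[D] y_real[N];  // non-integer "counts", one vector per observation
for (i in 1:N)
  // multinomial log-likelihood kernel: sum over d of y_real[i][d] * log(softmax(theta[i])[d])
  target += sum(y_real[i] .* (theta[i] - log_sum_exp(theta[i])));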

where N is the number of observations, D is the dimension of the simplex, y_real …

But the paper talks about binomial.

Updated: added sum(). It should work without it too.
Equivalently: target += sum(y_real[i] .* log_softmax(theta[i]));

thanks so much!! the log-likelihood for the Multinomial is a bit different from your expression, I think? See e.g. p.272 of Agresti’s Categorical Data Analysis:

That’s just the J - 1 expression. Look at the PMF given in


p_1 + ... + p_k = 1, which the softmax guarantees:
log(p_j) = theta[j] - log_sum_exp(theta), for j = 1, ..., k

The factor n! / (x_1! … x_k!) can be omitted when calculating the likelihood: it
depends only on the data, so it is constant with respect to theta.
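
Written out, the argument above looks roughly like this. It is a sketch in the notation of this thread (counts x_1, ..., x_k with n = x_1 + ... + x_k, logits theta_j); the summation index l is introduced here only for illustration.

```latex
\begin{aligned}
\log \Pr(x_1,\dots,x_k \mid \theta)
  &= \log \frac{n!}{x_1! \cdots x_k!} + \sum_{j=1}^{k} x_j \log p_j, \\
\log p_j
  &= \theta_j - \log \sum_{l=1}^{k} \exp(\theta_l), \\
\log \Pr(x_1,\dots,x_k \mid \theta)
  &= \text{const} + \sum_{j=1}^{k} x_j \Bigl( \theta_j - \log \sum_{l=1}^{k} \exp(\theta_l) \Bigr).
\end{aligned}
```

The remaining sum is what sum(y_real[i] .* log_softmax(theta[i])) computes for a single observation, and it stays well defined when the x_j are not integers.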

For identifiability, your thetas should either sum to zero or have one element fixed at 0,
the reference element. See the Stan manual.
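
One way to pin a reference element could look like the sketch below. It relies on the fact that adding the same constant to every element of theta leaves the softmax unchanged, so one logit can be fixed at 0 without losing anything. The name theta_free and the normal(0, 5) prior are placeholders, not something from this thread, and the declarations use the array[N] ... form of recent Stan releases rather than the older vector[D] theta[N] form used earlier in the thread.

```stan
data {
  int<lower=1> N;                      // number of observations
  int<lower=2> D;                      // number of categories
  array[N] vector<lower=0>[D] y_real;  // non-integer "counts"
}
parameters {
  // hypothetical name: D - 1 free logits per observation,
  // with category 1 acting as the reference element
  array[N] vector[D - 1] theta_free;
}
model {
  for (i in 1:N) {
    // prepend the fixed reference logit 0 to get a full-length vector
    vector[D] theta = append_row(0, theta_free[i]);
    theta_free[i] ~ normal(0, 5);      // placeholder prior (assumption)
    // same likelihood kernel as above, written with log_softmax
    target += dot_product(y_real[i], log_softmax(theta));
  }
}
```

In a real model the free logits would more likely come from a regression on predictors than be declared one vector per observation; the per-observation form is kept here only to mirror the snippet in the thread.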

thanks so much!! got it.

(I didn’t catch at first that you switched theta from probabilities to logit scale)