Multinomial with non-integer data

Suppose we wanted to implement this in Stan:

But multinomial_lpmf(int[] y, vector theta) doesn’t allow non-integer data. What do you recommend?


vector[D] theta[N];
vector[D] y_real[N];
for(i in 1:N)
  target += sum(y_real[i] .* (theta[i] - log_sum_exp(theta[i])));

where N is the number of observations, D is the dimension of the simplex, y_real …

But the paper talks about binomial.

Updated: added sum(). It should also work without it, since target += sums the elements of a vector.
Equivalently: target += sum(y_real[i] .* log_softmax(theta[i]));
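
To sanity-check the expression above outside Stan, here is a minimal Python sketch using only the standard library. The helper names `log_softmax`, `weighted_log_lik` and `multinomial_lpmf` and the example numbers are made up for illustration; they mirror what the Stan line computes for a single observation, under the assumption that theta is on the unnormalized log scale.

```python
import math

def log_softmax(theta):
    """Return log(softmax(theta)), computed stably via log-sum-exp."""
    m = max(theta)
    lse = m + math.log(sum(math.exp(t - m) for t in theta))
    return [t - lse for t in theta]

def weighted_log_lik(y, theta):
    """Multinomial kernel sum_j y_j * log p_j; y may be non-integer."""
    return sum(yj * lp for yj, lp in zip(y, log_softmax(theta)))

def multinomial_lpmf(y, theta):
    """Full multinomial log PMF for integer counts y, with p = softmax(theta)."""
    n = sum(y)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(yj + 1) for yj in y)
    return log_coef + weighted_log_lik(y, theta)

theta = [0.0, 0.5, -1.2]        # illustrative values only
theta_other = [1.0, -0.3, 0.2]  # a second, different parameter vector
y_int = [3, 1, 2]
y_real = [2.5, 1.25, 0.75]

# For integer counts, the kernel and the full log PMF differ by a term
# that is the same for every theta (the multinomial coefficient).
const = multinomial_lpmf(y_int, theta) - weighted_log_lik(y_int, theta)
const_other = multinomial_lpmf(y_int, theta_other) - weighted_log_lik(y_int, theta_other)
print(abs(const - const_other) < 1e-12)

# The kernel is still well defined when the weights are not integers.
print(weighted_log_lik(y_real, theta))
```

Because the difference does not depend on theta, dropping it changes nothing for sampling or optimization, which is why the kernel alone can go into target.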

thanks so much!! the log-likelihood for the Multinomial is a bit different from your expression, I think? See e.g. p.272 of Agresti’s Categorical Data Analysis:

That’s just the J - 1 expression. Look at the PMF given in


p_1 + ... + p_k = 1, with p = softmax(theta), so
log(p_i) = theta_i - log_sum_exp(theta)

The factor n! / (x_1! … x_k!) can be omitted when calculating the likelihood, since it
contains only constants.
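
Written out, the split between the constant part and the part that depends on the parameters looks like this. The notation (n, x_i, p_i, k) follows the post above; the remark about non-integer x_i is an assumption about how one would generalize the factorials, not something stated in the thread.

```latex
\log P(x_1, \dots, x_k \mid p)
  = \underbrace{\log n! - \sum_{i=1}^{k} \log x_i!}_{\text{constant in } p}
  + \sum_{i=1}^{k} x_i \log p_i ,
\qquad n = \sum_{i=1}^{k} x_i , \quad \sum_{i=1}^{k} p_i = 1 .
% For non-integer x_i one could read x_i! as \Gamma(x_i + 1);
% the first group of terms would still not involve p.
```

Only the last sum involves p, so it is the only term that needs to be added to target.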

For identifiability, your thetas should either sum to zero or have one element fixed at 0
(the reference element). See the Stan manual.
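
The reason a constraint is needed can be seen in a small Python sketch (standard library only): adding the same constant to every element of theta leaves the softmax probabilities unchanged, so the likelihood cannot tell the two vectors apart. The helper `log_softmax`, the variable names and the numbers are illustrative assumptions, not code from the thread.

```python
import math

def log_softmax(theta):
    """Return log(softmax(theta)), computed stably via log-sum-exp."""
    m = max(theta)
    lse = m + math.log(sum(math.exp(t - m) for t in theta))
    return [t - lse for t in theta]

theta = [0.4, -0.1, 1.3]            # illustrative values only
shifted = [t + 5.0 for t in theta]  # same constant added to every element

# Both vectors give the same log-probabilities (up to rounding).
same = all(abs(a - b) < 1e-12
           for a, b in zip(log_softmax(theta), log_softmax(shifted)))
print(same)

# Two common ways to pick a unique representative:
mean = sum(theta) / len(theta)
sum_to_zero = [t - mean for t in theta]     # elements sum to zero
reference = [t - theta[-1] for t in theta]  # last element fixed at 0
```

Either constraint removes the free additive constant; which one to use is a modelling choice.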

thanks so much!! got it.

(I didn’t catch at first that you switched theta from probabilities to logit scale)