To keep it simple, suppose you have count data in 4 categories, y_t \in \mathbb{N}^4. And now suppose for time t you somehow estimate a simplex \theta_t \in \Delta^3 (i.e., \theta \in [0, \infty)^4 and \textrm{sum}(\theta) = 1) and the complete data model is

y_t \sim \textrm{categorical}(\theta_t).

Now suppose that y_t (for some t) is missing column 2, e.g., y_t = [17 \ \texttt{NA} \ 12 \ 2]. What you can do is project the simplex \theta_t down to 3 dimensions,

\phi_t = \frac{[\theta_{t, 1} \ \theta_{t,3} \ \theta_{t,4}]^{\top}}{\theta_{t,1} + \theta_{t,3} + \theta_{t,4}},

and use

[17 \ 12 \ 2] \sim \textrm{categorical}(\phi_t)

We know that this is the right marginal distribution for only observing y_{t, 1}, y_{t, 3}, y_{t, 4}.

Then if you want to impute the actual values, you have multiple options. Presumably we don’t know T_t, the total number of observations at time t. If we did, then we could just sample the unknowns (with just one, it’d be deterministic), or we could draw a Poisson with some uncertainty.

You can use this no matter how \theta_t was derived—by a simple simplex parameter or by a multi-logic regression. It still works the same way.