I’d suggest something like
- simple mixtures (one discrete parameter per data point)
- change point (one discrete parameter with 200 possible values)
- HMM decoding (lots of linked discrete parameters); that is, set up training data (x, y), use it to estimate the parameters via p(theta | x, y), then give it some new x’ and let it predict new y’
- Cormack-Jolly-Seber mark-recapture: each animal's alive/dead state is discrete and marginalized out in the Stan example
- Dawid-Skene model: one discrete parameter per item (not quite per data point), marginalized out in the Stan model
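To make the change-point case concrete, here is a minimal sketch of what marginalizing out a single discrete parameter looks like. It is not the Stan model itself: the Poisson likelihood, the fixed rates, the uniform prior on the change point, and the name `changepoint_log_posterior` are all assumptions for illustration. In a real model the rates would be continuous parameters and this sum would sit inside the log density.

```python
import numpy as np

def changepoint_log_posterior(y, early_rate, late_rate):
    """Return (log p(s | y), log marginal) for a change point s in 0..T-1.

    Assumed model (illustrative only): y[t] ~ Poisson(early_rate) for t < s,
    y[t] ~ Poisson(late_rate) for t >= s, with a uniform prior on s.
    The -log(y!) term is dropped because it does not depend on s, so the
    returned marginal is correct only up to that constant.
    """
    y = np.asarray(y, dtype=float)
    T = y.size
    ll_early = y * np.log(early_rate) - early_rate
    ll_late = y * np.log(late_rate) - late_rate
    # before[s] = sum of ll_early[:s], after[s] = sum of ll_late[s:]
    before = np.concatenate(([0.0], np.cumsum(ll_early)))[:T]
    after = np.cumsum(ll_late[::-1])[::-1]
    log_joint = -np.log(T) + before + after
    # Sum over all values of s on the log scale (log-sum-exp).
    m = log_joint.max()
    log_marginal = m + np.log(np.exp(log_joint - m).sum())
    return log_joint - log_marginal, log_marginal

# Toy data: five low counts followed by five high counts.
log_post, _ = changepoint_log_posterior([1] * 5 + [10] * 5, 1.0, 10.0)
print(int(np.argmax(log_post)))  # most probable change point index
```

The same function works unchanged for 200 candidate values: the cumulative sums keep the cost linear in the number of time points, and the normalized `log_post` gives back the posterior over the discrete parameter that was summed out.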