How to write a finite discrete mixture model in stan

Hi folks,

I’m trying to translate a field-specific theory (usually implemented in excel, of all places…) into Stan to take advantage of the quantification of uncertainty that goes along with Bayes. I’m having some trouble wrapping my head around how best to go about it, and so would love some feedback / pointers towards a minimal working example, please.

The general setup is this:

I’m modelling experimentally-obtained counts of binary data (ex., did something happen on a trial or not - heads or tails), which the theory specifies as coming from one of two processes, which share partially-overlapping predictors. One of these processes is a binomial outcome (call it Option 1) with some set of predictors (say, A, B, and C); the other is a four-category categorical outcome with a partially-overlapping set of predictors (say, B, D, and E). However, we only observe some aspects of the outcome of the four-category outcome, such that two of them look identical yielding what is effectively two observable outcomes (heads or tails) with four different distinct sources. We know which of the datapoints was generated the binomial, simpler model, and which one was generated from the four-category model that was then collapsed to a binary outcome. We don’t know, however, which of the four outcomes was used on any given trial, just that if we saw heads on a trial we know comes from the four-outcome distribution, it was one of the two ways of getting heads, and mutatis mutandis for tails.

Would someone be willing / able to provide a small working example for this? I’m particularly hung up on how exactly to implement the part where you have four distinct outputs from the categorical model but only enough information is known to narrow down to two choices.

Thanks very much for the time / generosity of anyone who wants to help me out - please let me know if this isn’t clear or if I can provide more info.

Usually the place to begin with any model is to write code to simulate fake data. That will also help folks understand precisely what you’re trying to do. Think you can do that and post here? Usually either python or R code or even pseudo-code would work (though the latter might prevent catching gotchas that the former two would through being able to actually execute and check the resulting data).

1 Like

Hi Mike,

Thanks a lot for your response. I’ve written some pseudocode (python-esque, since that’s the language I’m most fluent in). Hopefully this helps? It’s not strictly for generating fake data, but I think it’s basically the model structure / generative process I am going for / analogous to what is proposed in the theory - I’d just like to be able to condition it on data via the observed_outcome variable.


Let’s imagine flipping a coin on each trial, and we have two possible source coins (red coin and green coin, for ease of discussion). We know what the result of the flip was on each trial (heads or tails), and for some known set of trials, we know that the coin that was flipped was red. What we don’t know is on trials where we don’t know the coin color what it was, and also what the role of different predictors (here glossed over for simplicity’s sake, but basically just regression) are in conditioning different outcomes (outcomes being just the heads/tails status for when we know the color of the coin and it’s red, and the pair of (headedness,coincolor) for cases where we don’t know the color of the coin.

For trial in list_of_trials:
	If trial[coin_color] == “unknown”:
		Vector_of_predictors = []
		For option in [(heads, red), (heads, green), (tails, red), (tails,green)]:
			Option_predictors = get_predictors(option)
			Vector_of_predictors[option] = option_predictors
		Unobserved_outcome ~ categorical_logit(vector_of_predictors) 
		observed_outcome = unobserved_outcome(0) # selects just the first element – whether its heads or tails
	if trial[coin_color] == “red”:
		Vector_of_predictors = []
		For option in [(heads,red), (tails,red)]:
			Option_predictors = get_predictors(option)
            Vector_of_predictors[option] = option_predictors
		Observed_outcome ~ categorical_logit(Vector_of_predictors) # could be binomial too

The get_predictors function would just construct a linear combination of coefficients and features for each (headedness, color) pairing, which are all fully known.
What I’d like to be able to do is to do inference about, ideally at the trail level, what the posterior probability of coin color is for trials where the coin color isn’t known. I’d also like to be able to do examine posterior distributions and uncertainty around coefficient estimates for the predictors.

Does this match up to what you had in mind? Thanks again for being willing to engage and for taking the time with this, I appreciate it.


Edited: code errors fixed.

Any thoughts on this, @mike-lawrence ? (sorry to bug directly; unsure of etiquette here / if the reply notification is automatic…)

Reply notifications are automatic, but I don’t personally find it inappropriate to tag/remind after a couple days, so all good.

Hm, so when the coin is red, we see both that it is red and whether it was heads or tails, but when it’s green, all we see is that it’s green and not whether it came up as heads or tails? Or is it that you sometimes see a coin’s heads/tails-ness but can’t see it’s color, but know it can either be red or green?

1 Like

Awesome, ok thanks.

So in all cases you know what the headedness of the coin is, like what is observed, and you also know whether or not you are sure the coin is red. If you know the coin is red, it’s just logistic regression, but if you’re not sure what color the coin is, you need to figure out whether it was red or green, and the green coin has a potentially different rate of coming up heads.

Does that help any? Haha sorry to not be clearer about this.

Ah, ok, so you have a bunch of binomial outcomes (heads or tails), some of which have a known label as red, but others have an unknown label that could be either red or green. And you’re hoping that you can derive a probabilistic prediction of the label of a given unknown-label outcome.

Unless there is more structure to the data that I am missing, I don’t think you can do this. At first I was thinking that maybe if you for example had a scenario where known-reds were nearly always heads, a given tails in the unknowns might support a not-red label, but on further consideration realized that it is crucial to have some known-green data to inform the predictions for how unusual a tail is for green. It could be even more unusual for a green to come up tails.

Hi Mike,

Right - this is where the coin analogy breaks down, unfortunately. I do have a bunch of information about different coins (idk, maybe what mint coined them? what denomination they are? how heavy they are?) that makes certain outcomes much more likely to come from red coins than from green coins. Since all of the coins have values for these characteristics, I can do inference about the coefficients of these characteristics based on the known-red coins, and can “share” this information to allow probabilistic inferences about which color a coin might be, if it has a certain set of characteristics, and a certain outcome. For example, in this fictional world, let’s say that size is correlated with chance of coming up heads, among red coins. So if we have a red half-dollar coin (quite large) it’s very likely to come up heads. If we then observe a large coin come up tails whose color we don’t know, there’s a small chance that it’s red (since red coins tend to be more heads-y as they get larger), and if we don’t have any prior knowledge about how coin size is related to color among green coins, we can say that there’s about a .5 chance of the coin being heads if it’s green. So this kind of structure (I think) allows us to do some inference about what color the coins might be - also, there are a whole bunch more of these kind of silly size-predictor kind of things that I made up here, some of which are based on whether a coin is green or not. I managed to fit the model in Excel (it’s standard in my field), and with priors on the coefficients I was able to get the model to fit fine and give me reasonable answers / inferences about which color the coin was.

Does this help any?


Unless you have at least some data where you know the coin is green (and can thereby discern how the other properties differentiate red and green), I don’t see how inference is going to be anything other than propagation of the priors alone.

Hey Mike,

I see - I don’t know if I’ve entirely made my point clearly (which is on me - sorry!), so I think there is a way to identify which color is more likely for unknown coins. I think for right now it might be easiest just for me to try fitting a toy model with this structure and then just see if, when I change the priors, whether the posterior reflects anything but those priors, rather than trying to discuss the problem abstractly here. You very well could be right, but I think it might be quicker for me to find this out by trying it and watching it crash than trying to pick through the details verbally here.

In this case, basically the only part of the code I put above that I don’t have any idea where to start from is going from the unobserved four-category outcome down to the 2-category outcome. One way I thought about doing this conceptually is to give numerical codes to each of the headedness+color combinations, such that heads,red = 1, tails,red = 2, head,green = 3, tails,green = 4. That way I can construct the predictor for each of the four possibilities in the color-unknown case, and then use the mod-2 operator to basically reduce this down to whether the coin was heads (result mod-2 = 1) or tails (result mod-2 = 0). Does that seem feasible to you? If I were writing this with python I’d do something with strings as variable names, but that seems kinda clunky in stan / not sure how to do that.

In any case, I appreciate your taking the time to engage on this, I’ve found this conversation helpful.

1 Like

Hm, Stan definitely can’t do discrete latent parameters but maybe you can achieve what you want through a multinomial mixture?

Could be? Not sure exactly how that would work. Re: it being discrete, my thought was that this wouldn’t be a problem here, since it’s just taking the outcome of a categorical logit and conditioning only some aspect of that outcome on real data, so it would be more underdefined / running the risk of unidentifiability, than being discrete… but maybe I’m in over my head here.