How to compute P-value for Mixture Model

Hello! I’m fitting the data with a two-component mixture model, y_i ~ W_0i · p(·|λ_0i) + W_1i · p(·|λ_1i), where W_0i and W_1i are the mixing probabilities and p(·|λ) is the Poisson distribution. After fitting the model with the Stan package, I get output like the following:

| y_i | W_1i |
|-----|------|
| 5   | 0.4  |
| 2   | 0.7  |
| 10  | 0.6  |
| 2   | 0.4  |

Finally, I define a new latent variable Z_i, i = 1,…,n, that indicates which component observation i belongs to, i.e., whether it is in the first or second category. The indicator has two outcomes (0 and 1) and follows a Bernoulli distribution, Z_i ~ Bernoulli(W_1i), for i = 1, 2,…, n. I conclude that observation i is in the second group (I call these the significant observations) whenever P(Z_i = 1 | Y) is bigger than a cutoff value, say 0.5.
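As a minimal sketch, the classification rule above amounts to thresholding the posterior probabilities W_1i (the values here are taken from the table above; the 0.5 cutoff is the one mentioned in the post):

```python
# Posterior probabilities P(Z_i = 1 | Y) = W_1i from the fitted mixture.
w1 = [0.4, 0.7, 0.6, 0.4]

cutoff = 0.5
# Assign observation i to the second ("significant") group when W_1i > cutoff.
z = [1 if w > cutoff else 0 for w in w1]
print(z)  # -> [0, 1, 1, 0]
```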

My question is:

Do I need to use statistical significance or FDR control to select the significant observations, instead of a single ad hoc number (a cutoff on the posterior probability, W_1i > 0.5)?

Hi yab!

I’m not really sure, but I would say “it depends”. In a Bayesian approach you don’t “need” statistical-significance thresholds. What constitutes a significant observation should IMO come from your domain expertise. If the notion is “it’s a significant observation when it’s more likely to be in category 1 than category 0”, then the W_1i > 0.5 threshold makes sense. If instead an observation is significant only when it almost surely falls into category 1, you’d probably want to go for something like W_1i > 0.95, or something along those lines. But that’s more of a decision (as in decision theory) than an estimation issue, I would say.

I hope this was at least a bit helpful. Maybe others have more/different ideas…

