Michael and Ben, sorry for the delayed response.
First, let me try to answer Ben’s question. I’m trying to model what Judea Pearl in his book Causality describes as bounded estimation of treatment effects (chapter 8). Basically, each observation has a vector of latent types. In my simple model here, there are two type variables r_d and r_m, and each is a discrete random variable. For example, in my model,
r_d \in \{\textrm{offered-on-assignment, never-offered}\} \\
r_m \in \{\textrm{never-migrate, complier-migrate, always-migrate}\}.
There are three binary observed variables in this model (Z, D, M), where Z is a random assignment in an experiment. From the observed data, I usually cannot determine exactly what types each observation belongs to. For example, if an observation is assigned to control, Z = 0, and I observed D = 0 I can never determine what r_d is for that observation, but if D = 1, I know for sure r_d = \textrm{offered-on-assignment}. And similarly, for M. I want to calculate the posterior distribution of the joint p(r_d, r_m) so I can calculate the posterior distribution of the bounds of causal estimands such as E(M_{d=1}) (here the subscript indicates an intervention such that everyone is forced D=1). Another complication is that these types are likely correlated (endogenous). So, for each observation, I have a mixture of possible type combinations and that’s all I have to update the joint probability. There is going to be an infinite number of joint distributions that conform with the data, but I want to know what this distribution of the joint distribution looks like and how it bounds my estimands.
Michael, maybe I don’t correctly understand, but it sounds like the marginal Beta distribution that we get from the Dirichlet distribution that you mentioned for some x_k would the marginal probability of any particular pair of types of (r_d, r_m). So the Beta distribution that you mentioned would be on the probability, for example, that (r_d = \textrm{never-offered}, r_m = \textrm{never-migrate}). What my prior knowledge is actually on is the relative proportions of types within each type variable: so I have a generate senses that p(r_d = \textrm{never-offered}) should be small and that p(r_m = \textrm{never-migrate}) should be higher than the other r_m types.
What I have been doing is tweaking the joint distribution parameters and then using prior prediction to look at the marginal probabilities p(r_d) and p(r_m) to see if they conform with what I want to model. I just wasn’t sure that my raising and lowering parameters for the joint probability isn’t messing things up in ways I don’t see when I just look at prior predicted marginal probabilities. What I thought would be better was modeling the priors p(r_d) and p(r_m | r_d) separately. This would allow the model to vary the probabilities of r_m conditional on r_d, but when I’m setting the prior I just use the same prior for all p(r_m | r_d). I.e., I set the probability for p(r_m = \textrm{never-migrate} | r_d = \textrm{offered-on-assignment}) and p(r_m = \textrm{never-migrate} | r_d = \textrm{never-offered}) equally high.