Michael and Ben, sorry for the delayed response.

First, let me try to answer Ben’s question. I’m trying to model what Judea Pearl in his book *Causality* describes as bounded estimation of treatment effects (chapter 8). Basically, each observation has a vector of latent types. In my simple model here, there are two type variables r_d and r_m, and each is a discrete random variable. For example, in my model,

r_d \in \{\textrm{offered-on-assignment, never-offered}\} \\
r_m \in \{\textrm{never-migrate, complier-migrate, always-migrate}\}.

There are three binary observed variables in this model (Z, D, M), where Z is a random assignment in an experiment. From the observed data, I usually cannot determine exactly which types each observation belongs to. For example, if an observation is assigned to control, Z = 0, and I observe D = 0, I can never determine what r_d is for that observation, but if D = 1, I know for sure that r_d = \textrm{offered-on-assignment}. The same applies to M. I want to calculate the posterior distribution of the joint p(r_d, r_m) so that I can calculate the posterior distribution of the bounds of causal estimands such as E(M_{d=1}) (here the subscript indicates an intervention such that everyone is forced to D = 1). Another complication is that these types are likely correlated (endogenous). So, for each observation, I have a mixture of possible type combinations, and that’s all I have to update the joint probability. There is going to be an infinite number of joint distributions that conform with the data, but I want to know what this distribution of the joint distribution looks like and how it bounds my estimands.
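To make the "mixture of possible type combinations" concrete, here is a minimal Python sketch that enumerates which (r_d, r_m) pairs are consistent with an observed (Z, D, M). The response rules in `d_given` and `m_given` are assumptions spelled out for illustration (an offered-on-assignment unit has D = Z, a never-offered unit has D = 0, and a complier-migrate unit has M = D); the actual model may define the types differently.

```python
from itertools import product

R_D = ("offered-on-assignment", "never-offered")
R_M = ("never-migrate", "complier-migrate", "always-migrate")


def d_given(z, r_d):
    # Assumed rule: only offered-on-assignment units follow their assignment.
    return z if r_d == "offered-on-assignment" else 0


def m_given(d, r_m):
    # Assumed rule: compliers migrate exactly when D = 1.
    if r_m == "never-migrate":
        return 0
    if r_m == "always-migrate":
        return 1
    return d


def compatible_types(z, d, m):
    """Return every (r_d, r_m) pair that reproduces the observed (d, m) under z."""
    return [
        (r_d, r_m)
        for r_d, r_m in product(R_D, R_M)
        if d_given(z, r_d) == d and m_given(d, r_m) == m
    ]


# Each observed pattern maps to a set of candidate type pairs.
for z, d, m in [(0, 0, 0), (1, 1, 1), (1, 0, 0)]:
    print((z, d, m), compatible_types(z, d, m))
```

Under these assumed rules, a control observation with D = 0 and M = 0 leaves both r_d types open, while any observation with D = 1 pins r_d down to offered-on-assignment. The likelihood contribution of an observation is then the sum of p(r_d, r_m) over its compatible pairs.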

Michael, maybe I don’t understand correctly, but it sounds like the marginal Beta distribution that we get from the Dirichlet distribution that you mentioned for some x_k would be the marginal probability of one particular pair of types (r_d, r_m). So the Beta distribution that you mentioned would be on the probability, for example, that (r_d = \textrm{never-offered}, r_m = \textrm{never-migrate}). What my prior knowledge is actually about is the relative proportions of types within each type variable: I have a general sense that p(r_d = \textrm{never-offered}) should be small and that p(r_m = \textrm{never-migrate}) should be higher than the other r_m types.
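One way to connect a Dirichlet prior on the six joint cells to beliefs about the margins is the aggregation property of the Dirichlet: summing cells yields another Dirichlet whose parameters are the corresponding sums. The sketch below uses NumPy, and the `alpha` values are hypothetical, chosen only so that never-offered is rare and never-migrate is the most common r_m type.

```python
import numpy as np

# Hypothetical Dirichlet parameters for the joint p(r_d, r_m).
# Rows: r_d = (offered-on-assignment, never-offered).
# Columns: r_m = (never-migrate, complier-migrate, always-migrate).
alpha = np.array([
    [6.0, 2.0, 2.0],
    [1.0, 0.5, 0.5],
])

# Aggregation property: the margins are Dirichlet with summed parameters.
alpha_d = alpha.sum(axis=1)  # p(r_d) ~ Dirichlet(alpha_d), i.e. a Beta for two types
alpha_m = alpha.sum(axis=0)  # p(r_m) ~ Dirichlet(alpha_m)

mean_d = alpha_d / alpha.sum()
mean_m = alpha_m / alpha.sum()

# Monte Carlo check: draw joint tables and sum out one type variable.
rng = np.random.default_rng(0)
draws = rng.dirichlet(alpha.ravel(), size=100_000).reshape(-1, 2, 3)
sim_d = draws.sum(axis=2)
sim_m = draws.sum(axis=1)

print("analytic  E[p(r_d)]:", mean_d)
print("simulated E[p(r_d)]:", sim_d.mean(axis=0))
print("analytic  E[p(r_m)]:", mean_m)
print("simulated E[p(r_m)]:", sim_m.mean(axis=0))
```

Read this way, the row and column sums of the joint parameters can be set from the marginal beliefs directly, rather than by trial and error. The same Dirichlet property also implies that each conditional p(r_m | r_d) is Dirichlet with that row's parameters, independently of p(r_d).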

What I have been doing is tweaking the joint distribution parameters and then using prior prediction to look at the marginal probabilities p(r_d) and p(r_m) to see if they conform with what I want to model. I just wasn’t sure whether raising and lowering the parameters of the joint probability was messing things up in ways I can’t see when I only look at the prior predicted marginal probabilities. What I thought would be better was modeling the priors p(r_d) and p(r_m | r_d) separately. This would allow the model to vary the probabilities of r_m conditional on r_d, but when setting the prior I just use the same prior for all p(r_m | r_d). That is, I set the prior probabilities p(r_m = \textrm{never-migrate} | r_d = \textrm{offered-on-assignment}) and p(r_m = \textrm{never-migrate} | r_d = \textrm{never-offered}) equally high.
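A prior predictive sketch of the factored approach, p(r_d, r_m) = p(r_d) p(r_m | r_d), might look like the following. It assumes Dirichlet priors on p(r_d) and on each conditional p(r_m | r_d); the parameter values `a_d` and `a_m` are hypothetical, and the same `a_m` is reused for both values of r_d as described above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 50_000

# Hypothetical prior parameters.
a_d = np.array([8.0, 2.0])       # (offered-on-assignment, never-offered): never-offered is rare
a_m = np.array([6.0, 2.0, 2.0])  # (never-, complier-, always-migrate): never-migrate dominates

# Draw p(r_d), and one independent conditional p(r_m | r_d) per r_d value,
# all conditionals sharing the same prior a_m.
p_d = rng.dirichlet(a_d, size=n_draws)               # shape (n_draws, 2)
p_m_given_d = rng.dirichlet(a_m, size=(n_draws, 2))  # shape (n_draws, 2, 3)

# Compose the joint table and recover the implied marginal p(r_m).
joint = p_d[:, :, None] * p_m_given_d                # shape (n_draws, 2, 3)
marg_m = joint.sum(axis=1)                           # shape (n_draws, 3)

# Prior on the dependence between types: difference in never-migrate
# probability between the two r_d types.
diff = p_m_given_d[:, 0, 0] - p_m_given_d[:, 1, 0]

print("prior mean of p(r_m):", marg_m.mean(axis=0))
print("prior 5%/50%/95% of the conditional difference:",
      np.quantile(diff, [0.05, 0.5, 0.95]))
```

Because the conditionals share a prior, the prior mean of the marginal p(r_m) equals the normalized `a_m`, so the marginal belief is set directly. The conditional difference is centered at zero but has real spread, so the prior does not force r_d and r_m to be independent.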