Modelling three binary events: A, B and T. A and B are observed directly, whereas T is only observed as if both A and B occurred

Hi! I have already solved this problem in Stan, but I would welcome comments on whether it also has an analytic solution (most probably in the form of a Dirichlet distribution).

There are three events that take place: A, B and T. All are assumed independent of each other.
We observe the following quantities:
“n_np” - number of times we observed an event B and no event A nor T
“n_pn” - number of times we observed an event A and no event B nor T
“n_pp” - number of times we observed both A and B, which happens either when A and B both occurred or when T occurred
“n_nn” - number of times none of the events happened

I have solved the model this way:

data {
  int<lower=0> n_nn;
  int<lower=0> n_pn;
  int<lower=0> n_pp;
  int<lower=0> n_np;
}

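transformed data {
  // Cell counts in the same order as the simplex s below: nn, pn, np, pp.
  // (Older Stan versions declare this as `int nd[4];`.)
  array[4] int nd = {n_nn, n_pn, n_np, n_pp};
}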

parameters {
  real<lower=0, upper=1> pA;
  real<lower=0, upper=1> pB;
  real<lower=0, upper=1> pT;
}

transformed parameters {
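  simplex[4] s;
  s[1] = (1 - pA) * (1 - pB) * (1 - pT);  // nn: neither A nor B observed
  s[2] = pA * (1 - pB) * (1 - pT);        // pn: only A observed
  s[3] = pB * (1 - pA) * (1 - pT);        // np: only B observed
  s[4] = pA * pB + pT - pA * pB * pT;     // pp: (A and B) or T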
}

model {
  nd ~ multinomial(s);
}

The model passed all the sanity checks.
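
One sanity check of this kind can be sketched outside Stan. The Python snippet below is only an illustration, not necessarily one of the checks actually run: it simulates chambers under the stated assumptions (A, B and T independent, T making both markers read positive) and compares the observed cell frequencies with the formulas for s. The function names and the values 0.3, 0.2 and 0.1 are made up for the example.

```python
import random

def cell_probs(pA, pB, pT):
    """Cell probabilities in the same order as the Stan vector s: nn, pn, np, pp."""
    s_nn = (1 - pA) * (1 - pB) * (1 - pT)
    s_pn = pA * (1 - pB) * (1 - pT)
    s_np = pB * (1 - pA) * (1 - pT)
    s_pp = pA * pB + pT - pA * pB * pT
    return [s_nn, s_pn, s_np, s_pp]

def simulate_counts(pA, pB, pT, n_chambers, rng):
    """Draw A, B, T independently per chamber; T shows up as 'A and B both seen'."""
    counts = [0, 0, 0, 0]  # nn, pn, np, pp
    for _ in range(n_chambers):
        a = rng.random() < pA
        b = rng.random() < pB
        t = rng.random() < pT
        seen_a = a or t
        seen_b = b or t
        if seen_a and seen_b:
            counts[3] += 1
        elif seen_a:
            counts[1] += 1
        elif seen_b:
            counts[2] += 1
        else:
            counts[0] += 1
    return counts

rng = random.Random(1)
true_p = (0.3, 0.2, 0.1)  # hypothetical values of pA, pB, pT
counts = simulate_counts(*true_p, n_chambers=200_000, rng=rng)
freqs = [c / sum(counts) for c in counts]
probs = cell_probs(*true_p)
for name, f, p in zip(["nn", "pn", "np", "pp"], freqs, probs):
    print(f"{name}: simulated {f:.4f}, formula {p:.4f}")
```

If the simulated frequencies and the formula values disagree by much more than the Monte Carlo noise, the cell definitions and the expressions for s are out of sync.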

Looking at the model, I suspect there is an analytic solution in the form of a Dirichlet distribution. Do you agree?

If so, how would the problem’s constraints influence the final shape of it?

This question is a follow-up to "Modelling a union of two Bernoulli events" (post #5 by adamryczkowski).

I don’t think that the answer will be a Dirichlet distribution, because the three probabilities don’t sum to one. I suspect there isn’t an analytic solution for this problem, but I’m not certain! You can, however, get an analytic Dirichlet posterior on the four transformed probabilities (given a Dirichlet prior on them), then sample from it and solve for p_A, p_B and p_T to get posterior samples of those.
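
A rough sketch of that sample-and-solve idea in Python, using only the standard library. Inverting the cell probabilities from the Stan model gives p_A = s_pn / (s_nn + s_pn), p_B = s_np / (s_nn + s_np) and p_T = s_pp - s_pn * s_np / s_nn, so a draw of s maps to valid probabilities only when s_nn * s_pp >= s_pn * s_np. The sketch assumes a flat Dirichlet(1, 1, 1, 1) prior on s and simply rejects draws that violate the constraint. That is a different prior from the uniform priors on pA, pB and pT in the Stan model, so the results need not match the Stan output exactly. The function names and the counts are illustrative, not data from this thread.

```python
import random

def invert_cells(s_nn, s_pn, s_np, s_pp):
    """Solve the cell probabilities for (pA, pB, pT); None if pT would be negative."""
    pA = s_pn / (s_nn + s_pn)
    pB = s_np / (s_nn + s_np)
    pT = s_pp - s_pn * s_np / s_nn
    if pT < 0:
        return None  # draw lies outside the region reachable by the model
    return pA, pB, pT

def dirichlet_sample(alpha, rng):
    """One Dirichlet draw built from normalised Gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def posterior_draws(counts, n_draws, rng, prior=1.0):
    """Draw s from Dirichlet(counts + prior), keep draws that invert to valid p's.

    Assumption: the data are roughly compatible with the model; otherwise most
    draws are rejected and this loop becomes very slow.
    """
    alpha = [c + prior for c in counts]
    draws = []
    while len(draws) < n_draws:
        p = invert_cells(*dirichlet_sample(alpha, rng))
        if p is not None:
            draws.append(p)
    return draws

rng = random.Random(0)
example_counts = [504, 216, 126, 154]  # hypothetical n_nn, n_pn, n_np, n_pp
draws = posterior_draws(example_counts, n_draws=2000, rng=rng)
pT_mean = sum(d[2] for d in draws) / len(draws)
print("posterior mean of pT:", round(pT_mean, 3))
```

The rejection step is what the model's constraints do to the Dirichlet in this sketch: they cut away the part of the simplex where s_nn * s_pp < s_pn * s_np.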

The set-up of the problem makes me wonder why you can’t count event T directly, the way you count A and B. Is there an experimental reason?

Yes, there is a reason why I can’t count events of type T.

I am modelling a chemical reaction between three molecules: two observed marker molecules, A and B, and an unobserved molecule T which binds with both markers. The task is to estimate the probability that there is at least one molecule T in the reaction chamber, but my instruments only allow me to observe whether there is at least one molecule A or B (independently). Of course, we are talking about very dilute samples, where the expected number of particles of each type is below 1. I can map the concentration of the particles to the probability I estimate. Obviously, I have numerous reaction chambers that share the same concentration of each molecule.

I assumed that the distribution of s is Dirichlet and that what I observe is a maximum likelihood point estimate. This allowed me to derive a formula relating pA, pB and pT to the (now known and single) value of the latent Dirichlet parameter s. Then I naïvely assumed that every element of the simplex s is normally distributed and uncorrelated with the others (which is blasphemy, I know), and used standard error propagation to estimate the SE of pT.

For the ranges of values that are of practical interest, this back-of-the-envelope technique gives passable results, within 10% error, which may be good enough to be of some value.
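
For readers who want to try this kind of calculation, here is one possible reading of it as a Python sketch. It is not the exact formula used above, which is not shown in this thread. It assumes the plug-in estimate pT = s_pp - s_pn * s_np / s_nn (obtained by inverting the cell probabilities of the Stan model), takes s as the observed frequencies, and propagates the binomial variances s_i * (1 - s_i) / n through the gradient. The `use_covariance` flag is an addition for comparison: it switches on the multinomial covariances -s_i * s_j / n that the naive version ignores. The function name and the counts are hypothetical.

```python
import math

def pT_and_se(n_nn, n_pn, n_np, n_pp, use_covariance=False):
    """Plug-in estimate of pT and a delta-method standard error.

    With use_covariance=False the cells are treated as uncorrelated (the naive
    assumption); with True the multinomial covariances are included.
    """
    n = n_nn + n_pn + n_np + n_pp
    s = [n_nn / n, n_pn / n, n_np / n, n_pp / n]  # nn, pn, np, pp
    pT = s[3] - s[1] * s[2] / s[0]
    # Partial derivatives of pT with respect to s_nn, s_pn, s_np, s_pp.
    grad = [s[1] * s[2] / s[0] ** 2, -s[2] / s[0], -s[1] / s[0], 1.0]
    var = 0.0
    for i in range(4):
        for j in range(4):
            if i == j:
                cov = s[i] * (1 - s[i]) / n
            elif use_covariance:
                cov = -s[i] * s[j] / n
            else:
                cov = 0.0
            var += grad[i] * grad[j] * cov
    return pT, math.sqrt(var)

# Hypothetical counts, not data from this thread.
pT_hat, se_naive = pT_and_se(504, 216, 126, 154)
_, se_full = pT_and_se(504, 216, 126, 154, use_covariance=True)
print(f"pT = {pT_hat:.3f}, naive SE = {se_naive:.4f}, SE with covariances = {se_full:.4f}")
```

Comparing the two standard errors for counts in the range of interest gives a quick feel for how much the uncorrelated-cells shortcut costs.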

Have a look at the vignette(s) of the R mlogit package, especially the train/car example.