Fitting multiple datasets simultaneously

I am fitting two different datasets (y_1 and y_2) that denote two different quantities (counts of cells and proportions of marker+ cells viz.), which are iid observations.
Both these quantities can be derived from the same model (M) with shared parameters, such that

y_1 \sim N(\mu_1, \sigma_1) \quad \quad \quad \quad \mu_1 = M(\text{time}, \alpha, \beta) \\ y_2 \sim N(\mu_2, \sigma_2)\quad \quad \quad \quad \mu_2 = M(\text{time}, \beta, \delta)\\

How to specify this optimally in stan? How does stan handle this? Does it form a joint distribution of y_1 and y_2 while fitting the model M(\theta) simultaneously on them? Here, \theta is an array of \alpha, \beta, \text{ and } \delta.

The general practice I came across to specify this model in stan is,

  alpha ~ normal(0.3, 0.2);
  beta ~ normal(0.01, 0.2);
  delta ~ normal(0.03, 0.2);

  sigma1 ~ cauchy(0, 1);
  sigma2 ~ cauchy(0, 1);

  y1 ~ normal(mu1, sigma1);
  y2 ~ normal(mu2, sigma2);

where \mu_1 and \mu_2 are defined in the transformed parameters block.

Is this right?

Yep! You figured it out.

I am totally still taking credit for having the “solution”, though.

1 Like

But how does Stan handle it? Does it form a multivariate joint distribution?

You’re just adding another multiplicative term to the joint posterior, same as if you added a prior on a parameter. It all gets multiplied together. The distributions can be completely independent, as with priors, and they won’t affect each other. If there is some connection between them, as in your example, then Stan will find the most likely values for parameters for both models in the joint posterior.

A multivariate distribution is usually something like multivariate normal where there is estimated correlation across a set of random variables. In your case, that won’t happen unless you change y1 and y2 to a vector and estimate it with MVN. The estimates can be correlated in the joint posterior, though, regardless of whether you estimate it.

1 Like

Just to clarify for my own sanity.

It is

P(\theta | y1, y2) = P(y1 | \theta) P(y2 | \theta) P(\theta)

and not

P(\theta | y1, y2) = P(y1,y2 | \theta) P(\theta)


If those equalities were \propto s (proportional to-s), then they would be results of Bayes rule, but with equality neither is.

If you have conditional independence of y_1 and y_2 given \theta, then p(y_1 | \theta) p(y_2 | \theta) p(\theta) = p(y_1, y_2 | \theta) p(\theta).

And p(\theta | y_1, y_2) = \frac{p(y_1, y_2 | \theta) p(\theta)}{p(y_1, y_2)} by Bayes rule.

Thanks Ben.

I didn’t completely understand your first sentence. What’s ‘s’ here?

So Stan forms a joint likelihood i.e. P(y_1, y_2) | \theta) to estimate the posterior.
That’s good to know.

I was trying to pluralize “proportional to”, so it’s multiple proportional-to-things, just awkward notation.

Stan samples one big joint density. If you have two separate likelihoods but they share hyperparameters, the data in each likelihood informs the hyperparameters and they will have an effect on each other (that’s just a result of Bayes rule).

There have been attempts before to try to make it so that this influence only goes one way (like one problem informs the other, but not the reverse), but that’s not what Bayes rule says, it’s kinda weird, and I’m not sure it’s possible to do it in Stan anyhow: (and it’s not quite the question, but it’s something you might have ended up wondering about so I wanted to give you a reference on it).

1 Like

Gotcha! Also, I was reading on the phone which didn’t help.

Thanks for the link and the explanation. I am trying to understand hierarchical sampling and this indeed has been very helpful.

Best l, Sanket.

1 Like