Defining priors for dependent random variables in Stan

Background: I have a simulation model with unobserved parameters. Because the simulation model is slow to run, I built a metamodel using artificial neural networks (ANN). I am trying to estimate the unobserved parameters using Bayesian calibration, where the priors are based on current knowledge and the likelihood of the observed data is estimated from the metamodel.

Query: I have two random variables X and Y for which I am trying to get the posterior distribution using Stan. The prior distribution of X is uniform, U(0,2). The prior for Y is also uniform, but Y always exceeds X, i.e., Y ~ U(X,2). Since Y depends on X, how can I define the prior distribution for Y in Stan such that the constraint Y > X holds? I am new to Stan, so I would appreciate any suggestions or guidance on how to proceed. Thank you so much!


// Xq is the vector of parameters for calibration
parameters {
  matrix<lower=-1, upper=1>[num_targets,num_inputs] Xq;
} 

Hi Praveen, welcome to Discourse! This is a good question!

I just want to clarify: you need a joint prior on X and Y such that

  • the margin for X is uniform between zero and two,
  • for any value of X, the conditional distribution of Y is uniform(X, 2),
  • and as a result, you expect the histogram of the margin for Y (without conditioning on X) to look like a concave-up curve that increases from (0,0).
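
As a quick check on that last point, the margin for Y can be worked out by integrating X out of the joint prior. This is only a sketch under the two assumptions listed above (X uniform on (0, 2), and Y given X uniform on (X, 2)); lowercase x and y denote values of X and Y.

```latex
\begin{align*}
p(y) &= \int_0^{y} p(x)\, p(y \mid x)\, dx
      = \int_0^{y} \frac{1}{2} \cdot \frac{1}{2 - x}\, dx
      = \frac{1}{2} \log\frac{2}{2 - y}, \qquad 0 \le y < 2, \\
p(0) &= 0, \qquad
p'(y) = \frac{1}{2(2 - y)} > 0, \qquad
p''(y) = \frac{1}{2(2 - y)^2} > 0 .
\end{align*}
```

So the margin starts at zero, increases, and is concave-up, growing without bound (but remaining integrable) as y approaches 2.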

Here are two Stan programs that achieve this:

Program 1:

parameters {
  real<lower = 0, upper = 2> X;
  real<lower = 0, upper = 1> Y_1;
}
transformed parameters {
  real Y = Y_1*(2 - X) + X;
}

Program 2:

parameters {
  real<lower = 0, upper = 2> X;
  real<lower = X, upper = 2> Y;
}
model {
  target += -log(2 - X);
}
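
Program 2 can probably also be written with explicit sampling statements, which some readers find easier to match against the prior as stated in the question. This is a sketch of an equivalent form, not a third distinct model: the uniform(X, 2) statement contributes the same -log(2 - X) term to the target, and the statement for X only adds a constant. Likelihood terms from the metamodel are left out because the thread does not show them.

```stan
parameters {
  real<lower=0, upper=2> X;
  real<lower=X, upper=2> Y;   // the lower bound depends on X, so Y > X always holds
}
model {
  // Prior on X: constant density on (0, 2); written out for readability.
  X ~ uniform(0, 2);
  // Conditional prior on Y given X: density 1 / (2 - X) on (X, 2),
  // i.e. the same contribution as target += -log(2 - X).
  Y ~ uniform(X, 2);
}
```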

The intuition here is that, rather than a uniform probability density over the triangle that respects your parameter constraints, we need a density such that slices at constant X all carry the same total probability. The slice at a given X is shorter than the slice at X = 0 by a factor of \frac{2 - X}{2}, so the density along it must be larger by the reciprocal, \frac{2}{2 - X}. Dropping the constant factor of 2 and taking the logarithm gives the target += -log(2 - X) statement in Program 2. Program 1 needs no such adjustment because Y_1 is uniform on (0, 1) and Y is only a rescaling of it onto (X, 2).
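
The same result follows from writing the joint prior down directly. The sketch below assumes only the priors stated in the question and uses lowercase x, y, y_1 for values of X, Y, Y_1.

```latex
\begin{align*}
p(x, y) &= p(x)\, p(y \mid x) = \frac{1}{2} \cdot \frac{1}{2 - x},
  \qquad 0 < x < y < 2, \\
\log p(x, y) &= -\log 2 - \log(2 - x).
\end{align*}
% Program 2: the declared bounds alone give a flat density over the triangle,
% so only the non-constant term -log(2 - x) has to be added to the target.
%
% Program 1: with y_1 uniform on (0, 1) and y = x + y_1 (2 - x),
\begin{align*}
p(y \mid x) &= p_{Y_1}\!\left(\frac{y - x}{2 - x}\right)
  \left|\frac{\partial y_1}{\partial y}\right|
  = 1 \cdot \frac{1}{2 - x},
  \qquad x < y < 2 .
\end{align*}
```

Both programs therefore target the same joint prior; they differ only in which variables the sampler works with.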
