Specifying a conditional prior

I haven’t been able to find any help files or manuals on this topic, but I wanted to see whether it is possible to specify a prior in Stan (or brms) such that part of the prior is a conditional distribution. For example, when using a normal-inverse-gamma prior in regression, you may want to decompose the prior for beta and sigma^2 as

p(beta, sigma^2) = p(beta|sigma^2)p(sigma^2)

Is there a way to specify the prior to capture that conditional distribution?

To make the example somewhat more concrete, suppose I have data as follows:

n <- 10
beta0 <- 5
beta1 <- 3.7
x <- 1:n
y <- beta0 + beta1 * x + rnorm(n)
dat <- data.frame(y,x)

X <- cbind(1, x)
betahat <- solve(t(X) %*% X) %*% t(X) %*% y

So my statistical model is y = X beta + epsilon, where beta = (beta0, beta1) and epsilon ~ N(0, sigma^2) (the classical regression setup). I need to place priors on beta and sigma^2, and I want my prior to be of the form

p(beta, sigma^2) = p(beta | sigma^2) p(sigma^2), where

beta | sigma^2 ~ N(betahat, sigma^2 * (X'X)^{-1})
sigma^2 ~ Inverse-Gamma(5, 10)

Sure! For example:

 parameters {
   real x;
   real y;
 }
 model {
   x ~ std_normal();
   y ~ normal(x, 1);
 }
The above example makes the conditional structure explicit, but it would be just as valid to figure out the joint prior and to write it down in one statement (in this case it’d be multivariate normal).
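For instance, the toy example above could be written jointly (working out the implied moments from the conditional form: Var(x) = 1, Cov(x, y) = 1, Var(y) = 1 + 1 = 2):

 parameters {
   vector[2] xy;  // xy[1] plays the role of x, xy[2] of y
 }
 model {
   // joint prior implied by x ~ std_normal(); y ~ normal(x, 1);
   xy ~ multi_normal([0, 0]', [[1, 1], [1, 2]]);
 }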

In your example, beware the need for a Jacobian adjustment if you declare parameter sigma and put a prior on sigma^2.
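A minimal sketch of that situation, using the inverse-gamma(5, 10) prior from your post and declaring sigma rather than sigma^2:

 parameters {
   real<lower = 0> sigma;
 }
 model {
   // the prior is on sigma^2, so add the Jacobian |d(sigma^2)/d(sigma)| = 2 * sigma
   square(sigma) ~ inv_gamma(5, 10);
   target += log(2 * sigma);  // the constant log(2) could be dropped
 }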

@jsocolar Thanks for the prompt reply. I updated my question with some actual data to be more concrete. I wonder if you could put your solution in the context of my multivariate normal and inverse gamma priors?

I’m having trouble understanding what parts of the data simulation are supposed to also be encoded in the model, and what parts are just a way of simulating some data. In the model, is betahat a transformed parameter or is it passed as data?

If betahat is a transformed parameter, and the transformation mirrors the way it’s computed in the simulated data, then betahat depends on y, y depends on beta, and yet the prior for beta depends on betahat. Some heavy mathematical lifting would be required to turn this into a generative prior that’s easy to reason about.

If betahat is data in your model, then:

 sigma2 ~ inv_gamma(5, 10);
 beta ~ multi_normal(betahat, sigma2 * inverse(X' * X));

Again, remember that a Jacobian adjustment will be required if you choose to declare parameter sigma and derive the variance as a transformation of the parameter, rather than just directly parameterizing in terms of the variance as above.

@jsocolar betahat is being passed as data.

Here is an update with R and Stan code:


n <- 100
beta0 <- 5
beta1 <- 3.7
x <- 1:n
y <- beta0 + beta1 * x + rnorm(n, 0, 2.4)
dat <- data.frame(y,x)

X <- cbind(1, x)
k <- ncol(X)
betahat <- solve(t(X) %*% X) %*% t(X) %*% y
s2 <- c(t(y - X %*% betahat) %*% (y - X %*% betahat) / (n - k))

my_data <- list(
  n = n,
  y = y,
  X = X,
  k = k,
  mu0 = c(betahat),
  delta0 = solve(t(X) %*% X),
  a0 = (n - k) / 2,
  b0 = (n - k) * s2 / 2
)

fit <- stan(
  file = "linear model.stan",
  data = my_data,
  chains = 1,
  warmup = 10000,
  iter = 100000
)

Stan Code:

data {
  // the input data
  int<lower = 1> n;               // total number of obs
  vector[n] y;                    // response vector
  int<lower = 1> k;               // number of coefficients
  matrix[n, k] X;                 // design matrix

  // parameters for the priors
  vector[k] mu0;
  matrix[k, k] delta0;
  real<lower = 0> a0;
  real<lower = 0> b0;
}

parameters {
  vector[k] beta;
  real<lower = 0> sigma2;
}

model {
 // prior
 sigma2 ~ inv_gamma(a0, b0);
 beta ~ multi_normal(mu0, sigma2 * delta0);

 // likelihood
 y ~ normal(X * beta, sigma2);
}

I think this works now; the only thing I am unsure of is whether I am passing the right terms to the likelihood. sigma2 is a variance with an inverse-gamma prior, but do I have to pass a standard deviation to the y ~ normal() statement?

Yes, pass a standard deviation to the y ~ normal! Sorry that wasn’t clear.

@jsocolar is it as simple as just updating the code to the following:

y ~ normal(X * beta, sqrt(sigma2));

Likewise, does multi_normal also expect standard deviations rather than variances? As written,

beta ~ multi_normal(mu0, sigma2 * delta0);

is passing a variance multiplied by a matrix.
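For reference, Stan’s normal() takes a standard deviation while multi_normal() takes a covariance matrix, so sigma2 * delta0 is already on the right scale and only the likelihood needs the sqrt. A sketch of the model block with both pieces in place:

model {
  // prior
  sigma2 ~ inv_gamma(a0, b0);
  beta ~ multi_normal(mu0, sigma2 * delta0);  // covariance matrix, so the variance scale is correct

  // likelihood: normal() expects a standard deviation
  y ~ normal(X * beta, sqrt(sigma2));
}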