I haven’t been able to find any documentation on this topic, but I wanted to see whether it is possible to specify a prior in Stan (or brms) such that part of the prior is a conditional distribution. For example, when using a normal-inverse-gamma prior in regression, you may want to decompose the joint prior for beta and sigma^2 as
p(beta, sigma^2) = p(beta | sigma^2) p(sigma^2)
Is there a way to specify the prior so that it captures this conditional structure?
To make the example somewhat more concrete, suppose I have data as follows:
n <- 10
beta0 <- 5
beta1 <- 3.7
x <- 1:n
y <- beta0 + beta1 * x + rnorm(n)  # simulate from the linear model
dat <- data.frame(y, x)
X <- cbind(1, x)  # design matrix with an intercept column
betahat <- solve(t(X) %*% X) %*% t(X) %*% y  # OLS estimate of (beta0, beta1)
So my statistical model is y = X beta + epsilon, where beta = (beta0, beta1) and epsilon ~ N(0, sigma^2 I) (the classical regression setup). I need to place priors on beta and sigma^2, and I want the prior to be of the form
p(beta, sigma^2) = p(beta | sigma^2) p(sigma^2), where
beta | sigma^2 ~ N(betahat, sigma^2 * (X'X)^{-1})
sigma^2 ~ Inverse-Gamma(5, 10)
Sure! For example:
parameters {
  real x;
  real y;
}
model {
  x ~ std_normal();
  y ~ normal(x, 1);
}
The above example makes the conditional structure explicit, but it would be just as valid to figure out the joint prior and to write it down in one statement (in this case it’d be multivariate normal).
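To make that concrete, here is a minimal sketch (my addition, not part of the original reply) of the same toy prior written as a single joint statement; the implied covariance has Var(x) = 1, Var(y) = 2, Cov(x, y) = 1:

parameters {
  vector[2] z;  // z[1] plays the role of x, z[2] the role of y
}
model {
  // joint prior implied by x ~ std_normal(); y ~ normal(x, 1);
  matrix[2, 2] Sigma = [[1.0, 1.0], [1.0, 2.0]];
  z ~ multi_normal(rep_vector(0, 2), Sigma);
}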
In your example, beware the need for a Jacobian adjustment if you declare the parameter sigma and put a prior on sigma^2.
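For illustration, here is a minimal sketch (my addition, not part of the original reply) of what that adjustment might look like when sigma is the declared parameter and the inverse-gamma prior is stated on sigma^2:

parameters {
  real<lower = 0> sigma;
}
transformed parameters {
  real<lower = 0> sigma2 = square(sigma);
}
model {
  // prior stated on the variance, parameter declared as the sd
  sigma2 ~ inv_gamma(5, 10);
  // Jacobian adjustment for sigma2 = sigma^2: |d(sigma2)/d(sigma)| = 2 * sigma
  target += log(2 * sigma);
}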
@jsocolar Thanks for the prompt reply. I updated my question with some actual data to be more concrete. I wonder if you could put your solution in the context of my multivariate normal and inverse gamma priors?
I’m having trouble understanding what parts of the data simulation are supposed to also be encoded in the model, and what parts are just a way of simulating some data. In the model, is betahat a transformed parameter, or is it passed as data?
If betahat is a transformed parameter, and the transformation mirrors the way it’s computed in the simulated data, then you have that betahat depends on y, and y depends on beta, but the prior for beta depends on betahat. Seems like some heavy math lifting would be required to turn this into a generative prior that’s easy to reason about.
If betahat is data in your model, then:
model {
  sigma2 ~ inv_gamma(5, 10);
  beta ~ multi_normal(betahat, sigma2 * inverse(X' * X));
}
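For context, a minimal sketch (my addition; these declarations are assumptions, not part of the original reply) of the data and parameters blocks that this model block presupposes, with betahat and X passed in as data:

data {
  int<lower = 1> n;        // number of observations
  int<lower = 1> k;        // number of coefficients
  matrix[n, k] X;          // design matrix
  vector[k] betahat;       // OLS estimate computed in R, passed as data
}
parameters {
  vector[k] beta;
  real<lower = 0> sigma2;  // parameterized directly as the variance
}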
Again, remember that a Jacobian adjustment will be required if you choose to declare the parameter sigma and derive the variance as a transformation of that parameter, rather than just directly parameterizing in terms of the variance as above.
@jsocolar betahat is being passed as data.
Here is an update with R and Stan code:
library(rstan)

n <- 100
beta0 <- 5
beta1 <- 3.7
x <- 1:n
y <- beta0 + beta1 * x + rnorm(n, 0, 2.4)
dat <- data.frame(y, x)
X <- cbind(1, x)
k <- ncol(X)
betahat <- solve(t(X) %*% X) %*% t(X) %*% y  # OLS estimate
s2 <- c(t(y - X %*% betahat) %*% (y - X %*% betahat) / (n - k))  # residual variance estimate

my_data <- list(
  n = n,
  y = y,
  X = X,
  k = k,
  mu0 = c(betahat),
  delta0 = solve(t(X) %*% X),
  a0 = (n - k) / 2,
  b0 = (n - k) * s2 / 2
)

fit <- stan(
  file = "linear model.stan",
  data = my_data,
  chains = 1,
  warmup = 10000,
  iter = 100000
)
Stan Code:
data {
  // the input data
  int<lower = 1> n;     // total number of obs
  vector[n] y;          // response vector
  int<lower = 1> k;     // number of coefficients
  matrix[n, k] X;       // design matrix
  // parameters for the priors
  vector[k] mu0;
  matrix[k, k] delta0;
  real<lower = 0> a0;
  real<lower = 0> b0;
}
parameters {
  vector[k] beta;
  real<lower = 0> sigma2;
}
model {
  // prior
  sigma2 ~ inv_gamma(a0, b0);
  beta ~ multi_normal(mu0, sigma2 * delta0);
  // likelihood
  y ~ normal(X * beta, sigma2);
}
I think this works now; the only thing I am unsure of is whether I am passing the right terms to the likelihood. sigma2 is a variance with an inverse-gamma prior, but do I have to pass a standard deviation to the y ~ normal() statement?
Yes, pass a standard deviation to y ~ normal! Sorry that wasn’t clear.
@jsocolar is it as simple as just updating the code to the following:
y ~ normal(X * beta, sqrt(sigma2));
Likewise, does multi_normal also expect a standard deviation rather than a variance for the covariance matrix? As in,
beta ~ multi_normal(mu0, sigma2 * delta0);
is passing a variance multiplied by a matrix.