Sampling from prior without running a separate model

llewmills · August 29, 2019, 5:58am

I want to graph the histograms of parameter estimates from a stan model against the priors for those parameters. I have tried doing this by running a model in stan, graphing it with ggplot2, then overlaying an approximation of the prior distribution using R’s random generator function (e.g. rnorm(), rbinom()) but I have run into many scaling issues that make the graphs impossible to get looking right.

I was thinking a better way to do it would be simply to sample directly from the prior distribution and then graph those samples against the parameter estimates, but running a whole separate model just to sample from the priors seems very time-consuming. I was wondering if there was a way to do this within, or rather parallel-to, an existing model.

Here is a sample script with some toy data.

# simulate linear model
a <- 3 # intercept
b <- 2 # slope

# data
x <- rnorm(28, 0, 1)
eps <- rnorm(28, 0, 2)
y <- a + b*x + eps

# put data into list
data_reg <- list(N = 28, x = x, y = y)

# create the model string

ms <- "
    data {
    int<lower=0> N;
    vector[N] x;
    vector[N] y;
    }
    parameters {
    real alpha;
    real beta;
    real<lower=0> sigma;
    }
    model {
    vector[N] mu;
    sigma ~ cauchy(0, 2);
    beta ~ normal(0,10);
    alpha ~ normal(0,100);
    for ( i in 1:N ) {
    mu[i] = alpha + beta * x[i];
    }
    y ~ normal(mu, sigma);
    }
"

# now fit the model in stan
fit1 <- stan(model_code = ms,     # model string
             data = data_reg,        # named list of data
             chains = 1,             # number of Markov chains
             warmup = 1e3,          # number of warmup iterations per chain
             iter = 2e3)         # show progress every 'refresh' iterations

# extract the sample estimates
post <- extract(fit1, pars = c("alpha", "beta", "sigma"))

# now for the density plots. Write a plotting function
densFunct <- function (parName) {
  g <- ggplot(postDF, aes_string(x = parName)) + 
              geom_histogram(aes(y=..density..), fill = "white", colour = "black", bins = 50) +
              geom_density(fill = "skyblue", alpha = 0.3)
  return(g)
}

# plot 
gridExtra::grid.arrange(grobs = lapply(names(postDF), function (i) densFunct(i)), ncol = 1)

Rplot

Now I understand that I can sample from the prior by simply omitting the likelihood from the model string, like so

ms <- "
  data {
    int<lower=0> N;
    vector[N] x;
    vector[N] y;
  }
  parameters {
    real alpha;
    real beta;
    real<lower=0> sigma;
  }
  model {
    sigma ~ cauchy(0, 2);
    beta ~ normal(0,10);
    alpha ~ normal(0,100);
  }
"

But is there any way to get the samples from the prior within the first model? Maybe via the generated quantities block?

llewmills · August 30, 2019, 2:05am

Solved this, see:

betanalpha · August 30, 2019, 3:43am

Because Stan samples from the posterior directly it has no information about the prior density or how it might generate samples from the prior distribution. That means that there is nothing that can be extracted from the posterior fit to inform a pure prior fit, and hence no gain from trying to sample both at the same time.

Recognizing that you have to fit two models anyways you have multiple options. You can build a single model with an if statement that turns off likelihood terms and get posterior and prior samples with two fits of that one model. Or you can build two models with a little bit of redundancy, one for the prior and one for the full joint model over data and parameters. Your solution is actually equivalent to this latter approach only with two models crammed back into the same (but now more expensive) MCMC fit.

I prefer specifying two models because in most cases the prior can be sampled from directly using PRNGs. Instead of having to run an MCMC fit the prior samples can be generated in the transformed data block or the generated quantities block and run in a fraction of the time that the posterior fit will take. This exact sampling program is also a critical aspect of simulating from the prior predictive distribution which is a vital part of a robust modeling workflow. See for example the sample_joint_ensemble models in https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html.

Topic		Replies	Views
Built-in way to plot prior and posterior distributions in RStan? RStan rstan	5	2527	August 15, 2019
PyStan: how to sample priors? PyStan pystan	2	896	April 28, 2021
How to use prior from a sample? Modeling pystan , stan , priors	5	596	October 25, 2023
Posterior_vs_prior function for stanfit object RStan bayesplot	3	1438	February 1, 2019
Stan samples from prior predictive distribution the same number of samples as posterior draws. How restrict it to only one? Modeling	3	563	November 21, 2022

Sampling from prior without running a separate model

Related topics