I have two hopefully simple questions related to implementing prior predictive checks (henceforth “Prior PC”) with rstan. The first is about implementation, the second is about simulating truncated parameters.
First, it seems that there are two ways to implement Prior PC.
- [Stan User’s Guide]((Stan User’s Guide) suggests to implement Prior PC using the data and generated quantities block, and to then sample the resulting stan model object.
- Many other examples of Prior PC, including brms’s option “sample_prior = ‘only’”, retain the data, parameter, model, and generated quantities. The model block only contains the prior sampling statements.
I expected these two implementations to provide the same results, but they do not on my machine. Below I given the R and STAN code for a regression with 10 data points.
library(rstan)
library(MASS)
library(tidyverse)
# STAN models
suggested_ppc <- '
data {
int N;
vector[N] x;
}
generated quantities {
real alpha = normal_rng(0,1);
real beta = normal_rng(0,1);
real y_sd = gamma_rng(0.01,0.01);
real yhat[N];
real y_sim[N];
for(n in 1:N)
yhat[n] = alpha + beta * x[n];
for(n in 1:N)
y_sim[n] = normal_rng(yhat[n], y_sd);
}
'
brms_ppc <- '
data {
int N;
vector[N] x;
}
parameters {
real alpha;
real beta;
real<lower=0> y_sd;
}
transformed parameters{
vector[N] yhat;
yhat = alpha + beta * x;
}
model {
alpha ~ normal(0,1);
beta ~ normal(0,1);
y_sd ~ gamma(0.01,0.01);
}
generated quantities {
real y_sim[N];
for(n in 1:N)
y_sim[n] = normal_rng(yhat[n], y_sd);
}
'
# Create stanmodel objects
suggested_ppc_mod <- stan_model( model_code = suggested_ppc )
brms_ppc_mod <- stan_model( model_code = brms_ppc )
# create fictitious predictor data
set.seed( 2013 + 702)
dat <- list( x = rnorm( 10 ),
N = 10 )
# fit stan models
suggested_ppc_fit <- sampling( object = suggested_ppc_mod,
data = dat,
iter = 4000,
warmup = 1000,
thin = 2,
chains = 3,
algorithm = 'Fixed_param'
)
brms_ppc_fit <- sampling( object = brms_ppc_mod,
data = dat,
iter = 4000,
warmup = 1000,
thin = 2,
chains = 3
)
# The variance of output suggested_ppc always > brms_ppc
suggested_ppc_fit %>%
rstan::extract() %>%
.$y_sim %>%
as.numeric %>%
sd(na.rm=T) # Did not understand why there are NAs!
brms_ppc_fit %>%
rstan::extract() %>%
.$y_sim %>%
as.numeric %>%
sd()
In this example, the method #1 suggested in Stand User’s Guide always provides a larger variance in the response variable than the “brms” method.
Is this supposed to happen, and which method is to be preferred?
Second, I need to perform a Prior PC on a model that has two parameters that are truncated (e.g. lower=1, upper=6.3>).
Am I correct that defining a parameter with bounds such as <lower=1, upper=6.3>, means that the parameter is truncated effectively?
And is it correct that there are no truncated random number generators in STAN?
If the answer to both of these questions is yes, then am I better off doing a prior PC directly in R, Python or some other environment where I can generate truncated distributions?
Thank you in advance, and apologies if these questions are not clear enough!