I am trying to perform my prior predictive checks using Stan. The advice in the user guide suggests the following ‘complete’ program using the random number generators.
The example provided is as follows:
data {
int<lower = 0> N;
vector[N] x;
}
generated quantities {
real alpha = normal_rng(0, 1);
real beta = normal_rng(0, 1);
real y_sim[N] = poisson_log_rng(alpha + beta * x);
}
When I try anything similar, I get the following error
/opt/homebrew/Caskroom/miniforge/base/envs/stan-base/lib/python3.8/site-packages/cmdstanpy/utils.py in scan_metric(fd, config_dict, lineno)
714 lineno += 1
715 if not line == '# Adaptation terminated':
--> 716 raise ValueError(
717 'line {}: expecting metric, found:\n\t "{}"'.format(lineno, line)
718 )
ValueError: line 1045: expecting metric, found:
"#"
My attempt at a simplified implementation that generated the error above is below. This is just a linear regression with a intercept alpha
and slope beta
.
// saved as STAN_GC
data {
int<lower = 0> N;
vector[N] x;
vector[N] y;
// pass in the prior specification
real alpha_mean;
real alpha_sd;
real beta_mean;
real beta_sd;
real sigma_mean;
}
parameters {
// Comment out as defined in the generated quantities block
//real alpha;
//real beta;
//real sigma;
}
model {
// Commenting out the log likelihood line so the model runs _without_ the data
// y ~ normal(mu, sigma);
}
generated quantities {
real alpha = normal_rng(alpha_mean, alpha_sd);
real beta = normal_rng(beta_mean, beta_sd);
real sigma = exponential_rng(sigma_mean);
real y_sim[N] = normal_rng(alpha + beta * x, sigma);
}
The stan code compiles without a problem
ppc_model = CmdStanModel(stan_file=STAN_GC)
ppc_model.compile(force=True)
but generates the ValueError
that I pasted above when I try to sample .
ppc_fit = ppc_model.sample(data=ds)
For reference, the data that I passed to the model is as follows.
df = pd.DataFrame(
{'x': np.linspace(-1,+2,50),
'e': np.random.normal(size=50)
}
)
df['y_no_error'] = 0.5 + 2 * df['x']
df['y'] = df['y_no_error'] + df.e
df.head()
x e y_no_error y
0 -1.000000 -3.320883 -1.500000 -4.820883
1 -0.938776 -0.048729 -1.377551 -1.426280
2 -0.877551 0.825229 -1.255102 -0.429873
3 -0.816327 -1.452188 -1.132653 -2.584841
4 -0.755102 -0.993722 -1.010204 -2.003926
which is passed in as a dictionary named ds
…
ds = {
'N': df.shape[0],
'x': list(df.x),
'y': list(df.y),
'alpha_mean': 0.5,
'alpha_sd': 1,
'beta_mean': 0,
'beta_sd': 2,
'sigma_mean': 1
}
I have also tried an alternative approach wherein I used the transformed parameters block as per the following program, and this seemed to work without a problem.
data {
int<lower = 0> N;
vector[N] x;
vector[N] y;
// pass in the prior specification
real alpha_mean;
real alpha_sd;
real beta_mean;
real beta_sd;
real sigma_mean;
}
parameters {
real alpha;
real beta;
real sigma;
}
transformed parameters {
vector[N] mu = alpha + beta * x;
}
model {
alpha ~ normal(alpha_mean, alpha_sd);
beta ~ normal(beta_mean,beta_sd);
sigma ~ exponential(sigma_mean);
// Commenting out the log likelihood line so the model runs _without_ the data
// y ~ normal(mu, sigma);
}
I’d really appreciate any advice on how to go about using stan for prior predictive checks. Many thanks.