Implementing prior predictive checks in stan

I am trying to perform my prior predictive checks using Stan. The advice in the user guide suggests the following ‘complete’ program using the random number generators.

The example provided is as follows:

data {
  int<lower = 0> N;
  vector[N] x;
}
generated quantities {
  real alpha = normal_rng(0, 1);
  real beta = normal_rng(0, 1);
  real y_sim[N] = poisson_log_rng(alpha + beta * x);
}

When I try anything similar, I get the following error

/opt/homebrew/Caskroom/miniforge/base/envs/stan-base/lib/python3.8/site-packages/cmdstanpy/utils.py in scan_metric(fd, config_dict, lineno)
    714     lineno += 1
    715     if not line == '# Adaptation terminated':
--> 716         raise ValueError(
    717             'line {}: expecting metric, found:\n\t "{}"'.format(lineno, line)
    718         )

ValueError: line 1045: expecting metric, found:
	 "#"

My attempt at a simplified implementation that generated the error above is below. This is just a linear regression with a intercept alpha and slope beta.

// saved as STAN_GC
data {
  
  int<lower = 0> N;
  vector[N] x;
  vector[N] y;
  
  // pass in the prior specification
  real alpha_mean;
  real alpha_sd;
  real beta_mean;
  real beta_sd;
  real sigma_mean;

}

parameters {
    // Comment out as defined in the generated quantities block
    //real alpha;
    //real beta;
    //real sigma;

}

model {
    
    // Commenting out the log likelihood line so the model runs _without_ the data
    // y ~ normal(mu, sigma);
}

generated quantities {
  real alpha = normal_rng(alpha_mean, alpha_sd);
  real beta = normal_rng(beta_mean, beta_sd);
  real sigma = exponential_rng(sigma_mean);
  
  real y_sim[N] = normal_rng(alpha + beta * x, sigma);
}

The stan code compiles without a problem

ppc_model = CmdStanModel(stan_file=STAN_GC)
ppc_model.compile(force=True)

but generates the ValueError that I pasted above when I try to sample .

ppc_fit = ppc_model.sample(data=ds)

For reference, the data that I passed to the model is as follows.

df = pd.DataFrame(
    {'x': np.linspace(-1,+2,50),
     'e': np.random.normal(size=50)
    }
)

df['y_no_error'] = 0.5 + 2 * df['x'] 
df['y'] = df['y_no_error'] + df.e

df.head()

x	e	y_no_error	y
0	-1.000000	-3.320883	-1.500000	-4.820883
1	-0.938776	-0.048729	-1.377551	-1.426280
2	-0.877551	0.825229	-1.255102	-0.429873
3	-0.816327	-1.452188	-1.132653	-2.584841
4	-0.755102	-0.993722	-1.010204	-2.003926

which is passed in as a dictionary named ds

ds = {
    'N': df.shape[0],
    'x': list(df.x),
    'y': list(df.y),
    'alpha_mean': 0.5,
    'alpha_sd': 1,
    'beta_mean': 0,
    'beta_sd': 2,
    'sigma_mean': 1
}

I have also tried an alternative approach wherein I used the transformed parameters block as per the following program, and this seemed to work without a problem.

data {
  int<lower = 0> N;
  vector[N] x;
  vector[N] y;
  // pass in the prior specification
  real alpha_mean;
  real alpha_sd;
  real beta_mean;
  real beta_sd;
  real sigma_mean;
}

parameters {
    real alpha;
    real beta;
    real sigma;

}

transformed parameters {
    vector[N] mu = alpha + beta * x;
}

model {
    alpha ~ normal(alpha_mean, alpha_sd);
    beta ~ normal(beta_mean,beta_sd);
    sigma ~ exponential(sigma_mean);
    
    // Commenting out the log likelihood line so the model runs _without_ the data
    // y ~ normal(mu, sigma);
}

I’d really appreciate any advice on how to go about using stan for prior predictive checks. Many thanks.

2 Likes

This appears to be the problem.

I used Rstan rather than RPython, but when I changed that to

  int y_sim[N] = poisson_log_rng(alpha + beta * x);

everything worked fine.

Since I don’t use Python, the error message you reported didn’t help me. The error message in RStan was a little more helpful.

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Variable definition base type mismatch, variable declared as base type real[ ] variable definition has base type int[ ] error in 'model134125bd4f9ad_test' at line 8, column 51
  -------------------------------------------------
     6:   real alpha = normal_rng(0, 1);
     7:   real beta = normal_rng(0, 1);
     8:   real y_sim[N] = poisson_log_rng(alpha + beta * x);
                                                          ^
     9: }
  -------------------------------------------------
1 Like

Thanks for this suggestion.
I copied the code verbatim from the user guide but I see why the change to int make sense.

My problem remains unsolved, because I get the same error in the Minimal Working Example (MWE) provided where I use a linear model and that final step uses normal_rng where real would be the appropriate type.

If I’m not mistaken, normal_rng(alpha + beta * x, sigma) is generating a single real, but you are assigning it to y_sim which is declared as an array of N reals.

Thanks for the suggestion about normal_rng producing just a single real.
I rewrote the generated quantities block as below:


generated quantities {
  real alpha = normal_rng(alpha_mean, alpha_sd);
  real beta = normal_rng(beta_mean, beta_sd);
  real sigma = exponential_rng(sigma_mean);

  vector[N] y_sim;

  for (i in 1:N) {
    y_sim[i] = normal_rng(alpha + beta * x[i], sigma);
  }
}

But this still gives me the same error.

/opt/homebrew/Caskroom/miniforge/base/envs/stan-base/lib/python3.8/site-packages/cmdstanpy/utils.py in scan_metric(fd, config_dict, lineno)
    714     lineno += 1
    715     if not line == '# Adaptation terminated':
--> 716         raise ValueError(
    717             'line {}: expecting metric, found:\n\t "{}"'.format(lineno, line)
    718         )

ValueError: line 1045: expecting metric, found:
	 "#"

In any case, my suggestion was wrong! I missed that x was a vector in your original code!

to do this in CmdStanPy, see this: MCMC Sampling — CmdStanPy 0.9.76 documentation

the error you’re hitting in CmdStanPy is due to the fact that currently, CmdStanPy requires that models with no parameters have sample argument fixed_param=True, however in latest release of CmdStan, the sampler runs fixed_param automatically if it detects that a model doesn’t have any parameters. (Print errors to stderr instead of stdout; don't nag for `fixed_param` flag, just do it by mitzimorris · Pull Request #992 · stan-dev/cmdstan · GitHub)

we need to update CmdStanPy to do the same. `sample` method - proper handling of models with no parameters · Issue #425 · stan-dev/cmdstanpy · GitHub

Just following up here to say the amazing CmdStanPy team have helped out, and there’s an update that handles models with no parameters and cases like this. Thanks!
See `sample` method - proper handling of models with no parameters · Issue #425 · stan-dev/cmdstanpy · GitHub and Bugfix/425 fixed params by mitzimorris · Pull Request #427 · stan-dev/cmdstanpy · GitHub