Implementing prior predictive checks in stan

drstevok · August 5, 2021, 10:08am

I am trying to perform my prior predictive checks using Stan. The advice in the user guide suggests the following ‘complete’ program using the random number generators.

The example provided is as follows:

data {
  int<lower = 0> N;
  vector[N] x;
}
generated quantities {
  real alpha = normal_rng(0, 1);
  real beta = normal_rng(0, 1);
  real y_sim[N] = poisson_log_rng(alpha + beta * x);
}

When I try anything similar, I get the following error

/opt/homebrew/Caskroom/miniforge/base/envs/stan-base/lib/python3.8/site-packages/cmdstanpy/utils.py in scan_metric(fd, config_dict, lineno)
    714     lineno += 1
    715     if not line == '# Adaptation terminated':
--> 716         raise ValueError(
    717             'line {}: expecting metric, found:\n\t "{}"'.format(lineno, line)
    718         )

ValueError: line 1045: expecting metric, found:
	 "#"

My attempt at a simplified implementation that generated the error above is below. This is just a linear regression with a intercept alpha and slope beta.

// saved as STAN_GC
data {
  
  int<lower = 0> N;
  vector[N] x;
  vector[N] y;
  
  // pass in the prior specification
  real alpha_mean;
  real alpha_sd;
  real beta_mean;
  real beta_sd;
  real sigma_mean;

}

parameters {
    // Comment out as defined in the generated quantities block
    //real alpha;
    //real beta;
    //real sigma;

}

model {
    
    // Commenting out the log likelihood line so the model runs _without_ the data
    // y ~ normal(mu, sigma);
}

generated quantities {
  real alpha = normal_rng(alpha_mean, alpha_sd);
  real beta = normal_rng(beta_mean, beta_sd);
  real sigma = exponential_rng(sigma_mean);
  
  real y_sim[N] = normal_rng(alpha + beta * x, sigma);
}

The stan code compiles without a problem

ppc_model = CmdStanModel(stan_file=STAN_GC)
ppc_model.compile(force=True)

but generates the ValueError that I pasted above when I try to sample .

ppc_fit = ppc_model.sample(data=ds)

For reference, the data that I passed to the model is as follows.

df = pd.DataFrame(
    {'x': np.linspace(-1,+2,50),
     'e': np.random.normal(size=50)
    }
)

df['y_no_error'] = 0.5 + 2 * df['x'] 
df['y'] = df['y_no_error'] + df.e

df.head()


x	e	y_no_error	y
0	-1.000000	-3.320883	-1.500000	-4.820883
1	-0.938776	-0.048729	-1.377551	-1.426280
2	-0.877551	0.825229	-1.255102	-0.429873
3	-0.816327	-1.452188	-1.132653	-2.584841
4	-0.755102	-0.993722	-1.010204	-2.003926

which is passed in as a dictionary named ds…

ds = {
    'N': df.shape[0],
    'x': list(df.x),
    'y': list(df.y),
    'alpha_mean': 0.5,
    'alpha_sd': 1,
    'beta_mean': 0,
    'beta_sd': 2,
    'sigma_mean': 1
}

I have also tried an alternative approach wherein I used the transformed parameters block as per the following program, and this seemed to work without a problem.

data {
  int<lower = 0> N;
  vector[N] x;
  vector[N] y;
  // pass in the prior specification
  real alpha_mean;
  real alpha_sd;
  real beta_mean;
  real beta_sd;
  real sigma_mean;
}

parameters {
    real alpha;
    real beta;
    real sigma;

}

transformed parameters {
    vector[N] mu = alpha + beta * x;
}

model {
    alpha ~ normal(alpha_mean, alpha_sd);
    beta ~ normal(beta_mean,beta_sd);
    sigma ~ exponential(sigma_mean);
    
    // Commenting out the log likelihood line so the model runs _without_ the data
    // y ~ normal(mu, sigma);
}

I’d really appreciate any advice on how to go about using stan for prior predictive checks. Many thanks.

kholsinger · August 5, 2021, 6:37pm

This appears to be the problem.

I used Rstan rather than RPython, but when I changed that to

  int y_sim[N] = poisson_log_rng(alpha + beta * x);

everything worked fine.

Since I don’t use Python, the error message you reported didn’t help me. The error message in RStan was a little more helpful.

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Variable definition base type mismatch, variable declared as base type real[ ] variable definition has base type int[ ] error in 'model134125bd4f9ad_test' at line 8, column 51
  -------------------------------------------------
     6:   real alpha = normal_rng(0, 1);
     7:   real beta = normal_rng(0, 1);
     8:   real y_sim[N] = poisson_log_rng(alpha + beta * x);
                                                          ^
     9: }
  -------------------------------------------------

drstevok · August 5, 2021, 7:25pm

Thanks for this suggestion.
I copied the code verbatim from the user guide but I see why the change to int make sense.

My problem remains unsolved, because I get the same error in the Minimal Working Example (MWE) provided where I use a linear model and that final step uses normal_rng where real would be the appropriate type.

jsocolar · August 5, 2021, 8:11pm

drstevok:

generated quantities {
  real alpha = normal_rng(alpha_mean, alpha_sd);
  real beta = normal_rng(beta_mean, beta_sd);
  real sigma = exponential_rng(sigma_mean);
  
  real y_sim[N] = normal_rng(alpha + beta * x, sigma);
}

If I’m not mistaken, normal_rng(alpha + beta * x, sigma) is generating a single real, but you are assigning it to y_sim which is declared as an array of N reals.

drstevok · August 6, 2021, 2:57pm

Thanks for the suggestion about normal_rng producing just a single real.
I rewrote the generated quantities block as below:


generated quantities {
  real alpha = normal_rng(alpha_mean, alpha_sd);
  real beta = normal_rng(beta_mean, beta_sd);
  real sigma = exponential_rng(sigma_mean);

  vector[N] y_sim;

  for (i in 1:N) {
    y_sim[i] = normal_rng(alpha + beta * x[i], sigma);
  }
}

But this still gives me the same error.

/opt/homebrew/Caskroom/miniforge/base/envs/stan-base/lib/python3.8/site-packages/cmdstanpy/utils.py in scan_metric(fd, config_dict, lineno)
    714     lineno += 1
    715     if not line == '# Adaptation terminated':
--> 716         raise ValueError(
    717             'line {}: expecting metric, found:\n\t "{}"'.format(lineno, line)
    718         )

ValueError: line 1045: expecting metric, found:
	 "#"

jsocolar · August 6, 2021, 2:59pm

In any case, my suggestion was wrong! I missed that x was a vector in your original code!

mitzimorris · August 6, 2021, 3:59pm

to do this in CmdStanPy, see this: MCMC Sampling — CmdStanPy 0.9.76 documentation

the error you’re hitting in CmdStanPy is due to the fact that currently, CmdStanPy requires that models with no parameters have sample argument fixed_param=True, however in latest release of CmdStan, the sampler runs fixed_param automatically if it detects that a model doesn’t have any parameters. (Print errors to stderr instead of stdout; don't nag for `fixed_param` flag, just do it by mitzimorris · Pull Request #992 · stan-dev/cmdstan · GitHub)

we need to update CmdStanPy to do the same. `sample` method - proper handling of models with no parameters · Issue #425 · stan-dev/cmdstanpy · GitHub

drstevok · August 12, 2021, 8:37pm

Just following up here to say the amazing CmdStanPy team have helped out, and there’s an update that handles models with no parameters and cases like this. Thanks!
See `sample` method - proper handling of models with no parameters · Issue #425 · stan-dev/cmdstanpy · GitHub and Bugfix/425 fixed params by mitzimorris · Pull Request #427 · stan-dev/cmdstanpy · GitHub

Topic		Replies	Views
Prior Predictive Check From Tutorial Modeling fitting-issues , prior-predictive	1	494	June 6, 2022
Prior Predictive Checks with RStan Modeling rstan , prior-predictive	3	771	August 12, 2020
Prior predictive check Modeling specification	3	518	August 26, 2021
Prior predictive checks: general implementation and truncated parameters Modeling rstan , specification	4	831	April 30, 2021
Binomial_rng fails due to probability parameter being nan Modeling	2	776	November 23, 2021

Implementing prior predictive checks in stan

Related topics