Prior predictive check with default Stan priors

Hello,

I’m fairly new to using Stan to fit models and was looking at the “RStan: the R interface to Stan” vignette, which works through the 8 schools problem.

The model fit in this example has the following form (I’m using the centered parameterization for now, as I want to explore its sampling problems later on):

data {
  int<lower = 0> J;
  real y[J];
  real<lower = 0> sigma[J];
}

parameters {
  real mu; 
  real<lower = 0> tau;
  vector[J] theta;
}

model {
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
}

Also, the data is of the following form:

schools_data <- list(
  J = 8,
  y = c(28, 8, -3, 7, -1, 1, 18, 12),
  sigma = c(15, 10, 16, 11, 9, 11, 10, 18)
)

From the model, you can see that the hyperparameters are given the default priors, which I understand to be uniform over their entire domains. Suppose I wanted to conduct a prior predictive check using this prior specification: how would I do so?

Will something like this suffice, where I set “a” to be a large value (possibly even Inf, if Stan allows it)?

data {
  int<lower = 1> J;
  real<lower = 0> sigma[J];
  real<lower = 0> a;    // Large finite bound standing in for the flat prior.
}

model {
}

generated quantities {
  real mu = uniform_rng(-a, a);
  real<lower = 0> tau = abs(uniform_rng(-a, a));
  real y_rep[J];
  real theta[J];
  for (j in 1:J) {
    theta[j] = normal_rng(mu, tau);
    y_rep[j] = normal_rng(theta[j], sigma[j]);
  }
}
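
If so, I was planning to run it from R along these lines (just a sketch; the file name is whatever I save the model as, and since everything lives in generated quantities I believe it needs algorithm = "Fixed_param"):

library(rstan)

schools_data <- list(
  J = 8,
  sigma = c(15, 10, 16, 11, 9, 11, 10, 18),
  a = 1e6    # large finite stand-in for the flat prior
)

fit_prior <- stan(
  file = "prior_pred.stan",    # the model above, saved to a file
  data = schools_data,
  algorithm = "Fixed_param",   # no parameters block, so no HMC needed
  chains = 1, iter = 1000
)

y_rep <- extract(fit_prior)$y_rep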

Also, is there a better way to model tau given the truncation?

Thanks in advance!

All the best,

Tim

Hi, this would probably suffice. A better alternative might be to declare mu, tau, and theta as parameters instead (keeping sigma as data), and then include a sampling statement for theta in the model block. This effectively draws samples from the prior distributions.

Hello,

Thank you very much for replying. I was unaware you could do this; I thought we were confined to the generated quantities block! I modified my code to the following, and things seem to be working:

// Index and hyperparameter values.
data {
  int<lower = 1> J;               // Number of schools.
  real<lower = 0> sigma[J];       // Standard error of the jth school.
}

// Parameters and hyperparameters.
parameters {
  real mu;                    // Population mean of the school effects.
  real<lower = 0> tau;        // Population standard deviation of the school effects.
}

// The model to be estimated.
model {
}

// Generate data according to the partial pooling regression model.
generated quantities {
  real y_rep[J];           // Prior predictive draws.
  real theta[J];           // Vector of school effects.
  for (j in 1:J) {
    theta[j] = normal_rng(mu, tau);
    y_rep[j] = normal_rng(theta[j], sigma[j]);
  }
}

Note that I wasn’t able to put theta in the parameter block without receiving the following error: “Cannot assign to global variable ‘theta’ declared in previous blocks.”

Thanks again!

All the best,

Tim

That error is most likely because you were assigning to theta in the generated quantities block; you cannot assign to a variable declared in a previous block, but you should be able to define theta as a parameter and give it a sampling statement in the model block. Also note that the real-array declaration real<lower=0> sigma[J] has been deprecated in current versions of Stan. It should be written either as array[J] real<lower=0> sigma, or as a vector, vector<lower=0>[J] sigma. Finally, in terms of optimization, normal_rng is vectorized over vectors and 1D arrays, so you can generate the prior predictions without the loop.

data {
    int<lower=0> J;
    array[J] real<lower=0> sigma;
}

parameters {
    real mu;
    real<lower=0> tau;
    array[J] real theta;
}

model {
    theta ~ normal(mu, tau);
}

generated quantities {
    array[J] real y;
    y = normal_rng(theta, sigma);
}
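
From R you could then run it along roughly these lines (a sketch; the file name is just a placeholder, and with the flat improper priors still on mu and tau you should expect very diffuse draws and probably some sampler warnings):

library(rstan)

schools_data <- list(J = 8, sigma = c(15, 10, 16, 11, 9, 11, 10, 18))

fit_prior <- stan(
    file = "prior_only.stan",    # the model above, saved to a file
    data = schools_data,
    chains = 4, iter = 2000
)

y_prior <- extract(fit_prior)$y
apply(y_prior, 2, quantile, probs = c(0.05, 0.5, 0.95))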

Hello,

I see the issue: I didn’t use the model block because I had read it should be empty when simulating data like this for a prior predictive check. I’m assuming that a more accurate statement is just that the quantities you are simulating shouldn’t appear in the model block, i.e., y_rep in my case? Also, thank you for the information regarding the array declarations and the for loop; I am currently editing my code to incorporate these!

All the best,

Tim

Yes, correct: the likelihood (the data) should not appear in the model block.

If we have some prior distribution P(\theta), generating samples from it with an MCMC technique should yield the same results as generating random samples directly using the _rng functions.
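
To make that concrete, here is a small sketch of the kind of check I mean (fixing mu = 0 and tau = 10 as arbitrary values, since the flat hyperpriors themselves are improper and cannot literally be sampled by either route):

library(rstan)

prior_only <- "
parameters {
  real theta;
}
model {
  theta ~ normal(0, 10);
}
"

fit <- stan(model_code = prior_only, chains = 4, iter = 4000, refresh = 0)
theta_mcmc <- extract(fit)$theta
theta_rng  <- rnorm(length(theta_mcmc), 0, 10)

# The two sets of draws should agree closely.
quantile(theta_mcmc, probs = c(0.05, 0.5, 0.95))
quantile(theta_rng,  probs = c(0.05, 0.5, 0.95))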

Hi,

Thank you very much for the information; my edited code is below:

// Index and data values.
data {
  int<lower = 1> J;             // Number of schools.
  vector<lower = 0>[J] sigma;   // Standard error of the jth school.
}

// Parameters and hyperparameters.
parameters {
  real mu;                      // Population mean of the school effects.
  real<lower = 0> tau;          // Population standard deviation of the school effects.
  vector[J] theta;              // Vector of school effects.
}

// The model to be estimated.
model {
  theta ~ normal(mu, tau);
}

// Generate data according to the partial pooling regression model.
generated quantities {
  array[J] real y_rep;          // Prior predictive draws.
  y_rep = normal_rng(theta, sigma);
}

Thanks for your assistance!

All the best,

Tim
