Quick question about graphical prior predictive check

Dear all,

I’m running a prior predictive check with the following code lines.

library(rstan)
priorpred<-"

data {
int N;
}
parameters {
real<lower=0> mu;
real<lower=0> sigma;
}

model {
// priors
mu ~ normal(0,200);
sigma ~ normal(0,50);
}

generated quantities {
vector[N] y_sim;
// prior predictive
for(i in 1:N) {
y_sim[i] = normal_rng(mu,sigma);
}
}

"
dat<-list(N=100)

m1priorpred<-stan(model_code=priorpred,
                  data=dat,
                  chains = 4,
                  iter = 2000)

y_sim<-extract(m1priorpred,pars="y_sim")

pp_check(m1priorpred,y_sim[1:50,],ppc_dens_overlay)

and I’m getting an error message

Error in validate_y(y) : is.numeric(y) is not TRUE

I’m not quite sure what’s going on.

Thanks

David

This is because the pp_check() function needs to either take a vector of observed outcomes y, or an R object that has the pp_check() method defined for extracting the needed quantities. This second approach is generally used for packages like rstanarmor brms where the variable names for each of the quantity types (observed/predicted/etc.) are fixed. This isn’t the case for generic Stan models fitted through rstan, since the variable names are arbitrarily set by the user.

For using pp_check with a generic Stan model, you’ll need to provide the reference y values to pp_check() instead of the model object (m1priorpred)

Andrew,

I appreciate you quick reply. I tried the following


# Prior predictive checking

library(rstan)
library(loo)
library(bayesplot)


## Read in data ##

CanadaData <-read.csv("~/desktop/Canada.csv",header=TRUE) #browse to select data "Canada.csv"
Canada.read<- subset(CanadaData, select=c(ASRREA01))
data.list <- with(Canada.read, list(readscore=ASRREA01, n = nrow(Canada.read)))

## Begin Stan Code. Notice that Stan uses // for comments ##

priorpred<-"

data {
int n;
}
parameters {
real<lower=0> mu;
real<lower=0> sigma;
}

model {
// priors
mu ~ normal(0,200);
sigma ~ normal(0,5);
}

generated quantities {
vector[n] y_sim;

// prior predictive

for(i in 1:n) {
y_sim[i] = normal_rng(mu,sigma);
  }
}

"
m1priorpred<-stan(model_code=priorpred,
                  data=data.list,
                  chains = 4,
                  iter = 2000)

y_sim<-extract(m1priorpred,pars="y_sim")

pp_check(readscore,y_sim,ppc_dens_overlay)

I’m getting the following error

Error in pp_check(readscore, y_sim, ppc_dens_overlay) : 
  object 'readscore' not found

It seems that I’m missing a small step, but just not sure what it is.

Thanks,

David

Error in pp_check(readscore, y_sim, ppc_dens_overlay) : 
  object 'readscore' not found

You haven’t defined readscore in your R environment, so pp_check doesn’t know what to use. I can see you’ve defined it in your data.list, but not as an object by itself.

I’m guessing you want to call:

pp_check(data.list$readscore, y_sim, ppc_dens_overlay)
1 Like

Hi Andrew,

Thanks again for your thoughtful replies. Your suggestion seemed to help. The only problem is that the ppc_dens_overlay plots seem strange. I have not had this problem before. Specifically, I tried to approaches to plot the prior distribution

prior_rep<-extract(m1priorpred,pars="prior_rep")$prior_rep
ppc_dens_overlay(data.list$readscore,prior_rep[1:500,])

and

prior_rep <- as.matrix(m1priorpred,pars="prior_rep")
pp_check(data.list$readscore,prior_rep[1:1000,],ppc_dens_overlay)

In both cases, get a plot that looks like this one attached
Rplot.pdf (189.1 KB)

I’ve used this second approach many times with no issues for posterior predictive checks, but I can’t seem to get the prior predictive distribution to overlay the outcome distribution.

Thanks again.

Because you have a very vague prior on mu, but a much more reasonable prior on sigma:

mu ~ normal(0,200);
sigma ~ normal(0,5);

This means that mu can take on very different values from iteration-to-iteration, but that the estimates within a given iteration will be fairly similar (as the sigma is smaller). This is what the ppc_dens_overlay plot is showing, that location parameter for each iteration is very different (the density curves are distributed along the x-axis), but that the scale parameter is comparatively small (the density curves are very narrow).

So this is a case where the plot is demonstrating that your priors are not a good match for the data