Dear all,
I’m running a prior predictive check with the following code.
library(rstan)
priorpred<-"
data {
  int N;
}
parameters {
  real<lower=0> mu;
  real<lower=0> sigma;
}
model {
  // priors
  mu ~ normal(0,200);
  sigma ~ normal(0,50);
}
generated quantities {
  vector[N] y_sim;
  // prior predictive
  for(i in 1:N) {
    y_sim[i] = normal_rng(mu,sigma);
  }
}
"
dat<-list(N=100)
m1priorpred<-stan(model_code=priorpred,
data=dat,
chains = 4,
iter = 2000)
y_sim<-extract(m1priorpred,pars="y_sim")
pp_check(m1priorpred,y_sim[1:50,],ppc_dens_overlay)
and I’m getting an error message
Error in validate_y(y) : is.numeric(y) is not TRUE
I’m not quite sure what’s going on.
Thanks
David
This is because the pp_check() function needs to take either a vector of observed outcomes y, or an R object that has a pp_check() method defined for extracting the needed quantities. The second approach is generally used by packages like rstanarm or brms, where the variable names for each of the quantity types (observed/predicted/etc.) are fixed. This isn’t the case for generic Stan models fitted through rstan, since the variable names are set arbitrarily by the user.
To use pp_check with a generic Stan model, you’ll need to provide the reference y values to pp_check() instead of the model object (m1priorpred).
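For example, something like this sketch (y_obs here is just a placeholder for whatever vector of N observed outcomes you want to compare against):
library(bayesplot)
y_sim <- as.matrix(m1priorpred, pars = "y_sim")   # draws x N matrix of prior predictive simulations
pp_check(y_obs, y_sim[1:50, ], ppc_dens_overlay)  # observed vector first, then a matrix of simulated draws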
Andrew,
I appreciate your quick reply. I tried the following
# Prior predictive checking
library(rstan)
library(loo)
library(bayesplot)
## Read in data ##
CanadaData <-read.csv("~/desktop/Canada.csv",header=TRUE) #browse to select data "Canada.csv"
Canada.read<- subset(CanadaData, select=c(ASRREA01))
data.list <- with(Canada.read, list(readscore=ASRREA01, n = nrow(Canada.read)))
## Begin Stan Code. Notice that Stan uses // for comments ##
priorpred<-"
data {
  int n;
}
parameters {
  real<lower=0> mu;
  real<lower=0> sigma;
}
model {
  // priors
  mu ~ normal(0,200);
  sigma ~ normal(0,5);
}
generated quantities {
  vector[n] y_sim;
  // prior predictive
  for(i in 1:n) {
    y_sim[i] = normal_rng(mu,sigma);
  }
}
"
m1priorpred<-stan(model_code=priorpred,
data=data.list,
chains = 4,
iter = 2000)
y_sim<-extract(m1priorpred,pars="y_sim")
pp_check(readscore,y_sim,ppc_dens_overlay)
I’m getting the following error
Error in pp_check(readscore, y_sim, ppc_dens_overlay) :
object 'readscore' not found
It seems that I’m missing a small step, but just not sure what it is.
Thanks,
David
Error in pp_check(readscore, y_sim, ppc_dens_overlay) :
object 'readscore' not found
You haven’t defined readscore in your R environment, so pp_check doesn’t know what to use. I can see you’ve defined it in your data.list, but not as an object by itself.
I’m guessing you want to call:
pp_check(data.list$readscore, y_sim, ppc_dens_overlay)
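One other thing to keep an eye on: extract() returns a named list, so depending on how you index it you may need to pull the matrix of draws out first. A minimal sketch of what I mean, using your objects above:
y_sim_mat <- extract(m1priorpred, pars = "y_sim")$y_sim    # draws x n matrix
ppc_dens_overlay(data.list$readscore, y_sim_mat[1:100, ])  # overlay the first 100 prior draws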
Hi Andrew,
Thanks again for your thoughtful replies. Your suggestion seemed to help. The only problem is that the ppc_dens_overlay plots look strange, and I have not had this problem before. Specifically, I tried two approaches to plotting the prior distribution
y_sim<-extract(m1priorpred,pars="y_sim")$y_sim
ppc_dens_overlay(data.list$readscore,y_sim[1:500,])
and
y_sim <- as.matrix(m1priorpred,pars="y_sim")
pp_check(data.list$readscore,y_sim[1:1000,],ppc_dens_overlay)
In both cases, I get a plot that looks like the one attached.
Rplot.pdf (189.1 KB)
I’ve used this second approach many times with no issues for posterior predictive checks, but I can’t seem to get the prior predictive distribution to overlay the outcome distribution.
Thanks again.
This is because you have a very vague prior on mu, but a much tighter prior on sigma:
mu ~ normal(0,200);
sigma ~ normal(0,5);
This means that mu can take on very different values from iteration to iteration, but the simulated values within a given iteration will be fairly similar (because sigma is small). This is what the ppc_dens_overlay plot is showing: the location parameter for each iteration is very different (the density curves are spread along the x-axis), while the scale parameter is comparatively small (the density curves are very narrow).
So this is a case where the plot is demonstrating that your priors are not a good match for the data.
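If it helps, you can see the same behaviour without Stan by simulating straight from the priors (a rough sketch; abs() is just a quick way to mimic the <lower=0> constraints, which turn your normal priors into half-normals):
set.seed(123)
n_draws <- 100
N <- length(data.list$readscore)
mu    <- abs(rnorm(n_draws, 0, 200))   # locations jump around over hundreds of units
sigma <- abs(rnorm(n_draws, 0, 5))     # but each simulated dataset is very narrow
y_prior <- t(sapply(1:n_draws, function(i) rnorm(N, mu[i], sigma[i])))  # one row per prior draw
ppc_dens_overlay(data.list$readscore, y_prior)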