Now, I tried to calculate an integral using posterior predictive distribution as its measure with Stan.
I have a one solution but I am not sure whether it is correctly made or not. If you do not mind, please let me know that the strategy of the following code is correct or not.
Let y be data to be fitted a model and
\theta be a model parameter,
f(y|\theta) be a model
\pi(\theta|y) be a posterior
Let T(y,\theta) be a function.
Let y_{\text{obs}} be an observed dataset.
To calculate the integral \int \int T(y,\theta) f(y|\theta)\pi(\theta|y_{\text{obs}}) dy d\theta, I use the Monte Carlo approximation twice as follows.
\int \int T(y,\theta) f(y|\theta)\pi(\theta|y_{\text{obs}}) dy d\theta
\approx \frac{1}{I} \sum_{i=1,...,I} \int T(y,\theta_i) f(y|\theta_i)dy
\approx \frac{1}{I} \sum_{i=1,...,I}\frac{1}{J} \sum_{j=1,...,J} T(y_{ij},\theta_i)
where \theta_i denotes a posterior sample and y_{ij} is a sample drawn from f(*|\theta_i), i.e.,
\theta_i \sim \pi( *|\theta) and y_{ij} \sim f(*|\theta_i).
Example
In the following code, the model is
y \sim \text{Noraml}(\mu,1)
\mu \sim \text{Normal(0,10)}
and \mu is a model parameter (in the above it is denoted by \theta).
T(y,\mu) = y + \mu.
In the following code,
the posterior mean of the parameter T
defined in generated quantities block is seemed to be the desired integral. But, I guess it merely calculates the
\sum _iT(y_{i},\theta_i)
which is not same as the desired summation, namely
\sum _j\sum _iT(y_{ij},\theta_i).
Because in the later the subscripts i and j can move in distinct regions, but the former is not. In other words, in the former summation, sample y_i is drawn only one time from the model f(*|\theta_i) but I guess it should be drawn more times.
Should I make a more complicated code to calculate the integral or the following code is sufficient to calculate the integral?
stanmodelcode <- "
data {
int<lower=0> N;
real y[N];
}
parameters {
real mu;
}
model {
target += normal_lpdf(mu | 0, 10);
target += normal_lpdf(y | mu, 1);
}
generated quantities{
real yyy = normal_rng(mu,1);
real T = yyy + mu;
}
"
y <- rnorm(20)
dat <- list(N = 20, y = y);
fit <- stan(model_code = stanmodelcode, model_name = "example",
data = dat, iter = 2012, chains = 3, verbose = FALSE,
sample_file = file.path(tempdir(), 'norm.csv'))
print(fit)