Hi,
I have searched the internet high and low for insight here and found some examples, but my experience is limited and I may be misinterpreting the examples I have found. I greatly value and respect the expertise present in this forum. Any thoughts on this use of loo-ic, and on the way log_lik is generated, would be greatly appreciated. I have some QUESTIONS below.
Background
Drug product stability with potency measurements over time: 3 batches of new data, plus 13 batches of historical data used to specify a power prior with coefficient a0. The stability profile for each batch is modeled as a straight line over time with a batch-specific intercept and slope. Batch slopes and intercepts are assumed independently normally distributed. We refer to this as a "random slope and intercept model". It is not much different from the simpler models often used with the radon example. What is different here is the use of a power prior.
Description of the Stan Model (given below):
- Hopefully the shape of the data is clear. 96 observations (y values, batch identifier, storage time t). The first 12 observations are from the 3 “new” batches. The next 84 observations are from 13 “historic/old” batches (the source of the power prior).
- At the population level there are 4 parameters: beta0 (intercept) and beta1 (slope), with respective SDs sigma0 and sigma1. At the batch level there are latent intercepts (b0) and slopes (b1) for each of the 16 batches. Measurements (y) are obtained with a residual SD of sigma. All the distributions are normal.
- Some very diffuse priors are assumed for the 5 parameters. These really have little information content. The bulk of prior information is meant to come from the “historical/old” batches expressed as a power prior likelihood discounted by a0 exponent (0 to 1).
- The goal is to determine/justify a value for a0 based on minimizing loo-ic.
- The likelihoods for the "new" and "old" data are identical in form, except that the log likelihood for the "old" data is multiplied by the a0 exponent. Note that the discounted "old" likelihood is really the power prior. QUESTION: Is this the proper way to express the likelihood?
- In order to obtain loo-ic (using the loo package), the generated quantities block is used. Notice that log_lik uses only the likelihood of the "new" data values. It seems the "old" data should not be included here, since the "old" data really provides prior information. QUESTION: Is this the correct view?
- I have no experience with estimating the log likelihood this way. QUESTION: Is the expression in the Stan model a correct way to get at the predictive reliability of the fitted model?
- I have verified that the model fits correctly when a0=1. What I find with this data set is that loo-ic decreases smoothly as a0 goes from 0 to 1. There may be a minimum slightly below 1. QUESTION: Am I correct that the model (i.e., choice of a0) giving the minimum loo-ic is “optimal” in a predictive sense?
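For reference, here is my reading of the intended model written in the standard power-prior form, where theta collects the population parameters (beta0, beta1, sigma0, sigma1, sigma) together with the batch-level b0 and b1, and pi_0 is the diffuse initial prior:

$$
p(\theta \mid D_{\text{new}}, D_{\text{old}}, a_0) \;\propto\; L(\theta \mid D_{\text{new}}) \, \big[L(\theta \mid D_{\text{old}})\big]^{a_0} \, \pi_0(\theta), \qquad 0 \le a_0 \le 1.
$$

Raising the old-data likelihood to the power a0 corresponds exactly to multiplying its log likelihood by a0, i.e., the `target += a0 * normal_lpdf(...)` statements in the Stan model.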
Again, any criticisms, corrections, or thoughts would be very much appreciated!
Thank you,
Dave
```stan
data {
  int<lower=1> n;                 // total number of batches (new + old = 16)
  int<lower=1> N;                 // total number of y measurements (new + old = 96)
  int<lower=1> M;                 // number of new y measurements (= 12 observations)
  int<lower=1> m;                 // last new-data batch (3 new batches listed first)
  int<lower=1, upper=n> batch[N]; // batch index (1...16)
  vector[N] t;                    // sampling times
  vector[N] y;                    // measurements
  real<lower=0, upper=1> a0;      // power prior coefficient
}
parameters {
  real beta0;           // population mean intercept
  real beta1;           // population mean slope
  vector[n] b0;         // batch intercepts
  vector[n] b1;         // batch slopes
  real<lower=0> sigma0; // inter-batch intercept SD
  real<lower=0> sigma1; // inter-batch slope SD
  real<lower=0> sigma;  // measurement SD
}
model {
  vector[N] mu; // temporary mean
  // Diffuse priors
  beta0 ~ normal(2, 100);
  beta1 ~ normal(0, 10);
  sigma0 ~ student_t(2, 1, 5);
  sigma1 ~ student_t(2, 1, 5);
  sigma ~ student_t(2, 1, 5);
  // Likelihood for new data
  for (i in 1:m) {
    target += normal_lpdf(b0[i] | beta0, sigma0);
    target += normal_lpdf(b1[i] | beta1, sigma1);
  }
  for (k in 1:M) {
    mu[k] = b0[batch[k]] + b1[batch[k]] * t[k];
    target += normal_lpdf(y[k] | mu[k], sigma);
  }
  // Power prior based on historical data, discounted by a0
  for (i in (m + 1):n) {
    target += a0 * normal_lpdf(b0[i] | beta0, sigma0);
    target += a0 * normal_lpdf(b1[i] | beta1, sigma1);
  }
  for (k in (M + 1):N) {
    mu[k] = b0[batch[k]] + b1[batch[k]] * t[k];
    target += a0 * normal_lpdf(y[k] | mu[k], sigma);
  }
}
generated quantities {
  vector[M] log_lik; // pointwise log likelihood (new data only)
  real mu_hat;
  for (k in 1:M) {
    mu_hat = b0[batch[k]] + b1[batch[k]] * t[k]; // predicted mean
    log_lik[k] = normal_lpdf(y[k] | mu_hat, sigma);
  }
}
```
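One way to build intuition for what a0 does (this is a toy conjugate sketch, not the hierarchical model above): in a normal-mean model with known sigma and a flat initial prior, raising the historical likelihood to the power a0 is equivalent to shrinking the historical sample size from n_old to a0 * n_old. A quick numerical check, assuming only numpy and made-up data:

```python
import numpy as np

# Toy setup: estimate a common mean with known sigma, flat initial prior.
rng = np.random.default_rng(1)
sigma = 2.0
y_old = rng.normal(5.0, sigma, size=84)  # "historical" observations
y_new = rng.normal(5.0, sigma, size=12)  # "new" observations
a0 = 0.5                                 # power prior coefficient

# Analytic power-prior posterior: the old data enter with weight a0,
# so the effective sample size is n_new + a0 * n_old.
n_eff = len(y_new) + a0 * len(y_old)
post_mean = (y_new.sum() + a0 * y_old.sum()) / n_eff
post_var = sigma**2 / n_eff

# Same posterior obtained by tempering the old-data log likelihood on a
# grid (mirrors `target += a0 * normal_lpdf(...)` in the Stan model).
theta = np.linspace(3.0, 7.0, 4001)
loglik = (-0.5 * ((y_new[:, None] - theta) / sigma) ** 2).sum(axis=0) \
       + a0 * (-0.5 * ((y_old[:, None] - theta) / sigma) ** 2).sum(axis=0)
w = np.exp(loglik - loglik.max())
w /= w.sum()
grid_mean = (w * theta).sum()
grid_var = (w * (theta - grid_mean) ** 2).sum()

print(post_mean, grid_mean)  # the two means agree closely
print(post_var, grid_var)    # so do the variances
```

On this reading, a0 is interpretable as the fraction of historical information retained, which may help when interpreting how loo-ic changes as a0 moves from 0 to 1.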