To use or not to use the assignment to a hierarchical group as a feature for exact LOO and PSIS-LOO

I want to use leave-one-out (LOO) cross-validation (CV) for a hierarchical model. My intuition is not to use the information on the group assignment of the left-out data point when I compute the likelihood of that point given the model fitted to the rest of the data.
If I used the group assignment of the left-out data point for LOO-CV, I would treat the group assignment as an explanatory variable (with corresponding parameters having a special, informative prior: the hierarchical prior, which includes information from the data itself). And this is not what I want. I use the information on the group assignment only to account for the fact that the data points are not independent given the non-hierarchical model, for a known reason (but hopefully are independent given the group assignment), and ultimately to get the variance correct(er).

Questions:

  1. Is this view correct?
  2. Is PSIS-LOO an approximation of LOO-CV without using the assignment to the hierarchical group of the left-out data point?
  3. Should I use or not use the assignment to the hierarchical group of the left-out data point as a feature when I do exact LOO-CV for the data points where the Pareto-k value was too high in PSIS-LOO (I use the loo package in R)?

From my understanding of Vehtari, Gelman & Gabry (2017), 'Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC', the answer to question 2 is 'no'. (I understand that \tilde{y}_{i} is the random variable of an unknown data point given the features of data point y_{i} in equation (7).) But maybe there is an argument why this is not a problem if I then sum over all data points.

Let me check that I understand your goal. Let's say you have 10 groups, and observation y_{37} belongs to group 8 (i.e. g_{37}=8), and in addition there are other covariates, so that you have a model
p(y_{37} | x_{37}, g_{37}=8, \phi, \theta_8),
where \phi are the common parameters and \theta_8 is the group-8-specific parameter. The usual posterior predictive distribution would be
\int p(\tilde{y}_{37} | x_{37}, g_{37}=8, \phi, \theta_8) \, p(\phi,\theta_8|y,x,g) \, d\phi \, d\theta_8.
The usual leave-one-out predictive distribution would be
\int p(\tilde{y}_{37} | x_{37}, g_{37}=8, \phi, \theta_8) \, p(\phi,\theta_8|y_{-37},x_{-37},g_{-37}) \, d\phi \, d\theta_8.
Do you mean that you would now instead want to use a predictive distribution
\int p(\tilde{y}_{37} | x_{37}, g_{37}=?, \phi, \theta_?) \, p(\phi,\theta_?|y_{-37},x_{-37},g_{-37}) \, d\phi \, d\theta_?,
where ? is an unknown group among the existing groups, or a not yet seen group?

Yes, this is exactly my question, although here I only thought of either ? = 8 or ? being a not yet seen group. But in a very related question (interposed below) I also thought about ? being an unknown group among the already seen ones.

My understanding of the motivation for a hierarchical model with \theta_i \sim D(\phi) for i = 1, \dots, 10, where D is some distribution, is to learn about the \theta of the general population, i.e. of a random, probably not yet seen group. So, I would want to use the leave-one-out predictive distribution

\int \int p(\tilde{y}_{37} | x_{37}, g_{37}=?, \phi, \theta_?) \, p(\phi,\theta_?|y_{-37},x_{-37},g_{-37}) \, d\phi \, d\theta_{?},

for ? being a random, probably not yet seen group, and I would just integrate over \theta_{?}. Correspondingly, I would draw a random \theta_{?} \sim D(\phi) for computing the exact \texttt{log_lik} in each iteration of my Stan program.
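
In R pseudo-notation, this is roughly what I have in mind (a minimal sketch with hypothetical names, using the negative-binomial likelihood from my application further below; draws is assumed to hold S posterior draws of intercept a, coefficients beta, group-level mean/sd mu/nu and negative-binomial size phi, from the model fitted to y_{-i}):

log_mean_exp <- function(x) {       # stable log of a mean of densities
  m <- max(x)
  m + log(mean(exp(x - m)))
}

lpd_i_new_group <- function(y_i, x_i, draws, T = 100) {
  S <- length(draws$a)
  lp <- numeric(S)
  for (s in seq_len(S)) {
    eta <- rnorm(T, draws$mu[s], draws$nu[s])     # theta_? ~ D(phi)
    log_lambda <- draws$a[s] + sum(x_i * draws$beta[s, ]) + eta
    # average the density of y_i over the T draws of the unknown group effect
    lp[s] <- log_mean_exp(dnbinom(y_i, size = draws$phi[s],
                                  mu = exp(log_lambda), log = TRUE))
  }
  log_mean_exp(lp)                  # integrate over the posterior p(phi | y_{-i})
}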

This question is related to another question I encountered:
Should one do posterior predictive checks for ? being an unknown group among the existing groups (I did this by forward simulating data for ? = 1,\dots,10 and then pooling) or for ? being a not yet seen group (I did this by forward simulating data while sampling \theta \sim D(\phi | y, x, g))? I did both, but I think the second option is what I should do, as I want to learn something about the general population and not about the groups I have seen. In my application the second option matched the data less well (I just did a graphical check by plotting histograms of the simulated data and the true data on top of each other), but I think this could go in either direction, depending on how different the \theta_i's are and on how well D represents them. Does this make sense? What option for the posterior predictive check is recommended? (I don't yet know the literature very well, sorry. But I am trying to fill in my gaps.)
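
For concreteness, here is a rough sketch of the two options (hypothetical names; draws is assumed to hold S posterior draws with group effects eta[s, j] for j = 1, ..., J, hyperparameters mu and nu, intercept a and negative-binomial size phi):

S <- length(draws$a); J <- ncol(draws$eta)
sim_y <- function(s, eta)   # negative-binomial forward simulation for one draw
  rnbinom(1, size = draws$phi[s], mu = exp(draws$a[s] + eta))
# option 1: ? is an unknown group among the existing ones (pool over j)
yrep_pooled <- sapply(seq_len(S), function(s) sim_y(s, draws$eta[s, sample(J, 1)]))
# option 2: ? is a not yet seen group (draw a fresh group effect)
yrep_new <- sapply(seq_len(S), function(s) sim_y(s, rnorm(1, draws$mu[s], draws$nu[s])))

Overlaying histograms of yrep_pooled and yrep_new with the observed data then gives the two graphical checks described above.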

I finally read most of your paper "Vehtari, A., Gelman, A. & Gabry, J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput 27, 1413–1432 (2017)" [1] and some relevant sections in BDA3. I understand that I can rearrange the data by hierarchical group (making each group a data array considered as a single data point) and then do LOO-CV by removing a whole group each time. Hence the removed group is not yet seen, and the (exact) LOO-CV over all groups is an estimate of the elpd for new groups. PSIS-LOO-CV may work, as the importance ratios then capture the discrepancy between the likelihood of a given parameter when not knowing this group at all and the likelihood when knowing all data; but as the group-specific parameters are fitted to single groups (hence single data points), the Pareto-k estimates are likely to always be too high, and exact LOO-CV is probably needed.
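
In code, this rearrangement amounts to summing the pointwise log-likelihoods within groups, so that each group enters PSIS as a single 'data point'. A sketch, assuming log_lik is the usual S x n matrix and g is the group index of each data point:

library(loo)
# one column per group: the joint log-likelihood of all points in that group
log_lik_group <- sapply(split(seq_len(ncol(log_lik)), g),
                        function(idx) rowSums(log_lik[, idx, drop = FALSE]))
loo_group <- loo(log_lik_group)   # Pareto-k will likely be high; fall back to exact CV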

This is what I wanted, but I find this solution not ideal, as:

  1. I find this LOO-CV more difficult to interpret in situations where the single data points are individuals from a general population about which I want to say something, and the groups are introduced only by the experimental design or the sampling procedure.
  2. I think the size of the groups influences the LOO-CV in an unwanted way, since removing big groups potentially makes the posteriors flatter (I imagine a situation where one big group sits close to the general population and smaller groups diverge a bit more).
  3. In my application I have two layers of grouping, i.e. two grouping aspects I assume to be independent (the house where the measurement was taken and the week when it was taken), and I think I would first need to construct single house-week groups.

However, with the 'LOO-CV by group' described above, the 'not yet seen group' property is achieved by removing the ability to learn from the held-out group, while I think it should also be possible to remove the group association of a single held-out data point in the density for that point. More precisely: assume a hierarchical model written in terms of group-specific error terms \eta_{j} as

y_{i} \sim \mathcal{D}_{1}(X_{i} \beta + \eta_{j[i]}, \psi) \\ \eta_{j} \sim \mathcal{D}_{2}(\mu, \nu)

with \mathcal{D}_{1} and \mathcal{D}_{2} being some distributions, X some predictors, \beta the corresponding coefficients and j[i] \in \{1, \dots, N\} the group to which data point i belongs, and denoting \theta = (\beta, \psi, \mu, \nu, \eta_1, \dots, \eta_N) with \theta_{-j[i]} = (\beta, \psi, \mu, \nu, \eta_1, \dots, \eta_{j[i]-1}, \eta_{j[i]+1}, \dots, \eta_N).

I would like to consider the LOO predictive density

p(\tilde{y}_{i}|y_{-i}) = \int \tilde{p}(\tilde{y}_{i}|\theta) \, p(\theta|y_{-i}) \, d\theta

where \tilde{p} is the density of \tilde{y}_{i} without knowing its group (hence \tilde{p}(\tilde{y}_{i}) = \tilde{p}(\tilde{y})), given by

\tilde{p}(\tilde{y}_{i}|\theta) = \int p(\tilde{y}_{i}|\theta_{-j[i]}, \tilde{\eta}_{j[i]}) \, p(\tilde{\eta}_{j[i]}|\mu, \nu) \, d\tilde{\eta}_{j[i]}.

In my application \mathcal{D}_{1} is negative binomial and \mathcal{D}_{2} is normal, hence I can write

y_{i} \sim \mathcal{D}_{1}(X_{i} \beta' + \eta_{j[i]} \sigma, \psi) \\ \eta_{j} \sim \mathcal{N}(0, 1)

and simplify to

\tilde{p}(\tilde{y}_{i}|\theta) = \int_{\mathbb{R}} p(\tilde{y}_{i}|\theta_{-j[i]}, \tilde{\eta}_{j[i]}) \, p_{\mathcal{N}(0,1)}(\tilde{\eta}_{j[i]}) \, d\tilde{\eta}_{j[i]}.
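
Per posterior draw I would estimate this integral by Monte Carlo over T draws of \tilde{\eta}_{j[i]}. Note that on the log scale this is the log of a mean of densities, not a mean of log densities:

\log \tilde{p}(\tilde{y}_{i}|\theta) \approx \log \frac{1}{T} \sum_{t=1}^{T} p(\tilde{y}_{i}|\theta_{-j[i]}, \tilde{\eta}^{(t)}) = \text{log\_sum\_exp}_{t}(\ell_{t}) - \log T, \quad \tilde{\eta}^{(t)} \sim \mathcal{N}(0,1),

with \ell_{t} = \log p(\tilde{y}_{i}|\theta_{-j[i]}, \tilde{\eta}^{(t)}); this is how \texttt{log_lik_2} is computed in the Stan code below.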

Approximating p(\tilde{y}_{i}|y_{-i}) with PSIS by introducing importance ratios for the second term under the integral should still work. I can easily estimate \tilde{p}(\tilde{y}_{i}|\theta) in each iteration of the Stan program (maybe even with only one randomly drawn \tilde{\eta}_{j[i]} \sim \mathcal{D}_{2}(\mu, \nu)), but then I guess I cannot directly use \texttt{loo} any more. So I thought I could output p(y_{i}|\theta) as the usual \texttt{log_lik} but also \tilde{p}(y_{i}|\theta) as \texttt{log_lik_2}, use the function \texttt{psis()} from the \texttt{loo} package for PSIS, extract the 'improved' weights \omega_{i}^{s} with \texttt{weights()} (non-normalised), and then compute \widehat{\widetilde{\text{elpd}}}_{\text{psis-loo}} as in equation (10) of your paper [1], replacing p(y_{i}|\theta^{s}) with the corresponding entry of \texttt{log_lik_2}. For data points with dangerous Pareto-k I would do exact LOO-CV and replace the corresponding terms in my hand-made computation of equation (10).
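
In R, this plan would look roughly as follows (a sketch, assuming log_lik and log_lik_2 are S x n matrices of draws in matching order, and chain_id maps each draw to its chain; more on the ordering below):

library(loo)
# log importance ratios for LOO are the negative pointwise log-likelihoods
psis_obj <- psis(-log_lik, r_eff = relative_eff(exp(log_lik), chain_id = chain_id))
omega <- weights(psis_obj, log = FALSE, normalize = TRUE)  # S x n, columns sum to 1
# equation (10) of [1], with the density p(y_i | theta^s) taken from log_lik_2
elpd_tilde_pointwise <- log(colSums(omega * exp(log_lik_2)))
elpd_tilde <- sum(elpd_tilde_pointwise)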

Do you think this would work?

Thank you very much for any help and suggestions!

I implemented in Stan/R the idea I outlined in my last post, in order to estimate the expected log predictive density with LOO-CV (elpd_loo) for a multilevel/hierarchical model while forgetting the group association of the left-out data point. If anybody is willing to check the method and how I implemented it, I would be very glad!

I insert the Stan program and the R script below. The Stan program is a bit long, I'm sorry. The model consists of two separately parameterised, analogous negative-binomial models for two kinds of data (indoor and outdoor) measured for the same data points (therefore almost every variable comes once with 'IN' and once with 'OUT'), whose log-likelihoods are summed up in the end. There are two switches: \texttt{doexactloopdi}=1 for doing exact LOO-CV and \texttt{doforward}=1 for running the forward simulation (everything inside \texttt{if (doforward)} in the generated quantities can be ignored for the purpose of LOO-CV). The model runs fine, except for warnings about Rhat being NaN, but I think this is due to some constant variables (I checked once with \texttt{monitor()} that everything is fine). To me the output looks reasonable. I didn't simplify the code, for time reasons and because this could hide problems in the original code I am not aware of (I am of course also grateful for any other suggestions on my code). Still, if nobody responds after a while, I could simplify the code.

The relevant variable for 'LOO-CV with forgetting the group association' is \texttt{log_lik_2} (sorry there are no line numbers…). When I multiply the predictive density, \texttt{p_tilde}, with the weights (i.e. importance ratios), \texttt{omega}, in the R script, I assume that the order of the sample is the same. This is risky, I think. I know that I can keep the chains separate and un-permuted when extracting \texttt{log_lik_2} from the stanfit object, but I don't know how to do this for extracting the weights from the psis object (with \texttt{weights.importance_sampling}). Suggestions?
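
One way to sidestep the ordering question, as far as I understand the loo API: the rows of the weights matrix follow the draw order of the log-ratio input, so if both arrays are flattened identically, the alignment holds by construction (a sketch):

ll_arr  <- loo::extract_log_lik(fit, parameter_name = "log_lik",   merge_chains = FALSE)
ll2_arr <- loo::extract_log_lik(fit, parameter_name = "log_lik_2", merge_chains = FALSE)
# flatten both (iterations x chains x n) arrays the same way, chains stacked
flatten <- function(a) matrix(a, nrow = dim(a)[1] * dim(a)[2], ncol = dim(a)[3])
ll <- flatten(ll_arr); ll2 <- flatten(ll2_arr)
chain_id <- rep(seq_len(dim(ll_arr)[2]), each = dim(ll_arr)[1])
psis_obj <- loo::psis(-ll, r_eff = loo::relative_eff(exp(ll), chain_id = chain_id))
omega <- weights(psis_obj, log = FALSE, normalize = TRUE)  # same row order as ll
elpd_tilde_pointwise <- log(colSums(omega * exp(ll2)))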

Some background: the data is from a field study comparing different mosquito-control tools deployed at house level with the aim of reducing mosquito densities. The data is mosquito counts per house and per day, indoor and outdoor. There are 12 houses and the study ran over 18 weeks; the control and the 3 active treatments are assigned with a Latin square design. The aim of the model is to provide estimates of 'protective efficacy' for each treatment, defined as the relative reduction of mosquito counts due to the treatment, which is computed in the generated quantities after forward simulation.

Thank you very much for reading, help, suggestions, comments…

data{
  int<lower=0> n; // number of 'house-days' (experimental days * houses)  // This needs to be adjusted if doexactloopdi=1
  int<lower=0> m[3]; // [number of houses, number of treatments, number of weeks]
  int<lower=0> yHLC[n]; // data HLC
  int<lower=0> yCDC[n]; // data CDC
  int xHOUSE[n]; // feature HOUSE
  int xTREAT[n]; // feature TREATment
  int xWEEK[n]; // feature WEEK
  // to control priors
    int<lower=0> priorscale; // to control scale of prior
  // to control forward simulation
    int<lower=0, upper=1> doforward; // =1 to switch forward simulation on, =0 to switch off
    int zn[doforward ? 1 : 0]; // number of simulated data points
    int zm[doforward ? 3 : 0]; // [number of simulated houses, number of simulated treatments, number of simulated weeks]
    int zxHOUSE[doforward ? zn[1] : 0]; // feature HOUSE
    int zxTREAT[doforward ? zn[1] : 0]; // feature TREATment
    int zxWEEK[doforward ? zn[1] : 0]; //feature week number
    int indexC[doforward ? zm[1]*zm[3] : 0]; // indices of control in output zy
    int indexT[doforward ? zm[1]*zm[3] : 0]; // indices of pull in output zy
    int indexR[doforward ? zm[1]*zm[3] : 0]; // indices of push in output zy
    int indexP[doforward ? zm[1]*zm[3] : 0]; // indices of push-pull in output zy
    real<lower=0> c[doforward ? 1 : 0]; // added to denominators in the generated quantities to avoid division by 0 (no correction if c=0)
  // to control computation of log likelihood for loo 
    int<lower=1> T; // number of sample points for unknown group parameter
    int<lower=0, upper=1> doexactloopdi;
    int<lower=0> yHLCi[doexactloopdi ? 1 : 0]; // HLC of left-out data point i
    int<lower=0> yCDCi[doexactloopdi ? 1 : 0]; // CDC of left-out data point i
    int xHOUSEi[doexactloopdi ? 1 : 0]; // feature HOUSE
    int xTREATi[doexactloopdi ? 1 : 0]; // feature TREATment
    int xWEEKi[doexactloopdi ? 1 : 0]; // feature WEEK
}
parameters{
  real logaIN; // mean of lograte indoors
  real logaOUT; // mean of lograte outdoors
  real<lower=0> sigma_alphaIN; // standard deviation of lograte indoors due to house
  real<lower=0> sigma_alphaOUT; // standard deviation of lograte outdoors due to house
  vector[m[1]] thetaIN; // standardised deviation of indoor lograte from mean of lograte indoors due to house
  vector[m[1]] thetaOUT; // standardised deviation of outdoor lograte from mean of lograte outdoors due to house
  real<lower=0> sigma_tauIN; // standard deviation of lograte indoors due to week
  real<lower=0> sigma_tauOUT; // standard deviation of lograte outdoors due to week
  vector[m[3]] tauIN; // standardised deviation of indoor lograte from mean of lograte indoors due to week
  vector[m[3]] tauOUT; // standardised deviation of outdoor lograte from mean of lograte outdoors due to week
  vector[m[2]-1] logbetaIN; // treatment effect indoor 
  vector[m[2]-1] logbetaOUT; // treatment effect outdoor
  vector[m[2]] logpsiIN; // one scale parameter per treatment group and per indoor, so far no hierarchy
  vector[m[2]] logpsiOUT; // one scale parameter per treatment group and per outdoor, so far no hierarchy
}
transformed parameters{
  // parameters for negative binomial for data
    vector[n] loglambdaIN; // lograte indoor
    vector[n] loglambdaOUT; // lograte outdoor
    vector[n] phiIN; // scale parameter indoor 
    vector[n] phiOUT; // scale parameter outdoor
    // defining log means
    vector[4] logbetaINall = append_row(0, logbetaIN);
    vector[4] logbetaOUTall = append_row(0, logbetaOUT);
    loglambdaIN = logaIN + thetaIN[xHOUSE] * sigma_alphaIN + logbetaINall[xTREAT] + tauIN[xWEEK] * sigma_tauIN;
    loglambdaOUT = logaOUT + thetaOUT[xHOUSE] * sigma_alphaOUT + logbetaOUTall[xTREAT] + tauOUT[xWEEK] * sigma_tauOUT;
    // defining scale parameters
    phiIN = exp(logpsiIN[xTREAT]); 
    phiOUT = exp(logpsiOUT[xTREAT]); 
}
model{
//priors and hyperpriors
  target += normal_lpdf(logaIN | 0, priorscale);
  target += normal_lpdf(logaOUT | 0, priorscale);
  target += cauchy_lpdf(sigma_alphaIN | 0, priorscale); // half-Cauchy via <lower=0>; multiplying the lpdf by 2 would square the density, the constant factor 2 can be dropped
  target += cauchy_lpdf(sigma_alphaOUT | 0, priorscale);
  target += cauchy_lpdf(sigma_tauIN | 0, priorscale);
  target += cauchy_lpdf(sigma_tauOUT | 0, priorscale);
  target += normal_lpdf(logbetaIN| 0, priorscale);
  target += normal_lpdf(logbetaOUT| 0, priorscale);
  target += normal_lpdf(logpsiIN| 0, priorscale);
  target += normal_lpdf(logpsiOUT| 0, priorscale);
// hierarchical parameters  
  target += normal_lpdf(thetaIN| 0, 1);
  target += normal_lpdf(thetaOUT| 0, 1); 
  target += normal_lpdf(tauIN| 0, 1);
  target += normal_lpdf(tauOUT| 0, 1);
// model
  target += neg_binomial_2_log_lpmf(yCDC | loglambdaIN, phiIN); 
  target += neg_binomial_2_log_lpmf(yHLC | loglambdaOUT, phiOUT); 
}
generated quantities{
  // definitions for log likelihood for loo 
    vector[doexactloopdi ? 1 : n] log_lik;
    vector[doexactloopdi ? 1 : n] log_lik_2;
    vector[doexactloopdi ? 1 : 0] loglambdaINi; // lograte indoor
    vector[doexactloopdi ? 1 : 0] loglambdaOUTi; // lograte outdoor
    vector[doexactloopdi ? T : 0] loglambdaINi_rn; // lograte indoor
    vector[doexactloopdi ? T : 0] loglambdaOUTi_rn; // lograte outdoor
    vector[doexactloopdi ? 1 : 0] phiINi; // scale parameter indoor 
    vector[doexactloopdi ? 1 : 0] phiOUTi; // scale parameter outdoor  
    vector[doexactloopdi ? T : 0] thetaINi_rn; // random numbers
    vector[doexactloopdi ? T : 0] thetaOUTi_rn; // random numbers  
    vector[doexactloopdi ? T : 0] tauINi_rn; // random numbers  
    vector[doexactloopdi ? T : 0] tauOUTi_rn; // random numbers  
    matrix[doexactloopdi ? 0 : n,T] loglambdaIN_rn; // random group 
    matrix[doexactloopdi ? 0 : n,T] loglambdaOUT_rn; // random group  
    matrix[doexactloopdi ? 0 : n,T] thetaIN_rn; // random numbers
    matrix[doexactloopdi ? 0 : n,T] thetaOUT_rn; // random numbers  
    matrix[doexactloopdi ? 0 : n,T] tauIN_rn; // random numbers  
    matrix[doexactloopdi ? 0 : n,T] tauOUT_rn; // random numbers  
  // definitions for forward simulation
    int<lower=0> zIN[doforward ? zn[1] : 0]; //simulate CDC data for known houses and unknown house and all treatment groups
    int<lower=0> zOUT[doforward ? zn[1] : 0]; //simulate HLC data for known houses and unknown house and all treatment groups
    vector[doforward ? zn[1] : 0] zmeanIN; //mean of simulated CDC data for known houses and unknown house and all treatment groups
    vector[doforward ? zn[1] : 0] zmeanOUT; //mean of simulated HLC data for known houses and unknown house and all treatment groups
    //vector[doforward ? zm[1] : 0] rrIN_OUT; // rates rario of indoor vs outdoor in control setting
    //vector[doforward ? zm[1] : 0] SpIN; // simulated probability of mosquito going indoor in control setting
    //vector[doforward ? zm[1] : 0] EpIN; // expected probability of mosquito going indoor in control setting
    vector[doforward ? zn[1] : 0] PEf_IN; // protective efficacy against indoor biting in the field
    vector[doforward ? zn[1] : 0] PEf_OUT; // protective efficacy against outdoor biting in the field
  // expressions for log likelihood per datapoint to be used by loo function
    if (doexactloopdi){
      //log_lik
        loglambdaINi = logaIN + thetaIN[xHOUSEi] * sigma_alphaIN + logbetaINall[xTREATi] + tauIN[xWEEKi] * sigma_tauIN;
        loglambdaOUTi = logaOUT + thetaOUT[xHOUSEi] * sigma_alphaOUT + logbetaOUTall[xTREATi] + tauOUT[xWEEKi] * sigma_tauOUT;
        phiINi = exp(logpsiIN[xTREATi]); 
        phiOUTi = exp(logpsiOUT[xTREATi]);
        log_lik[1] = neg_binomial_2_log_lpmf(yCDCi | loglambdaINi, phiINi) + neg_binomial_2_log_lpmf(yHLCi | loglambdaOUTi, phiOUTi);
      //log_lik_2
        thetaINi_rn = to_vector(normal_rng(rep_vector(0,T),rep_vector(1,T)));
        thetaOUTi_rn = to_vector(normal_rng(rep_vector(0,T),rep_vector(1,T)));
        tauINi_rn = to_vector(normal_rng(rep_vector(0,T),rep_vector(1,T)));
        tauOUTi_rn = to_vector(normal_rng(rep_vector(0,T),rep_vector(1,T)));
        loglambdaINi_rn = logaIN + thetaINi_rn * sigma_alphaIN + rep_vector(logbetaINall[xTREATi[1]],T) + tauINi_rn * sigma_tauIN;
        loglambdaOUTi_rn = logaOUT + thetaOUTi_rn * sigma_alphaOUT + rep_vector(logbetaOUTall[xTREATi[1]],T) + tauOUTi_rn * sigma_tauOUT;
        { // log of the Monte Carlo average of the density over the T random group effects:
          // log_sum_exp(lp) - log(T), i.e. log of a mean, not a mean of the log densities
          vector[T] lp;
          for (t in 1:T)
            lp[t] = neg_binomial_2_log_lpmf(yCDCi[1] | loglambdaINi_rn[t], phiINi[1]) + neg_binomial_2_log_lpmf(yHLCi[1] | loglambdaOUTi_rn[t], phiOUTi[1]);
          log_lik_2[1] = log_sum_exp(lp) - log(T);
        }
    }else{
      thetaIN_rn = to_matrix(normal_rng(rep_vector(0,n*T),rep_vector(1,n*T)),n,T);
      thetaOUT_rn = to_matrix(normal_rng(rep_vector(0,n*T),rep_vector(1,n*T)),n,T);
      tauIN_rn = to_matrix(normal_rng(rep_vector(0,n*T),rep_vector(1,n*T)),n,T);
      tauOUT_rn = to_matrix(normal_rng(rep_vector(0,n*T),rep_vector(1,n*T)),n,T);
      loglambdaIN_rn = logaIN + thetaIN_rn * sigma_alphaIN + rep_matrix(logbetaINall[xTREAT],T) + tauIN_rn * sigma_tauIN;
      loglambdaOUT_rn = logaOUT + thetaOUT_rn * sigma_alphaOUT + rep_matrix(logbetaOUTall[xTREAT],T) + tauOUT_rn * sigma_tauOUT;
      for (j in 1:n) {
      // log_lik for PSIS
      log_lik[j] = neg_binomial_2_log_lpmf(yCDC[j] | loglambdaIN[j], phiIN[j]) + neg_binomial_2_log_lpmf(yHLC[j] | loglambdaOUT[j], phiOUT[j]);
      // log_lik_2 for PSIS with forgetting group association of held-out data point:
      // log of the Monte Carlo average of the density over the T random group effects
      {
        vector[T] lp;
        for (t in 1:T)
          lp[t] = neg_binomial_2_log_lpmf(yCDC[j] | loglambdaIN_rn[j,t], phiIN[j]) + neg_binomial_2_log_lpmf(yHLC[j] | loglambdaOUT_rn[j,t], phiOUT[j]);
        log_lik_2[j] = log_sum_exp(lp) - log(T);
      }
      }
    }
  // expressions for forward simulation and protective efficacy
  if (doforward){
    vector[zm[1]] zthetaIN; // standardised deviation of indoor lograte from mean of lograte indoors, plus unknown 13th house
    vector[zm[1]] zthetaOUT; // standardised deviation of outdoor lograte from mean of lograte outdoors, plus unknown 13th house
    vector[zm[3]] ztauIN; // standardised deviation of indoor lograte from mean of lograte indoors due to week, plus unknown week
    vector[zm[3]] ztauOUT; // standardised deviation of outdoor lograte from mean of lograte outdoors due to week, plus unknown week
    vector[zn[1]] zloglambdaIN; // lograte indoor
    vector[zn[1]] zloglambdaOUT; // lograte outdoor
    vector[zn[1]] zphiIN; // scale parameter indoor
    vector[zn[1]] zphiOUT; // scale parameter outdoor
    int zINC[zn[1]];
    int zOUTC[zn[1]];
    zthetaIN[:m[1]] = thetaIN;
    zthetaOUT[:m[1]]= thetaOUT;
    zthetaIN[zm[1]] = normal_rng(0,1);
    zthetaOUT[zm[1]]= normal_rng(0,1);
    ztauIN[:m[3]] = tauIN;
    ztauOUT[:m[3]]= tauOUT;
    ztauIN[zm[3]] = normal_rng(0,1);
    ztauOUT[zm[3]]= normal_rng(0,1);
    // defining log means 
      zloglambdaIN = logaIN + zthetaIN[zxHOUSE] * sigma_alphaIN + logbetaINall[zxTREAT] + ztauIN[zxWEEK] * sigma_tauIN;
      zloglambdaOUT = logaOUT + zthetaOUT[zxHOUSE] * sigma_alphaOUT + logbetaOUTall[zxTREAT] + ztauOUT[zxWEEK] * sigma_tauOUT;
    zphiIN = exp(logpsiIN[zxTREAT]); 
    zphiOUT = exp(logpsiOUT[zxTREAT]); 
    zIN = neg_binomial_2_log_rng(zloglambdaIN, zphiIN); //simulate CDC data
    zOUT = neg_binomial_2_log_rng(zloglambdaOUT, zphiOUT); //simulate HLC data
    zmeanIN = exp(zloglambdaIN); //mean of simulated CDC data for known houses and unknown house and all treatment groups
    zmeanOUT = exp(zloglambdaOUT); //mean of simulated HLC data for known houses and unknown house and all treatment groups
    //rrIN_OUT = exp(logalphaIN) ./ exp(logalphaOUT);
    //SpIN = to_vector(neg_binomial_2_log_rng(logalphaIN + logalphaOUT, exp(logpsiIN[1]))) ./ (to_vector(neg_binomial_2_log_rng(logalphaIN + logalphaOUT, exp(logpsiIN[1]))) + rep_vector(0.1,12)); // the last term is correction to preclude division by zero. Rethink about this.
    { // build output vectors with controls only
      zINC[indexC] = zIN[indexC];
      zINC[indexT] = zIN[indexC];
      zINC[indexR] = zIN[indexC];
      zINC[indexP] = zIN[indexC];
      //
      zOUTC[indexC] = zOUT[indexC];
      zOUTC[indexT] = zOUT[indexC];
      zOUTC[indexR] = zOUT[indexC];
      zOUTC[indexP] = zOUT[indexC];
    }
    PEf_IN= (to_vector(zINC) - to_vector(zIN) ) ./ (to_vector(zINC)+c[1]);
    PEf_OUT= (to_vector(zOUTC) - to_vector(zOUT) ) ./ (to_vector(zOUTC)+c[1]);
  }
}

This is the R function I wrote to run the model and compute the two LOO-CV versions:

validfit <- function(species, model, no_zHOUSE, no_zWEEK, WEEK2DAY, doforward, doexactloopdi, T_est, pareto_k_threshold, chains, warmup, iter, savetofile) {
  # This function returns a list containing:
  #  - the stanfit object for the given model fitted to the whole data,
  #  - the matrix defining the forward simulations if doforward==1,
  #  - the loo() output, with the PSIS-approximated elpd (expected log predictive
  #    density) replaced by the exact elpd for the data points with Pareto-k value
  #    above pareto_k_threshold,
  #  - elpd_tilde_loo_psis, which forgets the group association of the left-out data
  #    point (again with exact LOO if the Pareto-k value is above pareto_k_threshold),
  #  - SE_elpd_tilde_loo_psis, the standard error of the latter,
  #  - a vector with the indices of the data points with Pareto-k value above
  #    pareto_k_threshold.
  # pareto_k_threshold = 0.5, or at most 0.7, is recommended
  # species is currently either 'gamb' or 'fune'
  
  ## preparations
  if(chains <= 1) {stop("needs multiple chains")}
  
  options(mc.cores = parallel::detectCores())
  library('rstan')
  rstan_options(auto_write = TRUE)
  library("loo")
  
  model_file <- paste0("./", model, ".stan", collapse = NULL)
  m <- stan_model(file = model_file)  # Stan program
  
  ## fit model to whole data
  input_stan <- data_stan(species=species, doexactloopdi=0, T_est, index=NA, doforward=doforward, no_zHOUSE=no_zHOUSE, no_zWEEK=no_zWEEK, WEEK2DAY)
  sim <- input_stan$sim
  fit <- sampling(
    m, 
    data = input_stan,    # named list of data
    chains = chains,      # number of Markov chains
    warmup = warmup,      # number of warmup iterations per chain
    iter = iter,          # total number of iterations per chain
    control = list(adapt_delta = 0.8),
    show_messages = FALSE
  )
  
  ## LOO-CV by PSIS with using group association of left-out data point for prediction
    log_lik <- loo::extract_log_lik(fit, merge_chains = FALSE) # Extract pointwise log-likelihood
    
    # as of loo v2.0.0 we can optionally provide relative effective sample sizes
    # when calling loo, which allows for better estimates of the PSIS effective
    # sample sizes and Monte Carlo error
    r_eff <- relative_eff(exp(log_lik))
    loo_ <- loo(fit, r_eff = r_eff, save_psis=TRUE) # CHECK ORDERING OF ELEMENTS IN PSIS_OBJECT IN TERMS OF SAMPLE
    I <- pareto_k_ids(loo_, threshold = pareto_k_threshold) # extracting the indices of data points with pareto-k estimate above pareto_k_threshold
  
  
  ## LOO-CV by PSIS with forgetting group association, storing pointwise, sum and SE to be computed further down
    omega <- loo::weights.importance_sampling(loo_$psis_object, log=FALSE, normalize = TRUE) # check what is meant with log scale and if normalisation is done on the right scale
    log_p_tilde <- rstan::extract(object=fit, pars='log_lik_2', permuted = TRUE, inc_warmup = FALSE, include = TRUE)
    p_tilde <- exp(log_p_tilde$log_lik_2)
    elpd_tilde_loo_psis_pointwise <- log(colSums(omega * p_tilde)) # sum instead of mean as weights omega are normalised
  
  ## exact LOO-CV for data points with pareto-k estimate above pareto_k_threshold by PSIS
  if (doexactloopdi==1){
  # refit the model length(I) times, each time leaving out one i in I, and save the
  # corresponding exact elpd (expected log predictive density), once using the group
  # association of the left-out data point and once forgetting it.
    for (i in I) {
      input_stan <- data_stan(species=species, doexactloopdi=1, T_est, index=i, doforward=0, no_zHOUSE=NA, no_zWEEK=NA, WEEK2DAY) # calling function saved in load_data.R
      fit_lio <- sampling(
        m, 
        data = input_stan,    # named list of data
        chains = chains,      # number of Markov chains
        warmup = warmup,      # number of warmup iterations per chain
        iter = iter,          # total number of iterations per chain
        control = list(adapt_delta = 0.8),
        show_messages = FALSE
      )
      
      # replace pointwise log_lik
      log_lik_lio_i <- loo::extract_log_lik(fit_lio, merge_chains = FALSE) # extract log likelihood for data point i
      loo_$pointwise[i,1] <- log(mean(exp(log_lik_lio_i))) # replace by exact_elpd for left out data point i CHECK IF THIS REALLY CHANGES THE OUTPUT OF loo_
      
      # replace pointwise elpd_tilde_loo_psis with exact elpd_tilde_loo_psis
      log_p_tilde_lio <- rstan::extract(object=fit_lio, pars='log_lik_2', permuted = TRUE, inc_warmup = FALSE, include = TRUE)
      p_tilde_lio <- exp(log_p_tilde_lio$log_lik_2)
      elpd_tilde_loo_psis_pointwise[i] <- log(mean(p_tilde_lio))
    }
  }
      
  # compute elpd_tilde_loo_psis, the elpd when forgetting the group association of the left-out data point
  elpd_tilde_loo_psis <- sum(elpd_tilde_loo_psis_pointwise)
  SE_elpd_tilde_loo_psis <- sd(elpd_tilde_loo_psis_pointwise)*sqrt(length(elpd_tilde_loo_psis_pointwise))
  
  ## outputting results  
  if (doforward==1){
    result <- list(fit=fit, sim=sim, loo_=loo_, I=I, elpd_tilde_loo_psis=elpd_tilde_loo_psis, SE_elpd_tilde_loo_psis=SE_elpd_tilde_loo_psis)
  }else{
    result <- list(fit=fit, loo_=loo_, I=I, elpd_tilde_loo_psis=elpd_tilde_loo_psis, SE_elpd_tilde_loo_psis=SE_elpd_tilde_loo_psis)
  }
  
  ## saving results
  if (!is.na(savetofile)){
    savetopath_fit <- paste0("./", savetofile, "_fit.rds")
    saveRDS(fit, file=savetopath_fit)
    savetopath_loo <- paste0("./", savetofile, "_loo.rds")
    saveRDS(list(loo_=loo_, I=I, elpd_tilde_loo_psis=elpd_tilde_loo_psis, SE_elpd_tilde_loo_psis=SE_elpd_tilde_loo_psis), file=savetopath_loo)
    if (doforward==1){
      savesimtopath <- paste0("./", savetofile, "_sim.rds")
      saveRDS(input_stan$sim, file=savesimtopath)
    }
  }
  
return(result)
}
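
A small numerical note on the exact-LOO replacements above: log(mean(exp(x))) can underflow for very negative log-likelihoods. A stable log-mean-exp helper, for example

# stable log of a mean of densities given on the log scale
log_mean_exp <- function(x) { m <- max(x); m + log(mean(exp(x - m))) }

could be used instead, e.g. loo_$pointwise[i,1] <- log_mean_exp(as.vector(log_lik_lio_i)) and elpd_tilde_loo_psis_pointwise[i] <- log_mean_exp(as.vector(log_p_tilde_lio$log_lik_2)).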

I just saw that the estimate of the 'elpd when forgetting the group association' from the above code is very sensitive to the sample size (\texttt{T} in the code) used for estimating the 'log density when forgetting the group association' (\texttt{log_lik_2} in the code). I thought that I wouldn't need a big sample here, because I thought the 'integration' over the posterior would average this out. But I tried T=1, T=3 and T=20 with very different results (elpd between -3685 and -4374). And then I needed to stop, because this already used up to about 20G of RAM per chain! So I will change to taking the same random group-specific parameters for each data point and see where I can further optimise the code in terms of memory. Or there is another bug…

I fixed the memory use issue mentioned above (same random sample of the group-specific parameters for each data point, local variables not saved to the Stan output). The new code is below. For my application the elpd while forgetting the group association was more or less stable for T=500 in the quick tests I ran; the elpd was in the range of -4480, +/- 5.


data{
  int<lower=0> n; // number of 'house-days' (experimental days * houses)  // This needs to be adjusted if doexactloopdi=1
  int<lower=0> m[3]; // [number of houses, number of treatments, number of weeks]
  int<lower=0> yHLC[n]; // data HLC
  int<lower=0> yCDC[n]; // data CDC
  int xHOUSE[n]; // feature HOUSE
  int xTREAT[n]; // feature TREATment
  int xWEEK[n]; // feature WEEK
  // to control priors
    int<lower=0> priorscale; // to control scale of prior
  // to control forward simulation
    int<lower=0, upper=1> doforward; // =1 to switch forward simulation on, =0 to switch off
    int zn[doforward ? 1 : 0]; // number of simulated data points
    int zm[doforward ? 3 : 0]; // [number of simulated houses, number of simulated treatments, number of simulated weeks]
    int zxHOUSE[doforward ? zn[1] : 0]; // feature HOUSE
    int zxTREAT[doforward ? zn[1] : 0]; // feature TREATment
    int zxWEEK[doforward ? zn[1] : 0]; //feature week number
    int indexC[doforward ? zm[1]*zm[3] : 0]; // indices of control in output zy
    int indexT[doforward ? zm[1]*zm[3] : 0]; // indices of pull in output zy
    int indexR[doforward ? zm[1]*zm[3] : 0]; // indices of push in output zy
    int indexP[doforward ? zm[1]*zm[3] : 0]; // indices of push-pull in output zy
    real<lower=0> c[doforward ? 1 : 0]; // added to denominators in the generated quantities to avoid division by 0 (no correction if c=0)
  // to control computation of log likelihood for loo 
    int<lower=1> T; // number of sample points for unknown group parameter
    int<lower=0, upper=1> doexactloopdi;
    int<lower=0> yHLCi[doexactloopdi ? 1 : 0]; // HLC of left-out data point i
    int<lower=0> yCDCi[doexactloopdi ? 1 : 0]; // CDC of left-out data point i
    int xHOUSEi[doexactloopdi ? 1 : 0]; // feature HOUSE
    int xTREATi[doexactloopdi ? 1 : 0]; // feature TREATment
    int xWEEKi[doexactloopdi ? 1 : 0]; // feature WEEK
}
parameters{
  real logaIN; // mean of lograte indoors
  real logaOUT; // mean of lograte outdoors
  real<lower=0> sigma_alphaIN; // standard deviation of lograte indoors due to house
  real<lower=0> sigma_alphaOUT; // standard deviation of lograte outdoors due to house
  vector[m[1]] thetaIN; // standardised deviation of indoor lograte from mean of lograte indoors due to house
  vector[m[1]] thetaOUT; // standardised deviation of outdoor lograte from mean of lograte outdoors due to house
  real<lower=0> sigma_tauIN; // standard deviation of lograte indoors due to week
  real<lower=0> sigma_tauOUT; // standard deviation of lograte outdoors due to week
  vector[m[3]] tauIN; // standardised deviation of indoor lograte from mean of lograte indoors due to week
  vector[m[3]] tauOUT; // standardised deviation of outdoor lograte from mean of lograte outdoors due to week
  vector[m[2]-1] logbetaIN; // treatment effect indoor 
  vector[m[2]-1] logbetaOUT; // treatment effect outdoor
  vector[m[2]] logpsiIN; // one scale parameter per treatment group and per indoor, so far no hierarchy
  vector[m[2]] logpsiOUT; // one scale parameter per treatment group and per outdoor, so far no hierarchy
}
transformed parameters{
  // parameters for negative binomial for data
    vector[n] loglambdaIN; // lograte indoor
    vector[n] loglambdaOUT; // lograte outdoor
    vector[n] phiIN; // scale parameter indoor 
    vector[n] phiOUT; // scale parameter outdoor
    // defining log means
    vector[4] logbetaINall = append_row(0, logbetaIN);
    vector[4] logbetaOUTall = append_row(0, logbetaOUT);
    loglambdaIN = logaIN + thetaIN[xHOUSE] * sigma_alphaIN + logbetaINall[xTREAT] + tauIN[xWEEK] * sigma_tauIN;
    loglambdaOUT = logaOUT + thetaOUT[xHOUSE] * sigma_alphaOUT + logbetaOUTall[xTREAT] + tauOUT[xWEEK] * sigma_tauOUT;
    // defining scale parameters
    phiIN = exp(logpsiIN[xTREAT]); 
    phiOUT = exp(logpsiOUT[xTREAT]); 
}
model{
//priors and hyperpriors
  target += normal_lpdf(logaIN | 0, priorscale);
  target += normal_lpdf(logaOUT | 0, priorscale);
  target += cauchy_lpdf(sigma_alphaIN | 0, priorscale); // half-Cauchy via <lower=0>; multiplying the lpdf by 2 would square the density, the constant factor 2 can be dropped
  target += cauchy_lpdf(sigma_alphaOUT | 0, priorscale);
  target += cauchy_lpdf(sigma_tauIN | 0, priorscale);
  target += cauchy_lpdf(sigma_tauOUT | 0, priorscale);
  target += normal_lpdf(logbetaIN| 0, priorscale);
  target += normal_lpdf(logbetaOUT| 0, priorscale);
  target += normal_lpdf(logpsiIN| 0, priorscale);
  target += normal_lpdf(logpsiOUT| 0, priorscale);
// hierarchical parameters  
  target += normal_lpdf(thetaIN| 0, 1);
  target += normal_lpdf(thetaOUT| 0, 1); 
  target += normal_lpdf(tauIN| 0, 1);
  target += normal_lpdf(tauOUT| 0, 1);
// model
  target += neg_binomial_2_log_lpmf(yCDC | loglambdaIN, phiIN); 
  target += neg_binomial_2_log_lpmf(yHLC | loglambdaOUT, phiOUT); 
}
generated quantities{
  // declarations for log likelihood for loo 
    vector[doexactloopdi ? 1 : n] log_lik;
    vector[doexactloopdi ? 1 : n] log_lik_2;
  // declarations for forward simulation
    int<lower=0> zIN[doforward ? zn[1] : 0]; //simulate CDC data for known houses and unknown house and all treatment groups
    int<lower=0> zOUT[doforward ? zn[1] : 0]; //simulate HLC data for known houses and unknown house and all treatment groups
    vector[doforward ? zn[1] : 0] zmeanIN; //mean of simulated CDC data for known houses and unknown house and all treatment groups
    vector[doforward ? zn[1] : 0] zmeanOUT; //mean of simulated HLC data for known houses and unknown house and all treatment groups
    //vector[doforward ? zm[1] : 0] rrIN_OUT; // rates rario of indoor vs outdoor in control setting
    //vector[doforward ? zm[1] : 0] SpIN; // simulated probability of mosquito going indoor in control setting
    //vector[doforward ? zm[1] : 0] EpIN; // expected probability of mosquito going indoor in control setting
    vector[doforward ? zn[1] : 0] PEf_IN; // protective efficacy against indoor biting in the field
    vector[doforward ? zn[1] : 0] PEf_OUT; // protective efficacy against outdoor biting in the field
  // expressions for log likelihood per datapoint to be used by loo function
    if (doexactloopdi){
      // local variables declaration
      vector[doexactloopdi ? 1 : 0] loglambdaINi; // lograte indoor
      vector[doexactloopdi ? 1 : 0] loglambdaOUTi; // lograte outdoor
      vector[doexactloopdi ? T : 0] loglambdaINi_rn; // lograte indoor
      vector[doexactloopdi ? T : 0] loglambdaOUTi_rn; // lograte outdoor
      vector[doexactloopdi ? 1 : 0] phiINi; // scale parameter indoor 
      vector[doexactloopdi ? 1 : 0] phiOUTi; // scale parameter outdoor  
      vector[doexactloopdi ? T : 0] thetaINi_rn; // random numbers
      vector[doexactloopdi ? T : 0] thetaOUTi_rn; // random numbers  
      vector[doexactloopdi ? T : 0] tauINi_rn; // random numbers  
      vector[doexactloopdi ? T : 0] tauOUTi_rn; // random numbers  
      //log_lik
        loglambdaINi = logaIN + thetaIN[xHOUSEi] * sigma_alphaIN + logbetaINall[xTREATi] + tauIN[xWEEKi] * sigma_tauIN;
        loglambdaOUTi = logaOUT + thetaOUT[xHOUSEi] * sigma_alphaOUT + logbetaOUTall[xTREATi] + tauOUT[xWEEKi] * sigma_tauOUT;
        phiINi = exp(logpsiIN[xTREATi]); 
        phiOUTi = exp(logpsiOUT[xTREATi]);
        log_lik[1] = neg_binomial_2_log_lpmf(yCDCi | loglambdaINi, phiINi) + neg_binomial_2_log_lpmf(yHLCi | loglambdaOUTi, phiOUTi);
      //log_lik_2
        thetaINi_rn = to_vector(normal_rng(rep_vector(0,T),rep_vector(1,T)));
        thetaOUTi_rn = to_vector(normal_rng(rep_vector(0,T),rep_vector(1,T)));
        tauINi_rn = to_vector(normal_rng(rep_vector(0,T),rep_vector(1,T)));
        tauOUTi_rn = to_vector(normal_rng(rep_vector(0,T),rep_vector(1,T)));
        loglambdaINi_rn = logaIN + thetaINi_rn * sigma_alphaIN + rep_vector(logbetaINall[xTREATi[1]],T) + tauINi_rn * sigma_tauIN;
        loglambdaOUTi_rn = logaOUT + thetaOUTi_rn * sigma_alphaOUT + rep_vector(logbetaOUTall[xTREATi[1]],T) + tauOUTi_rn * sigma_tauOUT;
        { // log of the Monte Carlo average of the density over the T random group effects:
          // log_sum_exp(lp) - log(T), i.e. log of a mean, not a mean of the log densities
          vector[T] lp;
          for (t in 1:T)
            lp[t] = neg_binomial_2_log_lpmf(yCDCi[1] | loglambdaINi_rn[t], phiINi[1]) + neg_binomial_2_log_lpmf(yHLCi[1] | loglambdaOUTi_rn[t], phiOUTi[1]);
          log_lik_2[1] = log_sum_exp(lp) - log(T);
        }
    }else{
      // local variables declaration 
      vector[doexactloopdi ? 0 : T] loglambdaIN_rn; // random group 
      vector[doexactloopdi ? 0 : T] loglambdaOUT_rn; // random group  
      vector[doexactloopdi ? 0 : T] thetaIN_rn; // random numbers
      vector[doexactloopdi ? 0 : T] thetaOUT_rn; // random numbers  
      vector[doexactloopdi ? 0 : T] tauIN_rn; // random numbers  
      vector[doexactloopdi ? 0 : T] tauOUT_rn; // random numbers 
      // expressions
      thetaIN_rn = to_vector(normal_rng(rep_vector(0,T),rep_vector(1,T)));
      thetaOUT_rn = to_vector(normal_rng(rep_vector(0,T),rep_vector(1,T)));
      tauIN_rn = to_vector(normal_rng(rep_vector(0,T),rep_vector(1,T)));
      tauOUT_rn = to_vector(normal_rng(rep_vector(0,T),rep_vector(1,T)));
      loglambdaIN_rn = logaIN + thetaIN_rn * sigma_alphaIN + tauIN_rn * sigma_tauIN;
      loglambdaOUT_rn = logaOUT + thetaOUT_rn * sigma_alphaOUT + tauOUT_rn * sigma_tauOUT;
      for (j in 1:n) {
      // log_lik for PSIS
      log_lik[j] = neg_binomial_2_log_lpmf(yCDC[j] | loglambdaIN[j], phiIN[j]) + neg_binomial_2_log_lpmf(yHLC[j] | loglambdaOUT[j], phiOUT[j]);
      // log_lik_2 for PSIS with forgetting group association of held-out data point:
      // log of the Monte Carlo average of the density over the T shared random group effects
      {
        vector[T] lp;
        for (t in 1:T)
          lp[t] = neg_binomial_2_log_lpmf(yCDC[j] | loglambdaIN_rn[t] + logbetaINall[xTREAT[j]], phiIN[j]) + neg_binomial_2_log_lpmf(yHLC[j] | loglambdaOUT_rn[t] + logbetaOUTall[xTREAT[j]], phiOUT[j]);
        log_lik_2[j] = log_sum_exp(lp) - log(T);
      }
      }
    }
  // expressions for forward simulation and protective efficacy
  if (doforward){
    vector[zm[1]] zthetaIN; // standardised deviation of indoor lograte from mean of lograte indoors, plus unknown 13th house
    vector[zm[1]] zthetaOUT; // standardised deviation of outdoor lograte from mean of lograte outdoors, plus unknown 13th house
    vector[zm[3]] ztauIN; // standardised deviation of indoor lograte from mean of lograte indoors due to week, plus unknown week
    vector[zm[3]] ztauOUT; // standardised deviation of outdoor lograte from mean of lograte outdoors due to week, plus unknown week
    vector[zn[1]] zloglambdaIN; // lograte indoor
    vector[zn[1]] zloglambdaOUT; // lograte outdoor
    vector[zn[1]] zphiIN; // scale parameter indoor
    vector[zn[1]] zphiOUT; // scale parameter outdoor
    int zINC[zn[1]];
    int zOUTC[zn[1]];
    zthetaIN[:m[1]] = thetaIN;
    zthetaOUT[:m[1]]= thetaOUT;
    zthetaIN[zm[1]] = normal_rng(0,1);
    zthetaOUT[zm[1]]= normal_rng(0,1);
    ztauIN[:m[3]] = tauIN;
    ztauOUT[:m[3]]= tauOUT;
    ztauIN[zm[3]] = normal_rng(0,1);
    ztauOUT[zm[3]]= normal_rng(0,1);
    // defining log means 
      zloglambdaIN = logaIN + zthetaIN[zxHOUSE] * sigma_alphaIN + logbetaINall[zxTREAT] + ztauIN[zxWEEK] * sigma_tauIN;
      zloglambdaOUT = logaOUT + zthetaOUT[zxHOUSE] * sigma_alphaOUT + logbetaOUTall[zxTREAT] + ztauOUT[zxWEEK] * sigma_tauOUT;
    zphiIN = exp(logpsiIN[zxTREAT]); 
    zphiOUT = exp(logpsiOUT[zxTREAT]); 
    zIN = neg_binomial_2_log_rng(zloglambdaIN, zphiIN); //simulate CDC data
    zOUT = neg_binomial_2_log_rng(zloglambdaOUT, zphiOUT); //simulate HLC data
    zmeanIN = exp(zloglambdaIN); //mean of simulated CDC data for known houses and unknown house and all treatment groups
    zmeanOUT = exp(zloglambdaOUT); //mean of simulated HLC data for known houses and unknown house and all treatment groups
    //rrIN_OUT = exp(logalphaIN) ./ exp(logalphaOUT);
    //SpIN = to_vector(neg_binomial_2_log_rng(logalphaIN + logalphaOUT, exp(logpsiIN[1]))) ./ (to_vector(neg_binomial_2_log_rng(logalphaIN + logalphaOUT, exp(logpsiIN[1]))) + rep_vector(0.1,12)); // the last term is correction to preclude division by zero. Rethink about this.
    { // build output vectors with controls only
      zINC[indexC] = zIN[indexC];
      zINC[indexT] = zIN[indexC];
      zINC[indexR] = zIN[indexC];
      zINC[indexP] = zIN[indexC];
      //
      zOUTC[indexC] = zOUT[indexC];
      zOUTC[indexT] = zOUT[indexC];
      zOUTC[indexR] = zOUT[indexC];
      zOUTC[indexP] = zOUT[indexC];
    }
    PEf_IN= (to_vector(zINC) - to_vector(zIN) ) ./ (to_vector(zINC)+c[1]);
    PEf_OUT= (to_vector(zOUTC) - to_vector(zOUT) ) ./ (to_vector(zOUTC)+c[1]);
  }
}
