# Loo: calculating point-wise log-lik when using data augmentation for censored observations

Hi Stanimals,

I have a bit of a conceptual question that I’m mulling over before diving into the `loo` package. Briefly, I’m trying to figure out how to specify the point-wise log-likelihoods in the generated quantities block (to be analysed post hoc with the `loo` package in R) when my model includes data augmentation for censored observations. More details below.

I’m working with a dataset where a subset of observations are left-censored. More specifically, I (assume that I) know the observation is under some threshold concentration (the detection limit of a lab assay), but I don’t know where between zero and that threshold it lies. My model does, however, predict a specific concentration. For computational efficiency, I use data augmentation to model the left-censored observations: for each censored observation, I define a parameter `y_log_true` in the parameters block that is bounded to be `<= log(threshold)`. Then, in the model block, I include a statement `y_log_true ~ normal(y_log_hat, sigma);`, where `y_log_hat` is the model-predicted concentration for that data point and `sigma` is the measurement error of the lab assay. Works like a charm, and I find it is more computationally efficient than integrating out the observation with a CDF.
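In (simplified, hypothetical) Stan code, the data-augmentation setup looks roughly like this. All names are placeholders, and `y_log_hat` is shown as data for brevity, whereas in my actual model it is a model-predicted quantity:

```stan
data {
  int<lower=0> N_cens;        // number of left-censored observations
  real log_threshold;         // log of the assay's detection limit
  vector[N_cens] y_log_hat;   // predicted log concentrations (data here
                              // for brevity; derived in the real model)
}
parameters {
  real<lower=0> sigma;        // measurement error of the lab assay
  // one augmented latent observation per censored data point,
  // constrained to lie below the detection limit
  vector<upper=log_threshold>[N_cens] y_log_true;
}
model {
  // censored observations enter via their latent true values
  y_log_true ~ normal(y_log_hat, sigma);
}
```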

Now for calculating the point-wise log-likelihoods (to be used with `loo` later on), for censored data points, should I calculate the log-likelihood as either:

1. the log-probability of the observation being censored, given expectation `y_log_hat` and the measurement error `sigma` (using the CDF as one would do in the model block when integrating out censored observations), or
2. the log-probability density equivalent to the sampling statement in the model block `y_log_true ~ normal(y_log_hat, sigma);` (including normalising constants, of course)?
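To make the two options concrete, here is a sketch of what each would look like in the generated quantities block (variable names are placeholders; `y_log_hat` stands for the model-predicted log concentration for each censored data point):

```stan
generated quantities {
  vector[N_cens] log_lik;
  for (n in 1:N_cens) {
    // Option 1: log-probability of the censoring event, i.e. the
    // observation falling below the detection limit (normal CDF)
    log_lik[n] = normal_lcdf(log_threshold | y_log_hat[n], sigma);

    // Option 2: log density of the augmented latent value itself;
    // the _lpdf form includes the normalising constants
    // log_lik[n] = normal_lpdf(y_log_true[n] | y_log_hat[n], sigma);
  }
}
```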

Cheers!


The model particulars are described in another post here, although I don’t think they are particularly relevant to the question at hand.

The only thing I can think of is that for some censored data points, the log-lik will be zero or very close to zero (probability 1.0) for most if not all samples. Not sure whether that will be a problem for the `loo` procedure, but I guess I’ll find out.


BUMP! Suggestions, anyone? @avehtari?

How do you measure the efficiency? I believe this can be the case, but I’m also interested in seeing examples where it happens.

The first one is likely to provide more stable PSIS-LOO computation.


Thanks @avehtari.

Also thanks for inviting me to look into the efficiency matter more thoroughly. I ran my model with and without data augmentation on the same data set (4 chains, 500 warm-up, 500 sampling iterations). As it turns out, the computational efficiency in terms of n_eff per second was very similar between the two approaches. To calculate the efficiency, I took the average n_eff across all 120 parameters in my model and divided it either by the sum of the chain times or by the net time Stan needed to complete the job:

| | Data augmentation of censored observations | Integrating out censored observations with normal CDF |
|---|---|---|
| Average n_eff | 1588 | 1761 |
| Chain 1 time (s) | 123.00 | 142.00 |
| Chain 2 time (s) | 130.00 | 152.00 |
| Chain 3 time (s) | 136.00 | 154.00 |
| Chain 4 time (s) | 177.00 | 168.00 |
| Net time, parallel chains + processing (s) | 188.00 | 178.00 |
| Efficiency based on sum of chain times (n_eff/s) | 2.81 | 2.86 |
| Efficiency based on net Stan time, parallel chains (n_eff/s) | 8.45 | 9.90 |

Interesting note: the approach based on integrating out censored observations was more prone to negative infinity log-prob at initialisation. I think this may have been part of my original motivation to use data augmentation (can’t remember, honestly).
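For reference, the integrating-out variant I’m comparing against replaces the augmented parameters with a CDF term in the model block, roughly like this (a simplified sketch with placeholder names):

```stan
model {
  // uncensored observations: ordinary normal likelihood
  y_log_obs ~ normal(y_log_hat_obs, sigma);
  // censored observations: integrate the latent value out analytically
  // by adding the log probability mass below the detection limit
  for (n in 1:N_cens) {
    target += normal_lcdf(log_threshold | y_log_hat_cens[n], sigma);
  }
}
```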

[EDIT: just made the table somewhat prettier]


Thanks for the report. Sometimes the efficiency of Stan’s dynamic HMC is surprising (this is not the only case). Can you also check the number of log density evaluations in each case? That is, is it possible that `normal_cdf` has so much higher a computational cost that the reduced number of log density evaluations doesn’t help?

Happy to! How do I extract this from the stanfit object? :D

Use `get_num_leapfrog_per_iteration(fit)`

I re-ran both analyses (data augmentation and integrating out with the CDF), but now with 8 chains each, as I was under the impression that there was quite a lot of variation in computation time between chains. It turns out that my previous run of four chains each was a bit of a lucky hit. Integrating out seems to be more efficient after all.

| | Data augmentation of censored observations | Integrating out censored observations with normal CDF |
|---|---|---|
| Average n_eff | 3138 | 3414 |
| Chain 1 time (s) | 182.00 | 153.00 |
| Chain 2 time (s) | 187.00 | 164.00 |
| Chain 3 time (s) | 191.00 | 167.00 |
| Chain 4 time (s) | 200.00 | 173.00 |
| Chain 5 time (s) | 209.00 | 175.00 |
| Chain 6 time (s) | 218.00 | 178.00 |
| Chain 7 time (s) | 239.00 | 188.00 |
| Chain 8 time (s) | 267.00 | 205.00 |
| Net time, parallel chains + processing (s) | 278.00 | 217.00 |
| Efficiency based on sum of chain times (n_eff/s) | 1.85 | 2.43 |
| Efficiency based on net Stan time, parallel chains (n_eff/s) | 11.29 | 15.73 |
| Iterations with 63 leapfrog steps | 1675 | 2945 |
| Iterations with 127 leapfrog steps | 2314 | 1053 |
| Iterations with 191 leapfrog steps | 8 | 2 |
| Iterations with 255 leapfrog steps | 2 | 0 |

[EDIT: cleaned up table]

The difference is even more pronounced (in favour of integrating out) if the chains are initialised in similar areas of the posterior:

| | Data augmentation of censored observations | Integrating out censored observations with normal CDF |
|---|---|---|
| Average n_eff | 3070 | 3485 |
| Chain 1 time (s) | 188.00 | 141.00 |
| Chain 2 time (s) | 195.00 | 149.00 |
| Chain 3 time (s) | 197.00 | 163.00 |
| Chain 4 time (s) | 205.00 | 164.00 |
| Chain 5 time (s) | 209.00 | 170.00 |
| Chain 6 time (s) | 213.00 | 171.00 |
| Chain 7 time (s) | 215.00 | 172.00 |
| Chain 8 time (s) | 234.00 | 176.00 |
| Net time, parallel chains + processing (s) | 244.00 | 188.00 |
| Efficiency based on sum of chain times (n_eff/s) | 1.85 | 2.67 |
| Efficiency based on net Stan time, parallel chains (n_eff/s) | 12.58 | 18.54 |
| Iterations with 63 leapfrog steps | 2875 | 3207 |
| Iterations with 127 leapfrog steps | 1118 | 791 |
| Iterations with 191 leapfrog steps | 5 | 2 |
| Iterations with 255 leapfrog steps | 2 | 0 |

Ah, so the difference you had in the beginning might have been a difference in warmup time? Anyway, the conclusion now is that integrating out is better, as we would expect a priori. Integrating out makes the posterior easier to sample, which leads to slightly fewer leapfrog steps per iteration. It is also possible that reduced memory usage improves the running time.
