Using generate_quantities to recover loo output

Hello,

I was wondering if the output of loo could be recovered later by appending a generated quantities block. For example, assume we have a baseline Stan model ("model_baseline.stan") that does NOT contain a GQ block. Another model ("model_GQ.stan") is identical to the baseline but has a GQ block that generates a vector log_lik of pointwise log-likelihoods.

loo output in 1 step:
library(cmdstanr)
library(loo)
model1 <- cmdstan_model("model_GQ.stan")  # contains GQ block with log_lik
fit1 <- model1$sample(data = data, seed = seed, chains = chains, save_warmup = TRUE, ...)  # output files contain log_lik
loo1 <- loo(fit1$draws("log_lik"))  # get loo output

Now, I wish to split this process into 2 steps:
model2 <- cmdstan_model("model_baseline.stan")  # no GQ block
fit2 <- model2$sample(data = data, seed = seed, chains = chains, save_warmup = TRUE, ...)  # output files do NOT contain log_lik
fit2loo <- model1$generate_quantities(fit2, data = data, seed = seed)  # generate log_lik values later
loo2 <- loo(fit2loo$draws("log_lik"))  # get loo output later

The problem is that loo2 does not seem to match loo1. Is there something I'm missing? I have verified that the log_lik values generated are the same in both cases, yet the loo output is very different.

I wish to split the process because the single-step approach seems to take too long. I'd rather compute the posterior first and analyse it while the log_lik array is being generated for the loo computation.

Edit: The problem could be that fit1 seems to be a CmdStanMCMC object while fit2loo seems to be a CmdStanGQ object.

But $draws() should return a draws object in both cases.
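For reference, a quick way to check this (a rough sketch, using the objects from my snippets above):

class(fit1)                      # CmdStanMCMC fit object
class(fit2loo)                   # CmdStanGQ fit object
class(fit1$draws("log_lik"))     # should be a draws_array in both cases
class(fit2loo$draws("log_lik"))
dim(fit1$draws("log_lik"))       # iterations x chains x variables
dim(fit2loo$draws("log_lik"))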

And do the draws objects fit1$draws("log_lik") and fit2loo$draws("log_lik") have the same values and structure?

Can you show the output, too?

Hello, thank you for the response. This is what the two $draws() objects look like.

Fit 1:

# A draws_array: 1000 iterations, 4 chains, and 720 variables
, , variable = log_lik[1]

         chain
iteration    1    2    3    4
        1 -4.3 -3.8 -4.2 -4.0
        2 -4.3 -4.1 -4.1 -4.3
        3 -4.0 -4.1 -4.0 -4.0
        4 -4.1 -4.4 -4.2 -4.0
        5 -4.3 -4.4 -4.1 -4.6

, , variable = log_lik[2]

         chain
iteration    1    2    3    4
        1 -5.2 -4.2 -4.8 -4.3
        2 -4.2 -4.4 -4.4 -4.3
        3 -4.6 -4.8 -4.5 -5.3
        4 -4.6 -4.9 -5.0 -3.9
        5 -4.8 -4.7 -4.3 -5.8

, , variable = log_lik[3]

         chain
iteration    1    2    3    4
        1 -4.5 -3.9 -4.1 -4.0
        2 -4.1 -4.1 -4.2 -4.2
        3 -4.1 -4.2 -4.0 -4.9
        4 -4.1 -4.3 -4.3 -3.9
        5 -4.3 -4.3 -4.1 -4.9

, , variable = log_lik[4]

         chain
iteration    1    2    3    4
        1 -4.2 -3.8 -4.3 -4.0
        2 -4.2 -4.2 -4.1 -4.2
        3 -4.0 -4.1 -4.0 -4.5
        4 -4.1 -4.4 -4.1 -4.0
        5 -4.3 -4.5 -4.1 -4.4

# ... with 995 more iterations, and 716 more variables

Fit 2:

# A draws_array: 2000 iterations, 4 chains, and 720 variables
, , variable = log_lik[1]

         chain
iteration    1    2  3    4
        1 -6.8 -7.8 -6 -7.0
        2 -6.8 -7.8 -6 -7.0
        3 -6.8 -7.8 -6 -7.0
        4 -6.0 -7.7 -6 -7.0
        5 -4.5 -5.4 -6 -6.3

, , variable = log_lik[2]

         chain
iteration    1    2  3    4
        1 -6.8 -7.8 -6 -7.0
        2 -6.8 -7.8 -6 -7.0
        3 -6.8 -7.8 -6 -7.0
        4 -6.0 -7.7 -6 -7.0
        5 -3.5 -5.4 -6 -6.3

, , variable = log_lik[3]

         chain
iteration    1    2    3    4
        1 -6.8 -7.8 -6.0 -7.0
        2 -6.8 -7.8 -6.0 -7.0
        3 -6.8 -7.8 -6.0 -7.0
        4 -6.0 -7.7 -6.0 -7.0
        5 -3.7 -5.4 -6.1 -6.3

, , variable = log_lik[4]

         chain
iteration    1    2    3    4
        1 -6.8 -7.8 -6.0 -7.0
        2 -6.8 -7.8 -6.0 -7.0
        3 -6.8 -7.8 -6.0 -7.0
        4 -6.0 -7.7 -6.0 -7.0
        5 -3.4 -5.4 -6.1 -6.3

# ... with 1995 more iterations, and 716 more variables

The difference in the number of iterations is just because the latter includes warmup. I have verified that all summary statistics (median, mean, quantiles, max, min) of the log_lik vectors show only minor differences despite the difference in iterations.
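For reference, this is roughly how such a comparison could be done (a sketch using posterior::summarise_draws; object names as in my earlier snippets):

library(posterior)
ll1 <- fit1$draws("log_lik")     # one-step fit
ll2 <- fit2loo$draws("log_lik")  # two-step fit (includes warmup here)
# per-element summaries of the 720 pointwise log-likelihoods
s1 <- summarise_draws(ll1, "mean", "median", "min", "max")
s2 <- summarise_draws(ll2, "mean", "median", "min", "max")
# largest absolute difference in the means across elements
max(abs(s1$mean - s2$mean))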

Here are the loo outputs:

Fit 1:

Computed from 4000 by 720 log-likelihood matrix

         Estimate   SE
elpd_loo  -2946.3 25.7
p_loo       166.5  9.4
looic      5892.6 51.3
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     609   84.6%   586       
 (0.5, 0.7]   (ok)        70    9.7%   196       
   (0.7, 1]   (bad)       38    5.3%   24        
   (1, Inf)   (very bad)   3    0.4%   7         
See help('pareto-k-diagnostic') for details.

Fit 2:

Computed from 8000 by 720 log-likelihood matrix

         Estimate   SE
elpd_loo  -2974.4 25.4
p_loo       192.8  9.8
looic      5948.7 50.7
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     105   14.6%   1614      
 (0.5, 0.7]   (ok)       184   25.6%   338       
   (0.7, 1]   (bad)      292   40.6%   23        
   (1, Inf)   (very bad) 139   19.3%   12        
See help('pareto-k-diagnostic') for details.

Edit: Is there a way to make sure Fit 2 only uses post-warmup draws? It doesn’t seem possible to subset these objects.

I just learnt that there is a way to subset the object: subset_draws() in the posterior package.

I can confirm that the following lines of code yield exactly the same loo output (I have 2000 iterations with 1000 warm-up, so I’m taking iterations 1001:2000 in the second part):

loo output in 1 step:

model1 <- cmdstan_model("model_GQ.stan")  # contains GQ block with log_lik
fit1 <- model1$sample(data = data, seed = seed, chains = chains, save_warmup = TRUE, ...)  # output files contain log_lik
loo1 <- loo(fit1$draws("log_lik"))  # get loo output

loo output in 2 steps:

model2 <- cmdstan_model("model_baseline.stan")  # no GQ block
fit2 <- model2$sample(data = data, seed = seed, chains = chains, save_warmup = TRUE, ...)  # output files do NOT contain log_lik
fit2loo <- model1$generate_quantities(fit2, data = data, seed = seed)  # generate log_lik values later
loo2 <- loo(subset(fit2loo$draws("log_lik"), iteration = 1001:2000))  # get loo output later (post-warmup draws only)
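As an aside, if the warmup draws aren't needed for anything else, here is a sketch of an alternative that avoids the subsetting step, and also supplies r_eff so that loo can report the Monte Carlo SE (I believe the NA above comes from not passing r_eff):

fit2 <- model2$sample(data = data, seed = seed, chains = chains)  # save_warmup defaults to FALSE
fit2loo <- model1$generate_quantities(fit2, data = data, seed = seed)  # presumably no warmup draws in the GQ output now
ll <- fit2loo$draws("log_lik")
r_eff <- loo::relative_eff(exp(ll))   # relative ESS of the pointwise likelihood values
loo2 <- loo::loo(ll, r_eff = r_eff)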

Great that you solved it!

How did you end up saving warmup draws, as your code snippet doesn't show that you explicitly asked for that?


Right! I had only shown the other arguments (data, seed, chains) because I didn't think saving warmup draws was the cause of the discrepancy. I have now edited both my posts above accordingly.

It is surprising that the loo outputs are drastically different if warmup iterations are included (despite the other summary statistics being nearly the same). Is this due to the same reason we discard warmup iterations in the first place, namely the effect of initialisation?

I was exploring a little further, and it seems that discarding even just the first 20 of the 2000 iterations (i.e. using 21:2000) is sufficient to get a stable loo output that is close to the single-step estimate. It is only when I include those first 20 iterations that the loo output gets drastically worse. I guess this depends on the model, but in the case of my model (a basic expected utility model) I assume it means that convergence is achieved fairly early, even in < 100 iterations.
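For what it's worth, a check like this could be run along these lines (a rough sketch with a hypothetical helper; fit2loo as above, 2000 saved iterations per chain):

library(posterior)
# hypothetical helper: recompute elpd_loo after discarding the first n_drop iterations
elpd_after_dropping <- function(n_drop) {
  ll <- subset_draws(fit2loo$draws("log_lik"), iteration = (n_drop + 1):2000)
  loo::loo(ll)$estimates["elpd_loo", "Estimate"]
}
sapply(c(0, 10, 20, 50, 100, 1000), elpd_after_dropping)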

Yes.

This can happen for easy posteriors, but warmup draws should still not be included, as there is a danger of non-negligible bias. There have been some proposals and experiments for adaptive warmup length, but it's difficult to make them as safe as the current fixed approach while at the same time achieving big savings in time. (Sorry, I ran out of time to elaborate on this further.)
