Suppose there are N patients, and patient i has N_i correlated observations. Because of loss to follow-up and other reasons, different patients may have different numbers of observations.

To calculate WAIC using the loo package, one needs the elementwise (pointwise) log-likelihood for each data point. However, it’s not clear to me what ‘each data point’ means in this situation.
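For reference, the pointwise log-likelihood that `loo` works with is an S × N matrix: one row per posterior draw and one column per unit that could be left out. The minimal base-R sketch below (the function name `waic_from_log_lik` is hypothetical; in practice `loo::waic()` does this for you) shows how WAIC is computed from such a matrix. It shows that ‘each data point’ simply means ‘each column’, whether a column is a single observation or a whole patient.

```r
# Minimal sketch: WAIC from an S x N pointwise log-likelihood matrix
# (rows = posterior draws, columns = units that could be left out).
waic_from_log_lik <- function(log_lik) {
  # numerically stable log(mean(exp(x)))
  log_mean_exp <- function(x) {
    m <- max(x)
    m + log(mean(exp(x - m)))
  }
  # log pointwise predictive density for each column
  lppd_i <- apply(log_lik, 2, log_mean_exp)
  # effective number of parameters: posterior variance of each column
  p_waic_i <- apply(log_lik, 2, var)
  elpd_i <- lppd_i - p_waic_i
  n <- ncol(log_lik)
  list(
    elpd_waic = sum(elpd_i),
    p_waic = sum(p_waic_i),
    waic = -2 * sum(elpd_i),
    se_elpd_waic = sqrt(n * var(elpd_i))
  )
}

# Example with simulated draws (4000 draws, 10 units)
set.seed(1)
log_lik <- matrix(rnorm(4000 * 10, mean = -1, sd = 0.1), nrow = 4000)
str(waic_from_log_lik(log_lik))
```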

Is it correct to calculate the log-likelihood per patient? If so, how should I deal with the missing values for patients who are lost to follow-up?

It’s better to think in terms of cross-validation.

CV-FAQ: How are LOO and WAIC related?

Leave-one-patient-out is fine.

CV-FAQ: Can cross-validation be used for hierarchical / multilevel models?
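A minimal base-R sketch of building a leave-one-patient-out log-likelihood matrix, assuming the Stan program already saves an observation-level `log_lik` for every observed measurement, and that observations are conditionally independent given the parameters (for example, when the within-patient correlation comes only from patient-level random effects). Under that assumption a patient’s log-likelihood is the sum over that patient’s observed measurements, so patients lost to follow-up simply contribute fewer terms and nothing has to be imputed. If the model instead has a correlated residual structure (e.g., a multivariate normal over each patient’s measurements), compute the joint per-patient density in `generated quantities` rather than summing. The names `log_lik_obs`, `patient` and `log_lik_by_patient` are hypothetical.

```r
# Hypothetical inputs:
#   log_lik_obs: S x n_obs matrix of log p(y_ij | theta^(s)), one column per
#                observed measurement (missed visits simply have no column)
#   patient:     length-n_obs vector giving the patient id of each column
# Assumes conditional independence given the parameters, so each patient's
# log-likelihood is the sum of that patient's columns.
log_lik_by_patient <- function(log_lik_obs, patient) {
  stopifnot(ncol(log_lik_obs) == length(patient))
  # rowsum() sums rows within groups; transpose so observations become rows,
  # then transpose back to get an S x n_patients matrix
  t(rowsum(t(log_lik_obs), group = patient))
}

# Example: 2 draws, patient "a" has 2 observations, patient "b" has 1
log_lik_obs <- matrix(c(-1, -2, -3,
                        -1, -2, -3), nrow = 2, byrow = TRUE)
patient <- c("a", "a", "b")
log_lik_by_patient(log_lik_obs, patient)
```

The resulting S × N_patients matrix can be passed to `loo::loo()` in place of the observation-level one. With patient-specific parameters, PSIS may report high Pareto k values more often in leave-one-patient-out than in leave-one-observation-out, in which case K-fold CV is the fallback.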

Patients having different numbers of observations doesn’t invalidate the use of the leave-one-patient-out approach, but patients with more observations do contribute more to the total estimate. This may be what you actually want, so you don’t necessarily need to do anything. If you would like to weight each patient equally, you could normalize each patient-specific contribution by that patient’s number of observations (or integrate over the distributions of the missing observations, but that is much more complicated and probably overkill).
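The normalization option can be sketched as follows, assuming you already have one elpd contribution per patient (for instance the `"elpd_loo"` column of the `pointwise` table returned by `loo::loo()` on a patient-level log-likelihood matrix). Dividing each contribution by that patient’s number of observations turns it into a per-observation average, so every patient counts equally in the sum. The function name `equal_weight_elpd` is hypothetical, and the standard error uses the same `sqrt(n * var(...))` form as loo’s summary.

```r
# Hypothetical inputs:
#   elpd_patient: per-patient elpd contributions
#   n_obs:        number of observed measurements for each patient
equal_weight_elpd <- function(elpd_patient, n_obs) {
  stopifnot(length(elpd_patient) == length(n_obs), all(n_obs > 0))
  # per-observation average for each patient, so patients are weighted equally
  per_obs <- elpd_patient / n_obs
  n <- length(per_obs)
  c(estimate = sum(per_obs), se = sqrt(n * var(per_obs)))
}

# Example: three patients with 5, 2 and 3 observations
equal_weight_elpd(elpd_patient = c(-10, -4, -6), n_obs = c(5, 2, 3))
```

Note that the normalized estimate is on a different scale from the usual elpd, so it is only meaningful for comparing models that use the same weighting.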

Thank you for your helpful reply. I think it’s difficult to weight each center equally in my case; see the picture I posted below (a simplified version of my model).

My model is fitted with `cmdstanr` (version 0.5.3), and I tried the `loo` function with the following code:

```r
# Convert the cmdstanr fit into an rstan stanfit object via its CSV output files
fit1 <- rstan::read_stan_csv(fit$output_files())
loo(fit1)
```

It warns that some Pareto k diagnostic values are too high. However, moment matching does not seem to work with `fit1`:

```
> loo(fit1, moment_match = TRUE)
Error in .local(object, ...) :
  the model object is not created or not valid
```

The function `kfold` doesn’t work either:

```
> kfold(fit1, K = 10)
Error in UseMethod("kfold") :
  no applicable method for 'kfold' applied to an object of class "stanfit"
```

So I tried to refit the model with `rstan` (version 2.26.13), but I got an error saying that the model does not contain samples. Moreover, integrating over the distributions of the missing observations is too complicated for my model. Do you have any other suggestions for using `loo` or `kfold`?

Yep, moment matching currently requires `rstan` and doesn’t work with `cmdstanr`.

That `kfold(fit1, K = 10)` usage is for `rstanarm`, which knows enough about the model and data that it can automate K-fold CV. For `rstan` and `cmdstanr` you need to provide some of that information yourself and follow the loo vignette “Holdout validation and K-fold cross-validation of Stan programs with the loo package”.
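Following the approach in that vignette, the first step for leave-patients-out K-fold CV is to assign whole patients (not individual observations) to folds, so each of the K fits holds out all of a patient’s observations at once. `loo::kfold_split_grouped()` does this; the base-R sketch below (hypothetical name `grouped_folds`) only illustrates the idea. The resulting fold vector is then used to pass a training/holdout split to the Stan program for each of the K fits.

```r
# Assign whole patients to K roughly balanced folds, then map each
# observation to its patient's fold (illustrative base-R version).
grouped_folds <- function(K, patient) {
  ids <- unique(patient)
  stopifnot(K >= 2, K <= length(ids))
  # balanced fold labels over patients, randomly permuted
  fold_of_id <- sample(rep_len(seq_len(K), length(ids)))
  names(fold_of_id) <- as.character(ids)
  # look up each observation's fold via its patient id
  unname(fold_of_id[as.character(patient)])
}

# Example: 20 patients with between 1 and 5 observations each
set.seed(2)
patient <- rep(1:20, times = sample(1:5, 20, replace = TRUE))
folds <- grouped_folds(K = 10, patient)
table(folds)
```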

K-fold cross-validation is the safe choice in this case; see the vignette I mentioned above.
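After the K fits, each held-out unit has S draws of its log predictive density from the one fit in which it was left out, and the K-fold estimate is elpd_kfold = Σ_i log((1/S) Σ_s exp(log_lik[s, i])). `loo::elpd()` provides this final step; the sketch below shows the same computation in base R, under the assumptions that folds are labelled 1..K in the same order as a hypothetical list `heldout_draws`, and that `heldout_draws[[k]]` has its columns ordered as `which(folds == k)`. The function names are hypothetical.

```r
# Place held-out draws from each fold into one S x N matrix.
assemble_heldout <- function(heldout_draws, folds) {
  S <- nrow(heldout_draws[[1]])
  out <- matrix(NA_real_, nrow = S, ncol = length(folds))
  for (k in seq_along(heldout_draws)) {
    out[, folds == k] <- heldout_draws[[k]]
  }
  stopifnot(!anyNA(out))  # every unit must have been held out exactly once
  out
}

# elpd_kfold = sum_i log(mean_s exp(log_lik[s, i])), computed stably
elpd_kfold <- function(log_lik_heldout) {
  log_mean_exp <- function(x) {
    m <- max(x)
    m + log(mean(exp(x - m)))
  }
  elpd_i <- apply(log_lik_heldout, 2, log_mean_exp)
  n <- length(elpd_i)
  c(estimate = sum(elpd_i), se = sqrt(n * var(elpd_i)))
}

# Toy example: 4 units in 2 folds, 3 draws per fit
folds <- c(1, 2, 1, 2)
heldout_draws <- list(matrix(log(0.5), 3, 2), matrix(log(0.25), 3, 2))
elpd_kfold(assemble_heldout(heldout_draws, folds))
```

If whole patients are held out, the columns should be patient-level log-likelihoods (summed over each patient’s observed measurements) so that each term is one patient’s joint predictive density.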