Model comparison (two models run on two different N sizes)

I fitted two models on the same dataset but with different N sizes. The way I handle missing observations explains the different N. Now I want to see which model is better. I consulted ChatGPT and got this response.

“Pareto Smoothed Importance Sampling (PSIS): PSIS is a method used to estimate the out-of-sample predictive performance of a model. It addresses the issue of comparing models with different sample sizes by adjusting for the different levels of uncertainty introduced by the smaller dataset. You can calculate PSIS-LOO (leave-one-out) or PSIS-WAIC (Widely Applicable Information Criterion) to compare the models”.

But when I checked the loo package, it seems what ChatGPT is saying is not true. I'm not so sure, though. Can you please confirm whether what ChatGPT is saying is correct? Please also give any suggestions, besides multiple imputation, for how I can compare these models.

Thank you.

Your question doesn’t have enough information for me to understand how you might go about comparing these models. However, suppose we can define the prediction task you want to evaluate with leave-one-out CV as restricted to the shared (nonmissing) data common to the two approaches, and suppose both approaches admit a factorization such that the pointwise likelihood is well defined. Then you can use the log-likelihood matrix over the shared observations to evaluate and compare the two models’ leave-one-out predictive performance on the shared data via PSIS-LOO.
ChatGPT is wrong if it’s talking about the sample sizes of the datasets used to fit the models, but it is basically correct if it’s talking about the MCMC sample sizes drawn from the posterior distribution.
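To make the matrix route concrete, here is a minimal, hypothetical sketch using the loo package. The model objects (fit_a, fit_b), the index vectors that pick out the shared observations, and the assumption of 4 chains of equal length are all placeholders for whatever your actual setup looks like.

library(loo)

# Pointwise log-likelihood matrices (posterior draws x observations),
# restricted to the same shared observations in the same order.
ll_a <- log_lik(fit_a)[, shared_idx_a]
ll_b <- log_lik(fit_b)[, shared_idx_b]

# Relative MCMC efficiencies (assuming 4 chains of equal length here).
r_eff_a <- relative_eff(exp(ll_a), chain_id = rep(1:4, each = nrow(ll_a) / 4))
r_eff_b <- relative_eff(exp(ll_b), chain_id = rep(1:4, each = nrow(ll_b) / 4))

# PSIS-LOO over the shared observations only, then compare.
loo_a <- loo(ll_a, r_eff = r_eff_a)
loo_b <- loo(ll_b, r_eff = r_eff_b)
loo_compare(loo_a, loo_b)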

Thank you for your help. I’m new to Bayesian methods and this is my first project as a master’s student. I don’t have a solid background in mathematics and statistics either, but I’m willing to build one now.

Here are my two models. fit_1 is my main model and has all the predictors. However, it has fewer groups (CLASS) because information on DNA_Dynamut_deltaG, BLOSUM, LIG_Dynamut_deltaG, DNA_PPI_RSA, and DNA_mCSM_Stability_deltaG is missing for the other groups. Therefore, it has a smaller sample size.

My second model (fit_2) has all the groups (CLASS). However, it has only a few predictors (CONSURF, dimerization_affected, DNA_binding_affected). We have complete information for all groups, but only for these three predictors. Therefore, it has a larger sample size than the previous model.

My problem now is to decide whether the model with only a few predictors but more groups (fit_2) is as good as the model with all the predictors but fewer groups (fit_1). I want to judge performance in terms of goodness of fit and out-of-sample predictive performance. For goodness of fit, I used the average Bayesian posterior predictive p-value, and there was not much difference between the two models. For out-of-sample predictive performance, I wanted to use LOOIC with loo_compare, but it did not work because of the different N.

If you have suggestions, please guide me.

fit_1 <- brm(
  pDST_Resistance | trials(Number_mutation) ~ CONSURF + BLOSUM +
    DNA_binding_affected + DNA_Dynamut_deltaG + LIG_Dynamut_deltaG +
    DNA_PPI_RSA + dimerization_affected + DNA_mCSM_Stability_deltaG +
    (1 | CLASS),
  data = model2,
  family = binomial(link = "logit"))

fit_2 <- brm(
  pDST_Resistance | trials(Number_mutation) ~ CONSURF +
    DNA_binding_affected + dimerization_affected + (1 | CLASS),
  data = model2,
  family = binomial(link = "logit"))

If desired, you can compare the leave-one-out predictive performance over just the shared portion of the data. There is no good way to evaluate out-of-sample predictive performance for the observations whose covariate values in model2 you do not know. To compare over the shared data, you will use the log-likelihood matrices for just the shared data points. Using brms::loo.brmsfit, you can achieve this by passing the shared data to the newdata argument, as in the sketch below.
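For example, here is a minimal sketch along those lines; the complete.cases() filter and the name shared are my assumptions about how you would identify the rows of model2 with no missing predictor values.

# Rows of model2 with complete information on every predictor used by fit_1
# (an assumption about how the shared data are identified).
shared <- model2[complete.cases(model2[, c("CONSURF", "BLOSUM",
                                           "DNA_binding_affected",
                                           "DNA_Dynamut_deltaG",
                                           "LIG_Dynamut_deltaG",
                                           "DNA_PPI_RSA",
                                           "dimerization_affected",
                                           "DNA_mCSM_Stability_deltaG")]), ]

# PSIS-LOO for both models, evaluated on the same shared observations.
loo_1 <- loo(fit_1, newdata = shared)
loo_2 <- loo(fit_2, newdata = shared)

# Compare expected log predictive density over the shared data.
loo_compare(loo_1, loo_2)

Because both loo objects are then computed for exactly the same observations, loo_compare no longer complains about different N, and the comparison is of expected log pointwise predictive density over the shared data.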


Makes sense. Thank you so much. I really appreciate it.