Compare two models with real and simulated data for statistical inference

Hi

I ran an experiment that includes several conditions. I then built a model and estimated the posterior distribution for each condition.

Now I have several hypotheses that I would like to test. For each hypothesis I simulated a dataset (matching the number of subjects, conditions, and trials in my experiment), assigning a different mean value to each condition according to that hypothesis. Then, for each simulated dataset (i.e., each hypothesis), I fit a Bayesian model (the same one I used for the real data).

Now I would like to know which model (i.e., which hypothesis) better explains the data. Do you think a “pairwise” model comparison (model with the real dataset vs. model with the hypothesis dataset) would work for testing my hypotheses? If so, can I use the loo_compare() function to see which “hypothesis model” is closest to my data, and draw inferences from that?

Best

Hi Ivan, sorry for the late reply… :)

If I understand you correctly, you create models 1, …, n and then want to compare them with LOO to see which one has the best relative out-of-sample predictive performance? As long as they use the same data and come from the same family, e.g., the exponential family, you should be able to compare them using LOO.
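A minimal sketch of what such a LOO comparison computes, written in Python only so it is self-contained (the thread itself is about R's loo_compare()). It assumes a toy model, y_i ~ Normal(mu, sigma^2) with sigma known and a Normal prior on mu, so exact leave-one-out can be done in closed form instead of the Pareto-smoothed importance sampling approximation the loo package uses. The function names (loo_pointwise, compare) and the data are hypothetical. The point it illustrates: the comparison is built from pointwise values for the same observations, so both models must be scored on the same data.

```python
import math
import statistics

def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def loo_pointwise(y, sigma, prior_mean=0.0, prior_sd=10.0):
    """Exact leave-one-out log predictive density for each y_i under the toy model
    y_i ~ Normal(mu, sigma^2), mu ~ Normal(prior_mean, prior_sd^2), sigma known."""
    n, total = len(y), sum(y)
    prec = 1 / prior_sd**2 + (n - 1) / sigma**2  # posterior precision of mu without one point
    out = []
    for yi in y:
        # Conjugate posterior mean of mu after dropping observation i.
        post_mean = (prior_mean / prior_sd**2 + (total - yi) / sigma**2) / prec
        # Posterior predictive for the held-out point: Normal(post_mean, sigma^2 + 1/prec).
        out.append(normal_logpdf(yi, post_mean, sigma**2 + 1 / prec))
    return out

def compare(lpd_a, lpd_b):
    """elpd difference (b - a) and its standard error, sqrt(n) * sd of the pointwise
    differences. The values must refer to the SAME observations in the SAME order."""
    if len(lpd_a) != len(lpd_b):
        raise ValueError("pointwise values must come from the same data")
    diffs = [b - a for a, b in zip(lpd_a, lpd_b)]
    return sum(diffs), math.sqrt(len(diffs)) * statistics.stdev(diffs)

y = [1.8, 2.4, 2.1, 3.0, 1.5, 2.7, 2.2, 1.9]   # hypothetical observations
lpd_wide = loo_pointwise(y, sigma=3.0)    # model 1: assumes a large noise sd
lpd_tight = loo_pointwise(y, sigma=0.5)   # model 2: assumes a small noise sd
print("elpd_loo model 1:", round(sum(lpd_wide), 2))
print("elpd_loo model 2:", round(sum(lpd_tight), 2))
print("elpd_diff, se_diff:", compare(lpd_wide, lpd_tight))
```

Here both models predict the same eight observations, so the pointwise differences are meaningful; the model whose assumed noise matches the data's spread gets the higher elpd.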

Please tell me if I’ve misunderstood something :)


Hello!

Thanks for the reply. I have three datasets. Dataset_original, my originally sampled data, contains three conditions (A, B, C). Datasets 1 and 2 are simulated datasets in which I assigned different values to the three conditions based on my hypotheses. For instance, hypothesis 1 is that condition A is higher than B and C, so I simulated dataset 1 accordingly; hypothesis 2 is that condition B is higher than A and C, so I simulated dataset 2 accordingly.
I ran the same model (dep_var ~ predictor + (1 + predictor | subject)) on all three datasets (same dependent variable and predictor): the one containing the real data and the two containing data simulated from my hypotheses. I now have three models: model_original, fit to my original sampled data, and models 1 and 2, fit to the simulated data (datasets 1 and 2, respectively).

Now I would like to know whether I can run a pairwise comparison between model_original and each of models 1 and 2 using LOO, and draw inferences from the output. For instance, I compare model_original with model 1, and model_original with model 2. Suppose the difference in out-of-sample predictive performance is 100 for the first comparison (model_original vs. model 1) and 300 for the second (model_original vs. model 2). Can I infer that hypothesis 1 is more likely to be true than hypothesis 2?

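If I understand you correctly, your models use different datasets? Then it would be hard to compare them using LOO: LOO scores each model on how well it predicts the observations it was fit to, so the differences would reflect the datasets as much as the models, and I can't see what the results would mean :)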
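A toy illustration of why the numbers would be hard to interpret, again in Python for self-containment and again assuming a simple normal model with known sigma and exact closed-form LOO (not what brms/loo do internally). The function name and both datasets are hypothetical. The identical model is scored on two different datasets; the elpd values differ only because the data differ, so a gap between them says nothing about which model or hypothesis is better.

```python
import math

def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def elpd_loo(y, sigma=1.0, prior_mean=0.0, prior_sd=10.0):
    """Exact LOO elpd for the toy model y_i ~ Normal(mu, sigma^2), mu ~ Normal(prior_mean, prior_sd^2)."""
    n, total = len(y), sum(y)
    prec = 1 / prior_sd**2 + (n - 1) / sigma**2  # posterior precision of mu without one point
    elpd = 0.0
    for yi in y:
        post_mean = (prior_mean / prior_sd**2 + (total - yi) / sigma**2) / prec
        elpd += normal_logpdf(yi, post_mean, sigma**2 + 1 / prec)
    return elpd

# Two hypothetical datasets of the same size and mean: one tightly clustered, one spread out.
data_tight = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.1, 1.9]
data_spread = [0.5, 3.5, 1.0, 3.0, 2.5, 0.0, 4.0, 1.5]

# The SAME model (sigma = 1) is scored on each dataset.
print("elpd on tight data: ", round(elpd_loo(data_tight), 2))
print("elpd on spread data:", round(elpd_loo(data_spread), 2))
```

The model is unchanged between the two runs, yet the elpd values differ substantially; and because observation i in one dataset has no counterpart in the other, there is no meaningful pointwise difference to compute.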

Many thanks, you confirmed my suspicion. There must be another way to perform this model comparison, then. Thanks again!