Hi. I have an experiment where people rate a bunch of images. Some of these images have been run through filters (think Instagram) and I have found that some observers are particularly bothered by some types of filters. To replicate this effect, I created a synthetic dataset and ran the following model:
brm(rating ~ filter + (filter | id), data = results, family = cumulative(threshold = "flexible"), chains = 2, cores = 2, iter = 3000)
In this example, the first 8 observers are particularly bothered by the “Bad filter”, whereas the following 8 observers are equally bothered by the bad and medium filters. In reality, there will be more than just these two patterns in my data.
This is already an interesting result to me, but I now want to know whether observers are stable in their ratings. If I invited the same observers to a new session later, would the pattern in their ratings be similar to the current one? For example, would ID 1 again be particularly bothered by the bad filter?
I am not quite sure how to proceed once I have collected such data. Should I, for instance, model the two experiments separately and then subtract the posteriors for each individual? Should I somehow create one model that includes both experiments? I have also considered whether I should use some kind of stratified cross-validation, as introduced by Aki in "Cross-validation for hierarchical models".
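To make the "one model for both experiments" option concrete, here is a minimal sketch. It assumes the two sessions are stacked into one long data frame with a hypothetical `session` factor (levels `s1` and `s2`) and that `filter` has the hypothetical levels `none`, `medium` and `bad`; none of these names come from the actual data. The group-level part gives every observer a separate filter effect per session, so the model also estimates how strongly the session 1 and session 2 effects correlate across observers.

```r
# Sketch only: the column names (id, filter, session, rating) and the
# level names are assumptions about how the combined data would look.
joint_formula <- rating ~ filter * session +
  (0 + session + session:filter | id)

fit_joint <- function(results_both) {
  # Same family and sampler settings as the single-session model.
  brms::brm(
    joint_formula,
    data = results_both,
    family = brms::cumulative(threshold = "flexible"),
    chains = 2, cores = 2, iter = 3000
  )
}

# Check which observer-level coefficients the group-level part creates,
# using base R only (no model is fitted here).
design <- expand.grid(
  filter = factor(c("none", "medium", "bad"),
                  levels = c("none", "medium", "bad")),
  session = factor(c("s1", "s2"))
)
re_columns <- colnames(model.matrix(~ 0 + session + session:filter, design))
print(re_columns)
```

After fitting with `fit_joint(results_both)`, the group-level correlation between the session 1 and session 2 slopes for the same filter (available through `VarCorr()`) would be one way to quantify consistency: a correlation near 1 would suggest observers keep their pattern, while a correlation near 0 would suggest it fluctuates. With only 16 observers that correlation is likely to be estimated with a lot of uncertainty.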
It would be great to end up with a procedure that shows, for each observer, whether the ratings are consistent across sessions or whether they fluctuate.
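For the per-observer view, the "subtract the posteriors" idea could be sketched as below. The draws here are random placeholders, not results; in a real analysis they would be matrices of posterior draws (rows) by observers (columns) for one filter effect, taken from each session's model. The function name `summarise_change` is hypothetical.

```r
# Placeholder draws standing in for per-observer "bad filter" effects.
# These are NOT real results, only random numbers with the right shape.
set.seed(1)
n_draws <- 1000
n_obs <- 16
draws_s1 <- matrix(rnorm(n_draws * n_obs, mean = -1), n_draws, n_obs)
draws_s2 <- matrix(rnorm(n_draws * n_obs, mean = -1), n_draws, n_obs)

# Summarise the session 2 minus session 1 difference for each observer.
summarise_change <- function(d1, d2, probs = c(0.025, 0.975)) {
  stopifnot(identical(dim(d1), dim(d2)))
  d <- d2 - d1
  data.frame(
    id = seq_len(ncol(d)),
    mean_diff = colMeans(d),
    lower = unname(apply(d, 2, quantile, probs = probs[1])),
    upper = unname(apply(d, 2, quantile, probs = probs[2]))
  )
}

change <- summarise_change(draws_s1, draws_s2)
print(head(change))
```

An interval that clearly excludes zero for an observer would point to a change between sessions for that observer. One caveat: when the two sessions are fitted as separate models, pairing the draws row by row is arbitrary and treats the two posteriors as independent, so the resulting intervals are only a rough guide. With brms, such draw matrices could presumably be extracted with `coef(fit, summary = FALSE)`, with the exact coefficient name depending on how the filter levels are coded.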
I have written the code to simulate experiments 1 and 2 here. It also includes the model I have used and the code to generate the attached plot.
Thanks for reading so far. I would really appreciate input on how to proceed from here.