Handling Unavailable Interaction Combinations

I am trying to find a single appropriate model to estimate the quality ratings of images for an experiment I will be running. In a pilot, I showed images either in their original state or with "jpeg", "blur", or "contrast" distortion.

Distorted images can be distorted at levels "70", "80", or "90", which represent the percentage of the original quality they are expected to retain. I wish to create a brms model of the interaction between distortion and level and compare the reduction in quality relative to the original images. My first attempt is:

model_interaction <- brm(rating ~  level * distortion + (1 | participant), data = subset_data,
                         chains = 6, cores = 6, iter = 8000, warmup = 4000, thin = 2,
                         control = list(adapt_delta = 0.9),
                         family = cumulative(threshold = "flexible"))

One issue with this attempt is that it assumes all distortion * level combinations are possible, but "none" will always have a quality of 100. The specification above will, for instance, try to estimate what rating "none" images receive at level 80, or conversely what rating "blur" images receive at level 100. Both of these combinations are absent from my dataset and in fact theoretically impossible.

Instead, the ideal model should only estimate the combinations that can actually occur.

I currently see two solutions. One is to create a new factor containing only the observed interaction levels:

interaction(subset_data$distortion, subset_data$level, drop = TRUE)
Levels: none.100 jpeg.90 blur.90 contrast.90 jpeg.80 blur.80 contrast.80 jpeg.70 blur.70 contrast.70

This approach would not estimate impossible combinations, but it also would not understand that jpeg.90 and jpeg.80 are related. Accordingly, when I model my pilot data this way, I get a model that appears worse than my null model according to LOO.

The second solution, which I see as the best, is to subset the data to exclude the original images. The model above can then appropriately capture the interaction of level and distortion.

I don’t know whether I could then fit a separate model for the original images alone and subtract the posterior draws from the previous model, thereby estimating how much of a reduction is expected. I think it would be possible, but creating several models seems unnecessarily complex if this could be done with one.
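For illustration, the two-model comparison could look roughly like the untested sketch below. The model objects and `newdata` grids are hypothetical names, and the two grids would need matching dimensions for the subtraction to line up:

```r
# Untested sketch of the two-model idea; model and grid names are
# hypothetical. posterior_epred() for a cumulative model returns a
# draws x observations x categories array of category probabilities.
p_orig <- posterior_epred(model_original,  newdata = grid_original,  re_formula = NA)
p_dist <- posterior_epred(model_distorted, newdata = grid_distorted, re_formula = NA)

# Collapse category probabilities into an expected rating per draw and
# observation, then subtract draw by draw (the two posteriors are independent).
exp_rating <- function(p) apply(p, c(1, 2), function(x) sum(x * seq_along(x)))
reduction_draws <- exp_rating(p_orig) - exp_rating(p_dist)
```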

Does anyone have any other ideas for what I could do? Thanks for reading and let me know if you need additional information.

Level is nested in Distortion. Useful techniques for dealing with nested variables are provided here: How do you deal with "nested" variables in a regression model? - Cross Validated.
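One concrete version of the nesting trick from that link, if you are willing to assume the effect of level is roughly linear (an assumption, and untested), is to code level as a numeric quality reduction so that undistorted images get 0 and their interaction term drops out of the likelihood automatically:

```r
# Untested sketch, assuming level can be treated as numeric: code the
# quality reduction so "none" images get 0, which makes the interaction
# term vanish for them (the continuous nesting trick from the linked answer).
subset_data$reduction <- ifelse(
  subset_data$distortion == "none",
  0,
  100 - as.numeric(as.character(subset_data$level))
)
# rating ~ distortion + distortion:reduction + (1 | participant)
```

One caveat: with a factor-by-numeric interaction, the design matrix still contains a `distortionnone:reduction` column that is all zeros, so that slope has no data behind it and may need to be pinned to 0 with a constant prior (as described below in this thread).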

I think the following formula should do the trick (but please check!):

rating ~ distortion + distortion : level + (1 | participant)

More generally, you can use the get_prior() function to determine the names of all coefficients in the model; if any of them are meaningless/impossible, they can be fixed to exactly 0 using set_prior("constant(0)", class = "b", coef = "your_impossible_coefficient").
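As a sketch of that workflow (untested; the coefficient names below are guesses, so check your own get_prior() output, and note there may be more unidentified coefficients than expected, e.g. main-effect terms that correspond to a distortion at level 100):

```r
library(brms)

# Inspect all parameter names the formula implies; the exact coefficient
# names depend on your factor levels and contrast coding.
get_prior(rating ~ distortion + distortion:level + (1 | participant),
          data = subset_data, family = cumulative(threshold = "flexible"))

# Pin impossible combinations to exactly 0 (coefficient names are guesses):
priors <- c(
  set_prior("constant(0)", class = "b", coef = "distortionnone:level70"),
  set_prior("constant(0)", class = "b", coef = "distortionnone:level80"),
  set_prior("constant(0)", class = "b", coef = "distortionnone:level90")
)
# ... then pass prior = priors to brm()
```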


Thank you for your response. When I try to run your proposed model, it has a hard time converging and still seems to estimate the impossible combinations.

brm(rating ~ distortion + distortion:level + (1 | participant), data = subset_datafull,
    chains = 2, cores = 2, iter = 4000, warmup = 2000, thin = 1,
    control = list(adapt_delta = 0.9),
    family = cumulative(threshold = "flexible"))
Compiling Stan program...
Start sampling
Warning messages:
1: There were 4000 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
2: Examine the pairs() plot to diagnose sampling problems
3: The largest R-hat is NA, indicating chains have not mixed.
Running the chains for more iterations may help. See
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See

We can see this clearly in the conditional_effects plot.

I will try specifying priors at 0 and reread your links once again. In the meantime, other ideas and input are very welcome.

Thanks again for reading.