Computing the probability of treatment A vs B (brms)

Problem setup -
We have 4 vehicles, and 2 oils are applied to 2 vehicles each. Every 15000km, the oil is changed, and the filtration efficiency of a filter is measured, which is the proportion of particulate matter left behind. I have 3 such time points (20,000km, 35,000km and 50,000km). In addition, at each timepoint, each vehicle is measured atleast 3 times (subsampled).

We would like to assess, whether the choice of oil has any effect.

Data setup

library(dplyr)

structure(list(vehicle = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 
4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 
4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), phase_study = c("20000k", 
"20000k", "20000k", "20000k", "20000k", "20000k", "20000k", "20000k", 
"20000k", "20000k", "20000k", "20000k", "35k", "35k", "35k", 
"35k", "35k", "35k", "35k", "35k", "35k", "35k", "35k", "35k", 
"50000k", "50000k", "50000k", "50000k", "50000k", "50000k", "50000k", 
"50000k", "50000k", "50000k", "50000k", "50000k", "50000k", "50000k", 
"50000k", "50000k"), proportion = c(0.133276922067952, 0.205192487353148, 
0.180891132637869, 0.131808859615293, 0.233605003889866, 0.156346934097495, 
0.124846693417874, 0.101009106805783, 0.0960592734051121, 0.0747444592529318, 
0.0806232848484083, 0.0876103330243045, 0.128560106261169, 0.187112431826725, 
0.135358954840921, 0.116530374466731, 0.133765765869908, 0.151286541197553, 
0.149539874266327, 0.147483421748724, 0.138464775367404, 0.136179625528026, 
0.166062526096805, 0.13736338636871, 0.142649125779378, 0.225733072582342, 
0.225752781332766, 0.180084025442341, 0.213935598869078, 0.261067241489387, 
0.139824991219242, 0.189865352171248, 0.207996472478743, 0.156223909638647, 
0.279743362932101, 0.261630166237954, 0.112091188149989, 0.133759402337049, 
0.121998147373204, 0.0849281892292818), subsample_unique = c("1-Total-20000k-used_oil", 
"1-Total-20000k-used_oil", "1-Total-20000k-used_oil", "2-Total-20000k-used_oil", 
"2-Total-20000k-used_oil", "2-Total-20000k-used_oil", "3-Total-20000k-used_oil", 
"3-Total-20000k-used_oil", "3-Total-20000k-used_oil", "4-Total-20000k-used_oil", 
"4-Total-20000k-used_oil", "4-Total-20000k-used_oil", "1-Total-35k-used_oil", 
"1-Total-35k-used_oil", "1-Total-35k-used_oil", "2-Total-35k-used_oil", 
"2-Total-35k-used_oil", "2-Total-35k-used_oil", "3-Total-35k-used_oil", 
"3-Total-35k-used_oil", "3-Total-35k-used_oil", "4-Total-35k-used_oil", 
"4-Total-35k-used_oil", "4-Total-35k-used_oil", "1-Total-50000k-used_oil", 
"1-Total-50000k-used_oil", "1-Total-50000k-used_oil", "1-Total-50000k-used_oil", 
"1-Total-50000k-used_oil", "1-Total-50000k-used_oil", "2-Total-50000k-used_oil", 
"2-Total-50000k-used_oil", "2-Total-50000k-used_oil", "3-Total-50000k-used_oil", 
"3-Total-50000k-used_oil", "3-Total-50000k-used_oil", "3-Total-50000k-used_oil", 
"4-Total-50000k-used_oil", "4-Total-50000k-used_oil", "4-Total-50000k-used_oil"
), oil = c("B", "B", "B", "B", "B", "B", "A", "A", "A", "A", 
"A", "A", "B", "B", "B", "B", "B", "B", "A", "A", "A", "A", "A", 
"A", "B", "B", "B", "B", "B", "B", "B", "B", "B", "A", "A", "A", 
"A", "A", "A", "A")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -40L))

Model setup
A varying intercept model for vehicle and for the subsamples with a fixed effect for oil.

library(brms)

full_model_prior <- 
  brm(proportion ~ ol + (1|vehicle) + (1|subsample_unique) , data = data, 
      prior = c(prior(student_t(5, 0.25, .1), class = Intercept),
                prior(normal(0,.1), class = b), 
                prior(exponential(2.5), class = sigma),
                prior(exponential(2.5), class = sd)), sample_prior = "only")

I played around with prior predictive distribution and constrained the priors such that the plausible response generated from the priors is [0,1]. This is because we are working with proportions and negative proportions don’t make sense, and proportions >1 are implausible because particulate matter cannot increase after filtration.
please critique my priors - still learning

Posterior Model

full_model_posterior <- 
  brm(proportion ~ ol + (1|vehicle) + (1|subsample_unique) , data = data, 
      prior = c(prior(student_t(5, 0.25, .1), class = Intercept),
                prior(normal(0,.1), class = b), 
                prior(exponential(2.5), class = sigma),
                prior(exponential(2.5), class = sd)), control = list(adapt_delta = 0.99999), chains = 4, cores = 4, iter = 12000)

My posterior checks look OK I think. Given the small sample size, I can’t expect too much.

Now I’m interested in making a comparison in the posterior distribution of Oil A - Oil B. Since I also fit an intercept, I get a posterior distribution of B - A, the difference.

Can I make a statement, such as, the probability that Oil B has a higher proportion of particulate leftover as compared to B is xxx, using the posterior as shown below?

brms::posterior_samples(full_model_posterior, pars = "b_oilB") %>%
  dplyr::filter(b_oilB>0) %>%
  dplyr::summarise(probability = nrow(.)/24000)

## I take the posterior dsitribution of the difference, and calculate how many are >0 and calculate a proportion (24000 is the sampled values).  

The result of the above is 0.82 - which I think is interesting.
However, if you look at the 80% quantile interval, it ranges from [-0.0187,0.0778], which includes zero.

posterior_interval(full_model_posterior, prob = 0.80)

How do I then interpret this result? Is it valid for me to calculate the probability of Oil B > Oil A using the posterior above? Or is the uncertainity interval a more meaningful measure to interpret the treatment effect?

You can verify your above probability computing procedure by using the hypothesis method of brms, which does this automatically for you. In general, the CI is two sided while the you A > B comparision is one-sided, so the two results don’t necessarily contradict each other.

Thank you, Paul!