Computing the probability of treatment A vs B (brms)

dilsherdhillon · July 19, 2019, 12:54am

Problem setup -
We have 4 vehicles, and 2 oils are applied to 2 vehicles each. Every 15000km, the oil is changed, and the filtration efficiency of a filter is measured, which is the proportion of particulate matter left behind. I have 3 such time points (20,000km, 35,000km and 50,000km). In addition, at each timepoint, each vehicle is measured atleast 3 times (subsampled).

We would like to assess, whether the choice of oil has any effect.

Data setup

library(dplyr)

structure(list(vehicle = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 4L, 4L, 4L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 
4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 
4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), phase_study = c("20000k", 
"20000k", "20000k", "20000k", "20000k", "20000k", "20000k", "20000k", 
"20000k", "20000k", "20000k", "20000k", "35k", "35k", "35k", 
"35k", "35k", "35k", "35k", "35k", "35k", "35k", "35k", "35k", 
"50000k", "50000k", "50000k", "50000k", "50000k", "50000k", "50000k", 
"50000k", "50000k", "50000k", "50000k", "50000k", "50000k", "50000k", 
"50000k", "50000k"), proportion = c(0.133276922067952, 0.205192487353148, 
0.180891132637869, 0.131808859615293, 0.233605003889866, 0.156346934097495, 
0.124846693417874, 0.101009106805783, 0.0960592734051121, 0.0747444592529318, 
0.0806232848484083, 0.0876103330243045, 0.128560106261169, 0.187112431826725, 
0.135358954840921, 0.116530374466731, 0.133765765869908, 0.151286541197553, 
0.149539874266327, 0.147483421748724, 0.138464775367404, 0.136179625528026, 
0.166062526096805, 0.13736338636871, 0.142649125779378, 0.225733072582342, 
0.225752781332766, 0.180084025442341, 0.213935598869078, 0.261067241489387, 
0.139824991219242, 0.189865352171248, 0.207996472478743, 0.156223909638647, 
0.279743362932101, 0.261630166237954, 0.112091188149989, 0.133759402337049, 
0.121998147373204, 0.0849281892292818), subsample_unique = c("1-Total-20000k-used_oil", 
"1-Total-20000k-used_oil", "1-Total-20000k-used_oil", "2-Total-20000k-used_oil", 
"2-Total-20000k-used_oil", "2-Total-20000k-used_oil", "3-Total-20000k-used_oil", 
"3-Total-20000k-used_oil", "3-Total-20000k-used_oil", "4-Total-20000k-used_oil", 
"4-Total-20000k-used_oil", "4-Total-20000k-used_oil", "1-Total-35k-used_oil", 
"1-Total-35k-used_oil", "1-Total-35k-used_oil", "2-Total-35k-used_oil", 
"2-Total-35k-used_oil", "2-Total-35k-used_oil", "3-Total-35k-used_oil", 
"3-Total-35k-used_oil", "3-Total-35k-used_oil", "4-Total-35k-used_oil", 
"4-Total-35k-used_oil", "4-Total-35k-used_oil", "1-Total-50000k-used_oil", 
"1-Total-50000k-used_oil", "1-Total-50000k-used_oil", "1-Total-50000k-used_oil", 
"1-Total-50000k-used_oil", "1-Total-50000k-used_oil", "2-Total-50000k-used_oil", 
"2-Total-50000k-used_oil", "2-Total-50000k-used_oil", "3-Total-50000k-used_oil", 
"3-Total-50000k-used_oil", "3-Total-50000k-used_oil", "3-Total-50000k-used_oil", 
"4-Total-50000k-used_oil", "4-Total-50000k-used_oil", "4-Total-50000k-used_oil"
), oil = c("B", "B", "B", "B", "B", "B", "A", "A", "A", "A", 
"A", "A", "B", "B", "B", "B", "B", "B", "A", "A", "A", "A", "A", 
"A", "B", "B", "B", "B", "B", "B", "B", "B", "B", "A", "A", "A", 
"A", "A", "A", "A")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -40L))

Model setup
A varying intercept model for vehicle and for the subsamples with a fixed effect for oil.

library(brms)

full_model_prior <- 
  brm(proportion ~ ol + (1|vehicle) + (1|subsample_unique) , data = data, 
      prior = c(prior(student_t(5, 0.25, .1), class = Intercept),
                prior(normal(0,.1), class = b), 
                prior(exponential(2.5), class = sigma),
                prior(exponential(2.5), class = sd)), sample_prior = "only")

I played around with prior predictive distribution and constrained the priors such that the plausible response generated from the priors is [0,1]. This is because we are working with proportions and negative proportions don’t make sense, and proportions >1 are implausible because particulate matter cannot increase after filtration.
please critique my priors - still learning

Posterior Model

full_model_posterior <- 
  brm(proportion ~ ol + (1|vehicle) + (1|subsample_unique) , data = data, 
      prior = c(prior(student_t(5, 0.25, .1), class = Intercept),
                prior(normal(0,.1), class = b), 
                prior(exponential(2.5), class = sigma),
                prior(exponential(2.5), class = sd)), control = list(adapt_delta = 0.99999), chains = 4, cores = 4, iter = 12000)

My posterior checks look OK I think. Given the small sample size, I can’t expect too much.

Now I’m interested in making a comparison in the posterior distribution of Oil A - Oil B. Since I also fit an intercept, I get a posterior distribution of B - A, the difference.

Can I make a statement, such as, the probability that Oil B has a higher proportion of particulate leftover as compared to B is xxx, using the posterior as shown below?

brms::posterior_samples(full_model_posterior, pars = "b_oilB") %>%
  dplyr::filter(b_oilB>0) %>%
  dplyr::summarise(probability = nrow(.)/24000)

## I take the posterior dsitribution of the difference, and calculate how many are >0 and calculate a proportion (24000 is the sampled values).

The result of the above is 0.82 - which I think is interesting.
However, if you look at the 80% quantile interval, it ranges from [-0.0187,0.0778], which includes zero.

posterior_interval(full_model_posterior, prob = 0.80)

How do I then interpret this result? Is it valid for me to calculate the probability of Oil B > Oil A using the posterior above? Or is the uncertainity interval a more meaningful measure to interpret the treatment effect?

paul.buerkner · July 26, 2019, 10:23am

You can verify your above probability computing procedure by using the hypothesis method of brms, which does this automatically for you. In general, the CI is two sided while the you A > B comparision is one-sided, so the two results don’t necessarily contradict each other.

dilsherdhillon · July 29, 2019, 3:47pm

Thank you, Paul!

Topic		Replies	Views
Subject-specific vs population-averaged effects in brms longitudinal bernoulli regression model brms fitting-issues	17	1379	May 26, 2020
Brms hypothesis function with category specific effects Modeling ordinal-response , brms	1	988	July 10, 2021
Brms: how can I compare differences of variance of marginal_effects() for categories (i.e. category probabilities) brms techniques	4	537	September 1, 2020
Interpretation of conditional effects in brms brms logistic-regression , interpret-results , conditional-effects	2	4342	May 30, 2020
Conditional effects and summary estimate showing opposite results General conditional-effects , brms	8	1389	July 12, 2022

Computing the probability of treatment A vs B (brms)

Related topics