Hey
I’m new to multilevel modelling (and stan), so these are (hopefully) basic questions.
If this is not the right forum for these questions, feel free to point to a more fitting place. :)
I allready posted a similar question here, but I think I was making it too complex so I try to clear up some more basic things first :)
The data:
I have a S subjects that each gets 3 (T) treatments (A, B, C).
The responsiveness of the subject to a treament is measured.
There are several measuments per subject.
Subject will have a different baselevel of response independent of treatment.
(Subject 1 will have a different response to treatment A compared to subject 2, even if treamtent A has no effect on both)
Most subjects probably show no difference in response to the 3 treatments.
If a subject respond, it probably has a different response to the 3 treatments.
The scientific question:
I’m interested which subject show a differential response and for which pair of treatments.
So this is a mulltiple comparison problem.
The multilevel model:
After searching for examples my first take was having a random effect on the subject-treatment interaction y \tilde{} subject_s + (1 | subject_s:treatment_t)
with subject:treatment \tilde{} N(0, \sigma_s)
So a \sigma for each subject.
My questions:
Does this model look reasonable for my problem?
I understood from Gelman,2012 that multilevel modelling takes care of the multi comparison problem.
Is this assumption valid in my case since I have diferent \sigma_s for different subjects?
I was wondering what the implications would be of 1 shared \sigma for all subjects instead of a \sigma_s for each subject?
I was considering it because I was wondering if I’m not neglecting variance information from the other subject by choosing a different \sigma_s for each subject.
Would a more complex model be more appropriate? I could model subject_s also as a random effect. y \tilde{} (1|subject_s) + (1 | subject_s:treatment_t)
or maybe even y \tilde{} (1|subject_s) + (1|treatment_s) + (1 | subject_s:treatment_t)
Are there caveats by doing this?
For inference, I understood that shrinkage takes care of the multi comparison ( Gelman,2012).
Does this mean I can just take the difference of posterior samples for each treatment pair in each subject. (eg treatment B - treatment A), calculate the 95% high density interval (HDI) and for a given treatment pair report the subjects were zero is excluded in the 95% HDI? (I’m used to calculating 5% FDR levels in frequentistic statistics)
allows to model you variation in treatment effects and also models variation in subjects’ propensity to respond to any treatment.
I am not sure I understand why \sigma should vary between subjects. Letting it vary by treatment appears more reasonable (provided there is a good reason to assume that treatment has an effect on the residual variance).
You can check if the HDIs of the treatment effects overlap with zero without correcting for multiple comparisons. Alternatively one can discuss if a treatment effect is so strong that it is practically meaningful.
Thanks for your answer,
So if I’m correct this woud correspond to having a \sigma for each treatment?
Since I’m intersted in difference in treatment effect (eg. treatment B -treatment A \ne 0), would
separate shrinkage of the treatment effect make my inference still valid (also in the light of multiple comparisons)?
The reason I wanted a seperate shrinkage (\sigma per subject) is that the within variance of subjects can vary a lot.
Subjects wil respond to none of the treatments or to all treatments.
I was hoping to catch this with a \sigma per subject, but on the other hand. the sample size might be to small to model this variance. (due to lots of missingness)
Subjects wil respond to none of the treatments or to all treatments, thats why I thought a \sigma per subject might be a good idea. On the other hand the samplesize per subject might me to small due to missing data.
I think your suggested model will not catch different treatments effects per subject. These will certainly be present.
Sorry, I shouldn’t write posts in the middle of the night.
What I wrote is rubbish (one cannot use treatment as a predictor and as a variable to condition on).
And what if the baseline value if the variable you condition is different between groups.
An example:
If I take the suggested model of @paul.buerkner y ~ treatment + (treatment | subject)
In my case each subject base level is different independent of treatment.
But some subjects will have an equal response between treatments, some others will have a different response between treatments.
Would the following not be more appropriate? y ~ treatment + subject + (treatment | subject)
If I understand correctly with (treatment|subject) I model the variance betweens subjects within a certain treatment group.
However part of this variance is not due to the treament effect but due a treament independent subject effect.
This subject effect will be equal across all treaments.
That’s why in my mind it looked natural to offset this wit a subject specific parameter.
Please, can you show me where my reasoning is flawed?
Did I interpret the (treatment|subject) term wrong?
Yes, I tried to fit my data wit both brms and stanarm.
fit = brm(y ~ treatment + (treatment | subject),data)
I noticed that brm was significantly faster then stanarm to fit my data.
I’m sorry, If this discussion becomes to much about theory.
It thought it was ok under the ‘general’ topic.
If this is not the right forum, please say so. Or point me to a more suitable place.
I see. Let me try to make clear what the model I suggested does. When you go for y ~ treatment + (treatment | subject), you assume the baseline of y (the intercept) to vary across subjects as well as the difference between the reference treatment and the other treatments (if dummy coding is used).
If you use y ~ 0 + treatment + (0 + treatment | subject) you leave out the intercept and instead model all three treatments directly. Both models are equivalent but represent different parameterizations.
The treatment independent dependency of observations of the same subject you mentioned above is modeled in both of the above formulations.
The packages code these models differently internally (for some models rstanarm sacrifices a bit of sampling speed in order to be able to write the model in a way amenable to pre-compilation) so you could definitely see differences in timing. Another big difference in timing can come from the fact that rstanarm uses very conservative default settings for HMC/NUTS (e.g. adapt_delta defaults to at least 0.95) to reduce the number of models that have divergence warnings when fit with default settings. If you have no issues with divergences then setting adapt_delta=0.8 (the default for RStan) should make it run quite a bit faster.