Hey
I’m new to multilevel modelling (and Stan), so these are (hopefully) basic questions.
If this is not the right forum for these questions, feel free to point to a more fitting place. :)
I already posted a similar question here, but I think I made it too complex, so I will try to clear up some more basic things first :)
The data:
I have S subjects that each get T = 3 treatments (A, B, C).
The responsiveness of the subject to a treatment is measured.
There are several measurements per subject and treatment.
Each subject will have a different base level of response, independent of treatment.
(Subject 1 will have a different response to treatment A compared to subject 2, even if treatment A has no effect on either.)
Most subjects probably show no difference in response between the 3 treatments.
If a subject does respond, it probably responds differently to each of the 3 treatments.
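To make the data structure concrete, here is a minimal simulation sketch of what I described above. All numbers (20 subjects, 5 measurements per treatment, 20% responders, the standard deviations) and the function name `simulate` are made up for illustration; they are not my real data.

```python
import random

def simulate(n_subjects=20, n_reps=5, p_responder=0.2, seed=1):
    """Return long-format rows (subject, treatment, y) and the set of responders."""
    rng = random.Random(seed)
    treatments = ["A", "B", "C"]
    rows = []
    responders = set()
    for s in range(n_subjects):
        # Subject-specific base level, independent of treatment (assumed values).
        baseline = rng.gauss(10.0, 2.0)
        if rng.random() < p_responder:
            # A responder gets a different shift for each treatment.
            effects = {t: rng.gauss(0.0, 1.5) for t in treatments}
            responders.add(s)
        else:
            # A non-responder shows no treatment difference at all.
            effects = {t: 0.0 for t in treatments}
        for t in treatments:
            for _ in range(n_reps):
                # Measurement noise around baseline + treatment shift.
                y = baseline + effects[t] + rng.gauss(0.0, 0.5)
                rows.append((s, t, y))
    return rows, responders

rows, responders = simulate()
print(len(rows), "rows;", len(responders), "responders")
```

Each row is one measurement, so the table has one line per subject, treatment and replicate.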
The scientific question:
I’m interested in which subjects show a differential response, and for which pairs of treatments.
So this is a multiple comparisons problem.
The multilevel model:
After searching for examples, my first take was to put a random effect on the subject:treatment interaction:
y \sim subject_s + (1 | subject_s:treatment_t)
with subject_s:treatment_t \sim N(0, \sigma_s)
So there is a separate \sigma_s for each subject.
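Written out in full, this is how I read my own formula. The replicate index i, the names \alpha_s and \gamma_{s,t}, and the residual standard deviation \sigma_y are my assumptions; they do not appear in the formula above.

```latex
\begin{align}
y_{s,t,i} &= \alpha_s + \gamma_{s,t} + \epsilon_{s,t,i} \\
\gamma_{s,t} &\sim N(0, \sigma_s) \\
\epsilon_{s,t,i} &\sim N(0, \sigma_y)
\end{align}
```

Here \alpha_s is the unpooled base level of subject s and \gamma_{s,t} is the deviation of treatment t within subject s.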
My questions:

Does this model look reasonable for my problem?
I understood from Gelman, 2012 that multilevel modelling takes care of the multiple comparisons problem.
Is this assumption valid in my case, since I have a different \sigma_s for each subject?
What would be the implications of one shared \sigma for all subjects instead of a separate \sigma_s for each subject?
I am considering it because, with a separate \sigma_s per subject, I may be ignoring the variance information available from the other subjects.
Would a more complex model be more appropriate? I could also model subject_s as a random effect:
y \sim (1 | subject_s) + (1 | subject_s:treatment_t)
or maybe even
y \sim (1 | subject_s) + (1 | treatment_t) + (1 | subject_s:treatment_t)
Are there caveats to doing this?
For inference, I understood that shrinkage takes care of the multiple comparisons problem (Gelman, 2012).
Does this mean I can just take the difference of the posterior samples for each treatment pair within each subject (e.g. treatment B - treatment A), calculate the 95% highest density interval (HDI), and for a given treatment pair report the subjects where zero lies outside the 95% HDI? (I’m used to controlling the FDR at 5% in frequentist statistics.)
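To show what I mean by that procedure, here is a minimal sketch for one subject and one treatment pair. The helper names `hdi` and `excludes_zero` are my own, and the draws are random numbers standing in for real posterior samples; in practice they would come from the fitted Stan model. The difference is taken draw by draw, so both lists must come from the same posterior iterations.

```python
import math
import random

def hdi(samples, mass=0.95):
    """Narrowest interval that contains a fraction `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    # Number of draws that must fall inside the interval.
    k = min(n, max(1, math.ceil(mass * n)))
    # Slide a window of k consecutive sorted draws and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

def excludes_zero(draws_b, draws_a, mass=0.95):
    """HDI of the per-draw difference B - A, and whether zero lies outside it."""
    diff = [b - a for a, b in zip(draws_a, draws_b)]
    lo, hi = hdi(diff, mass)
    return (lo > 0.0 or hi < 0.0), (lo, hi)

rng = random.Random(0)
# Made-up stand-ins for posterior draws of one subject's treatment effects.
draws_a = [rng.gauss(0.0, 0.3) for _ in range(4000)]
draws_b = [rng.gauss(2.0, 0.3) for _ in range(4000)]

flag, interval = excludes_zero(draws_b, draws_a)
print("zero excluded:", flag, "HDI:", interval)
```

I would then repeat this for every subject and every treatment pair and report the cases where the flag is true.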
Thanks in advance
Greets