I have a large-ish number (~400) of posterior distributions for some regression coefficients, and I’d like to make a claim about the sign of these coefficients while keeping my probability of making a Type-S error at some level (hey, how about 5%).
Using brms syntax for expediency, it’s something like this:
bf( y_m ~ 1 + x + (1 | id/session) )
where m is 1:400. My interest is in the sign of the coefficient for x which varies over id:session.
At the moment, this is what I’m thinking of doing. For each coefficient:
- Compute the proportion of the posterior that has the same sign as the median.
- Sort these proportions from largest to smallest and compute their cumulative product.
- Make a claim about the sign of those coefficients where this cumulative product is >.95 (i.e., the joint probability of them all being of the sign I’m claiming is greater than .95).
This seems to me like an efficient way of controlling the family-wise Type-S error probability.
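To make the three steps concrete, here is a minimal base R sketch. It assumes the posterior draws are already collected in a matrix with one column per coefficient; the `draws` matrix here is simulated, and `true_means` is a made-up set of effects used only for illustration.

```r
# Simulated stand-in for real posterior draws:
# rows are draws, columns are coefficients (hypothetical values).
set.seed(1)
n_draws <- 4000
true_means <- c(3, -2.5, 2, 0.5, -0.2, 0)
draws <- sapply(true_means, function(mu) rnorm(n_draws, mean = mu, sd = 1))

# Step 1: proportion of each posterior on the same side of zero as its median.
med_sign <- sign(apply(draws, 2, median))
p_sign <- sapply(seq_len(ncol(draws)),
                 function(j) mean(sign(draws[, j]) == med_sign[j]))

# Step 2: sort from largest to smallest and take the cumulative product.
ord <- order(p_sign, decreasing = TRUE)
cum_prob <- cumprod(p_sign[ord])

# Step 3: claim a sign for the coefficients whose cumulative product
# is still above .95.
keep <- cum_prob > 0.95
claimed <- ord[keep]
result <- data.frame(coef = claimed,
                     sign = med_sign[claimed],
                     p_sign = p_sign[claimed],
                     cum_prob = cum_prob[keep])
print(result)
```

Note that multiplying the proportions treats the posteriors as independent, which matches fitting the 400 models separately. If the coefficients came from one joint posterior, the joint probability could instead be computed directly as the share of draws in which all claimed signs hold at once.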
So my first question is just whether this makes sense; however, I suppose it’s not something that would ever be useful in practice. I’ll lay out the actual application below.
The application
I am also aware that this might be sub-optimal. Why not put all the data into a hierarchical model? In this case, the data is pretty big (30M rows or so). However, there aren’t any variables at the lowest level of nesting, so I could summarize the outcome for each of the 400 variables using the mean and standard error of the mean, and then use those as my outcome in a model that also nests the data under M (specifically, each m is a brain region in which multiple voxels are observed). So the above model would become:
bf( y | se(y_se, sigma = TRUE) ~ 1 + x + (1 | id/session) + (1 + x | M) )
With a regularizing prior on the variance of x across M, I think I could just look at the posterior for each x_m and use the 95% credible interval to make decisions about the sign.
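As a rough sketch of that decision rule, assume the per-region slopes are available as a draws-by-region matrix. The `slopes` matrix below is simulated and `region_means` is hypothetical. With a fitted brms model (call it `fit`, also hypothetical), something like `coef(fit, summary = FALSE)$M[, , "x"]` should give a matrix of this shape, but it is worth checking the dimensions on the actual fit.

```r
# Simulated stand-in for a draws-by-region matrix of slopes for x.
set.seed(2)
region_means <- c(2.5, -3, 0.3, 0)  # hypothetical effects
slopes <- sapply(region_means, function(mu) rnorm(4000, mean = mu, sd = 1))
colnames(slopes) <- paste0("region_", seq_along(region_means))

# 95% equal-tailed credible interval for each region (one row per region).
ci <- t(apply(slopes, 2, quantile, probs = c(0.025, 0.975)))

# Claim a sign only where the interval excludes zero.
decision <- ifelse(ci[, 1] > 0, "positive",
                   ifelse(ci[, 2] < 0, "negative", "undecided"))
print(cbind(as.data.frame(ci), decision))
```

An equal-tailed 95% interval that excludes zero means at least 97.5% of that region’s posterior lies on one side, so this rule works region by region rather than on the joint probability across all claims.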
The context
This is in fMRI research, where we really like to say that some regions of the brain are associated with this or that, but at the moment we don’t find the magnitude of these effects very meaningful. It’s also tough to produce a ROPE, though this might ultimately be the most concrete and defensible way to make these kinds of decisions. Anyway, this is why I want to be able to make a claim about the sign couched in some probability of error, and not just show all of these as posterior distributions.
Thanks very much for helping me think about this!