Computing the Type-S error rate over multiple variables

I have a large-ish number (~400) of posterior distributions for some regression coefficients, and I’d like to make a claim about the sign of these coefficients while keeping my probability of making a Type-S error at some level (hey, how about 5%).

Using brms syntax for expediency, it’s something like this:

bf( y_m ~ 1 + x + (1 | id/session) )

where m runs from 1 to 400. My interest is in the sign of the coefficient for x, a predictor that varies across id:session.

At the moment, this is what I’m thinking of doing. For each coefficient:

  1. Compute the proportion of the posterior that has the same sign as the posterior median.
  2. Sort these proportions from largest to smallest and compute their cumulative product.
  3. Claim a sign for those coefficients where this cumulative product is > .95 (i.e., treating the posteriors as independent, the joint probability that all of them have the sign I'm claiming is greater than .95).
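A minimal sketch of steps 1 to 3 in Python/NumPy, assuming the posterior draws for all coefficients have been collected into one hypothetical (n_draws x n_coefs) array, one column per region. `claim_signs` is an illustrative helper, not part of brms. The cumulative product equals the joint probability only if the coefficients' posteriors are independent, which holds when each region is fit in its own model.

```python
import numpy as np

def claim_signs(draws, level=0.95):
    """Greedy sign claims from posterior draws (hypothetical helper).

    draws: array of shape (n_draws, n_coefs), one column per coefficient.
    Returns the indices of the coefficients whose sign is claimed, the
    claimed signs, and the cumulative product at each claimed coefficient.
    """
    draws = np.asarray(draws, dtype=float)
    # Step 1: sign of each posterior median, and the share of draws agreeing with it
    signs = np.sign(np.median(draws, axis=0))
    p_sign = np.mean(np.sign(draws) == signs, axis=0)
    # Step 2: sort from largest to smallest and take the cumulative product
    order = np.argsort(-p_sign, kind="stable")
    cum = np.cumprod(p_sign[order])
    # Step 3: keep coefficients while the cumulative product stays above `level`
    # (a joint probability only under independence across coefficients)
    selected = cum > level
    keep = order[selected]
    return keep, signs[keep], cum[selected]
```

Because every proportion is at most 1, the cumulative product never increases, so the claimed coefficients always form a prefix of the sorted order.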

This seems to me like an efficient way of controlling the family-wise Type-S error probability.

So my first question is simply whether this makes sense. I suppose it may never be useful in practice as stated, though, so I'll lay out the actual application below.


The application

I am also aware that this approach might be sub-optimal: why not put all the data into a hierarchical model? In this case the data are pretty big (30M rows or so). However, there are no predictors at the lowest level of nesting, so I could summarize the outcome for each of the 400 outcome variables using its mean and the standard error of the mean, and then use those summaries as the outcome in a model that also nests the data under M (each m is a brain region in which multiple voxels are observed). The model above would then become:
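A minimal sketch of that summarizing step with pandas, assuming a hypothetical long-format table with columns `id`, `session`, `region`, and a voxel-level outcome `y`; the output columns `y` and `y_se` are named to match the brms formula. The standard error here is the sample SD over voxels divided by the square root of the voxel count, which treats voxels within a region as independent; a cell with a single voxel gets NaN.

```python
import numpy as np
import pandas as pd

def summarize_voxels(df):
    """Collapse voxel-level rows to one row per id x session x region.

    Assumes hypothetical columns 'id', 'session', 'region', and 'y'.
    Returns the voxel mean ('y'), the voxel count ('n_vox'), and the
    standard error of the mean ('y_se') for each cell.
    """
    g = df.groupby(["id", "session", "region"])["y"]
    out = g.agg(y="mean", y_sd="std", n_vox="count").reset_index()
    # Standard error of the mean: sample SD (ddof=1) / sqrt(number of voxels)
    out["y_se"] = out["y_sd"] / np.sqrt(out["n_vox"])
    return out.drop(columns="y_sd")
```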

bf( y | se(y_se, sigma = TRUE) ~ 1 + x + (1 | id/session) + (1 + x | M) )

With a regularizing prior on the variance of the x slopes across M, I think I could just look at the posterior for each x_m and use its 95% credible interval to make decisions about the sign.
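A minimal sketch of that decision rule, assuming the region-specific slopes (the population-level x slope plus each region's deviation) have already been extracted from the fitted model into a hypothetical (n_draws x n_regions) array; `ci_sign_claims` is an illustrative helper. Because all regions come from one joint posterior, the share of draws in which every claimed sign holds at once can be read off directly, without assuming independence. A 95% interval per region controls the sign error for each region separately, so that joint share can fall below .95 when many regions are claimed.

```python
import numpy as np

def ci_sign_claims(slope_draws, prob=0.95):
    """Claim a sign for each region whose central credible interval excludes 0.

    slope_draws: (n_draws, n_regions) array of region-specific slopes.
    Returns the claimed region indices, their signs, and the share of draws
    in which every claimed sign is correct simultaneously.
    """
    d = np.asarray(slope_draws, dtype=float)
    alpha = (1.0 - prob) / 2.0
    # Equal-tailed interval bounds for every region at once
    lo, hi = np.quantile(d, [alpha, 1.0 - alpha], axis=0)
    claimed = np.flatnonzero((lo > 0) | (hi < 0))
    signs = np.where(lo[claimed] > 0, 1.0, -1.0)
    if claimed.size == 0:
        return claimed, signs, 1.0
    # Joint check: fraction of draws where all claimed regions have the claimed sign
    all_right = np.all(np.sign(d[:, claimed]) == signs, axis=1)
    return claimed, signs, float(all_right.mean())
```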


The context

This is in fMRI research, where we really like to say that some regions of the brain are associated with this or that, but at the moment we don't find the magnitudes of these effects very meaningful. It's also tough to define a ROPE (region of practical equivalence), though that might ultimately be the most concrete and defensible way to make these kinds of decisions. Anyway, this is why I want to be able to make a claim about the sign couched in some probability of error, rather than just showing all of these as posterior distributions.

Thanks very much for helping me think about this!

a large-ish number (~400) of posterior distributions for some regression coefficients

Are those “regression coefficients” from ~400 brain regions?

I’d like to make a claim about the sign of these coefficients while keeping my probability of making a Type-S error at some level (hey, how about 5%).

I might have misunderstood the question. It seems to me that regularization through hierarchical modeling would usually help control Type S (and M) errors, but in reality there is no effective way to directly assess Type S (or M) error unless you know the ground truth. Those posterior distributions are likely the best you can achieve.


Yeah, pardon the non-Bayesian terminology. Those 400 parameters are from 400 brain regions.

Yes, my goal is just to control the estimated Type-S error probability, conditional on the model, for the set of parameters whose sign I decide to claim. Is this a silly goal? Perhaps, though I think it is useful in the current context of reporting brain-imaging results.

To clarify my question, in addition to

  1. just a quick sniff test on the simple probability calculation I’m making regarding the estimated overall probability of a Type S error among a set of results,
  2. I am also wondering if anyone has advice on whether there is an advantage to doing this in a hierarchical context versus on the independent models (one model for each of the 400 regions).

John, I still think it's difficult to estimate Type S error without knowing the "true" effect. You may take a look at this nice blog post about Type S/M errors.
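To make the point about needing the true effect concrete: in the design-analysis sense used by Gelman and Carlin, the Type S rate is the probability that a statistically significant estimate has the wrong sign, and computing it requires positing a true effect. A minimal sketch, assuming a normally distributed estimate with known standard error and a two-sided 5% threshold; `type_s_rate` is a hypothetical helper written for illustration.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def type_s_rate(true_effect, se, z=1.959964):
    """Share of 'significant' estimates (|estimate| > z * se) with the wrong sign,
    assuming estimate ~ Normal(true_effect, se) and a known, nonzero true effect."""
    if true_effect == 0:
        raise ValueError("the sign of a zero effect is undefined")
    lam = true_effect / se
    p_above = 1.0 - norm_cdf(z - lam)  # significant and positive
    p_below = norm_cdf(-z - lam)       # significant and negative
    wrong = p_below if true_effect > 0 else p_above
    return wrong / (p_above + p_below)
```

For a true effect of 0.1 standard errors the rate is substantial, while for an effect of 3 standard errors it is negligible; neither number can be obtained without positing the true effect.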