Using predictor distribution instead of standard error

I wanted to know whether it is possible to use predictors that come from a Bayesian estimation together with their full posterior distributions instead of just their standard errors.

The idea behind this question: I’m thinking about how to best use all the data I collected for a new measurement instrument - call it X. I used IRT to get item parameters. To model these parameters I had a bigger and more heterogeneous sample, A.

Now I want to run a regression on the skill of a subsample B. The subsample consists of students of the corresponding topic. Sample A additionally holds data from people who are not studying this subject, and from people who are teaching it, but only regarding instrument X. For the subsample I have additional data - call it Y and Z. Y can be assumed to be identically distributed over the whole population (say, all adults; e.g. fluid intelligence). Z is probably distributed differently across subsets (e.g. chemistry knowledge).

So my first idea was to model the item parameters in the heterogeneous group and use them in the latent regression of the subset (with their SE or distribution). The other idea was to use the whole sample and specify missings for the additional people not part of subset B. But there I’m worried about specifying a group-specific mean for chemistry knowledge for the people who are not studying it, when I only have knowledge about the chemistry students. (For intelligence I could assume the mean is the same.)

So, to come back to my question from the beginning: is it possible to use the whole distribution instead of the SE for the item parameters?

And second: is it a (more) valid way to use these parameters as predictors in the subsample instead of working with missings for the whole sample?

(Or is there a way to specify an estimate of the mean chemistry skill level for non-chemistry students? My assumption would be that the mean is lower, so maybe I could work with something like an ordinal structure?)
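(Purely as a sketch of what I mean by “ordinal structure”: brms offers monotonic effects via mo(), which estimate ordered increments across the levels of an ordered factor. The variable names here are hypothetical:)

```r
library(brms)

# Sketch only: 'studygroup' is a hypothetical ordered factor of study
# background (e.g. non-student < student < teacher). mo() constrains the
# effect to change monotonically across its levels while estimating the
# size of each step from the data.
formula_mono <- bf(chemistry_skill ~ mo(studygroup))
```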


Sorry for taking so long to respond.

I am not completely sure what you want to achieve. If you want to use the posterior of one model as a prior for another, that’s generally non-trivial (e.g. see Composing Stan models (posterior as next prior)).

I think the best way is to figure out how to build a bigger model that fits everything at the same time (i.e. using Y and Z only for B, but fitting A at the same time). One way to do this in brms is to add a new variable (say isB) that is 1 for students in B and 0 for others. Then you put whatever values you like for Y and Z for A and use isB + isB:Y + isB:Z in your formula. This way Y and Z only ever affect the estimates for B. But obviously the details and the best approach depend on the details of your model.
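(As a rough sketch of that suggestion - the outcome name y is purely for illustration, and isB, Y, Z are as described above:)

```r
library(brms)

# isB is 1 for subsample B and 0 for everyone else. The rows for A can
# carry arbitrary placeholder values in Y and Z, because isB:Y and isB:Z
# are exactly zero whenever isB == 0 and so never affect those estimates.
f <- bf(y ~ isB + isB:Y + isB:Z)
# fit <- brm(f, data = dat)
```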

Does that make sense?

Thanks for your reply and the provided link, Martin.
My idea was not to use the posterior as a new prior but to use the estimated item parameters as fixed predictors, taking their uncertainty into account in the form of their posterior distribution instead of their standard error via me(predictor, sd_predictor) (the posteriors might be a little asymmetric). Admittedly, this would be a kind of two-step, inferential-statistics tactic.
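(For context, me() is brms’s measurement-error notation; a minimal sketch with hypothetical variable names:)

```r
library(brms)

# beta_hat: point estimate of the item parameter used as a predictor;
# beta_se: its standard error. me() treats the true value as latent with
# a Gaussian measurement model - which is exactly where an asymmetric
# posterior would only be approximated, hence the question above.
f <- bf(y ~ me(beta_hat, beta_se))
```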

Let me summarise my thoughts about your suggestion using the concrete specification of a model. studylevel is 0 for the non-students and 7 for the teachers. The remaining students (subset B) have been assigned the numbers 1 to 6. One hypothesis is that the skill level rises with the study level, but I don’t want to impose monotonicity through the model specification right away.

formula <- bf(
  response ~ beta + exp(logalpha1) * theta,
  nl = TRUE,
  theta ~ 0 + (1 | studylevel) + (1 | studylevel:ID),
  beta ~ 1 + (1 |i| item),
  logalpha1 ~ 1 + (1 |i| item),
  family = brmsfamily("bernoulli", link = "logit")
)

would become

formula <- bf(
  response ~ beta + exp(logalpha1) * theta + isB:Y + isB:Z,
  nl = TRUE,
  theta ~ 0 + (1 | studylevel) + (1 | studylevel:ID),
  beta ~ 1 + (1 |i| item),
  logalpha1 ~ 1 + (1 |i| item),
  family = brmsfamily("bernoulli", link = "logit")
)

Because I already have the (1 | studylevel) term, I would tend not to incorporate the isB term - that would be a kind of redundant information. Using isB:Y would amount to the same as setting Y to median(Y) = 0 for the non-B people if Y is centered, wouldn’t it? And using isB:Z would amount to the same as setting Z to 0 for them if Z is not centered, wouldn’t it?
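(A quick base-R check of this reasoning: in the design matrix, the isB:Y column is 0 for every row with isB == 0, so for those people the term acts exactly as if Y were 0. The toy values are made up:)

```r
# Build a tiny data set and inspect the design matrix that the
# formula isB + isB:Y would generate.
d <- data.frame(isB = c(0, 0, 1, 1), Y = c(2.5, -1.3, 0.7, 1.9))
X <- model.matrix(~ isB + isB:Y, data = d)

# The interaction column is zero wherever isB == 0, and equals Y
# wherever isB == 1.
print(X[, "isB:Y"])  # 0.0 0.0 0.7 1.9
```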

If that is the case, it seems better to me to go with mi(predictor) for all people not in subset B, because then the values would carry uncertainty and express my knowledge better than sharp 0 values.

Or am I getting the effects of your formula wrong?

Yes, that might make more sense - it just reminds me that I need to be careful with my suggestions, as I do get stuff wrong (like here). My suggestion could probably still be used as a hack/approximation if the missing-data model turns out to be computationally intractable.

If you mean referring to fitted coefficients from one formula elsewhere in the model, then this is currently not supported in brms, although there has been some discussion about adding this feature.

Does that answer all your questions?