Working with psychological scales data: is there a reason to not use individual items as predictors?

I hope this is not too general a question for this forum. The title pretty much says it all. Suppose we have data from multiple psychological scales, each with several items/questions (typically 5-20). Now, the top-level goal is to try to use the variation in some theoretical constructs underlying the scales to predict an outcome/response.

The common strategy is to average data from psychological scales over items into one aggregate measure (of well-being, depression, authoritarianism, etc…). However, if we’re using the scales as predictors, is this necessary? For example, if we use a Bayesian “ridge” regression (hierarchical prior on the regression slopes sd), and then use e.g. Aki Vehtari’s predictive projection method for feature selection, wouldn’t it make sense to include individual items as predictors in the reference model, rather than the aggregated scales? Perhaps, if we want to include scale-membership information, with a hierarchical prior on the slope sd across scales? Is there some benefit of averaging across scales that outweighs the loss of information that comes with it?


Even if it is common practice to create this composite score (sum or average), it is not the recommended practice.
The composite score assumes that each item has zero measurement error. This is not true, so the composite compounds measurement error with variance that is unrelated to the underlying factor.
For this reason the recommended approach is to use a latent variable method, such as Structural Equation Modeling (SEM) or Item Response Theory (IRT).
These factor models partition the variance of each item into separate sources, like this:
X = C + M + O + S + e
X = measured variable
C = common variance related to the underlying factor
M = method variance related to the way the item was asked
O = occasion variance related to time/situation at the moment of measurement
S = specific reliable variance of the item related to factors different than the underlying factor of interest
e = measurement error
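
To make the decomposition concrete, here is a minimal simulation sketch. It is only an illustration under stated assumptions: method (M) and occasion (O) variance are left out, every item gets the same made-up loading of 0.6, and the standard deviations for S and e are arbitrary. The names `simulate_scale` and `pearson` are hypothetical helpers, not part of any package mentioned in this thread.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_scale(n_people=2000, n_items=8, loading=0.6,
                   specific_sd=0.5, error_sd=0.6, seed=1):
    """Simulate items as X = C + S + e (M and O omitted for simplicity).

    All numeric defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    factor = [rng.gauss(0, 1) for _ in range(n_people)]
    items = [
        [loading * f                  # C: common variance from the factor
         + rng.gauss(0, specific_sd)  # S: item-specific variance
         + rng.gauss(0, error_sd)     # e: measurement error
         for f in factor]
        for _ in range(n_items)
    ]
    return factor, items


factor, items = simulate_scale()

# Composite score: average across items for each person.
composite = [statistics.fmean(person) for person in zip(*items)]

r_single = pearson(factor, items[0])
r_composite = pearson(factor, composite)

print(f"single item vs. true factor: r = {r_single:.2f}")
print(f"composite   vs. true factor: r = {r_composite:.2f}")
```

In this setup the composite tracks the simulated factor more closely than any single item, but it still falls short of a correlation of 1, because the S and e parts are averaged down rather than removed.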

This way, factor methods give a cleaner measure of the factor, defined by the common variance shared between items, while also separating out some of the measurement error.
The idea is that the factor measures something that cannot be measured with only one item, so we use multiple items that are imperfect measures of multiple characteristics of the underlying factor of interest.
Also, with these models, you can test whether the group of items actually “holds” together to define the underlying construct, while the composite score assumes that the items can be summed up and be meaningful.
Once you have established that the factor is a good approximation, you can use the factor as an outcome or predictor. This allows you to test relations while controlling for measurement error and the specific variance of each item.

The issues with using each item separately are: you would be assuming that the item is measured with no error, and you would also be assuming that each individual item is a theoretically meaningful outcome or predictor. With these scales there is rarely theoretical interest in the meaning of each individual item. You can do it, but theoretically it poses a different question.

This is a summary of measurement with factor analysis models; I hope it helps.


Thanks, those are some really interesting points! I’ve been meaning to get a bit into Bayesian SEM/IRT for a while, could you perhaps point me to some good resources/tutorials?

Would that be an issue if we model the items with hierarchical shrinkage across scales?

For BSEM, software-wise I recommend blavaan in R. It uses the same syntax as lavaan and runs the model in either Stan or JAGS. I teach a week-long summer course on the topic (BSEM summer course), which will be in webinar format this summer due to COVID-19.
blavaan has the advantage of being flexible enough for a wide variety of models while still being easy to work with.

For IRT, I have built my own model syntax. For ease of use, you can look at the edstan package, which can run several IRT models in Stan. It might be limiting if you want to run a model that is not available there.

Even with shrinkage you will be looking at item effects instead of a major factor. And each item has multiple sources of variance that can affect the relation (X = C + M + O + S + e). I don't think shrinkage would correct for measurement error either.
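
A rough sketch of why shrinkage and measurement error are separate issues, under strong simplifying assumptions: a single simulated factor, a one-predictor ridge penalty standing in for a hierarchical shrinkage prior, and made-up values for the loading, error SD, true slope, and penalty. The helper names `standardize` and `ridge_slope` are hypothetical.

```python
import random
import statistics


def standardize(x):
    """Center and scale a list to mean 0 and (population) SD 1."""
    m, s = statistics.fmean(x), statistics.pstdev(x)
    return [(v - m) / s for v in x]


def ridge_slope(x, y, penalty=0.0):
    """One-predictor ridge slope: S_xy / (S_xx + penalty); OLS when penalty is 0."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / (sxx + penalty)


rng = random.Random(7)
n_people, n_items = 3000, 5

# Simulated "true" factor and an outcome that depends on it (slope 0.5 is assumed).
factor = [rng.gauss(0, 1) for _ in range(n_people)]
outcome = [0.5 * f + rng.gauss(0, 1) for f in factor]

# Noisy items measuring the factor, then their average as the observed predictor.
items = [[0.6 * f + rng.gauss(0, 0.8) for f in factor] for _ in range(n_items)]
composite = [statistics.fmean(person) for person in zip(*items)]

slope_true = ridge_slope(standardize(factor), outcome)        # error-free predictor
slope_ols = ridge_slope(standardize(composite), outcome)      # attenuated by error
slope_ridge = ridge_slope(standardize(composite), outcome, penalty=300.0)

print(f"slope on true factor:        {slope_true:.3f}")
print(f"OLS slope on noisy score:    {slope_ols:.3f}")
print(f"ridge slope on noisy score:  {slope_ridge:.3f}")
```

In this toy setup the slope on the noisy score is already pulled toward zero relative to the slope on the simulated factor, and the ridge penalty pulls it further in the same direction rather than undoing the attenuation.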

These are some references:
Garnier-Villarreal, M., & Jorgensen, T. D. (2019). Adapting Fit Indices for Bayesian Structural Equation Modeling: Comparison to Maximum Likelihood. Psychological Methods.

Luo, Y., & Jiao, H. (2018). Using the Stan Program for Bayesian Item Response Theory. Educational and Psychological Measurement, 78(3), 384–408.

Merkle, E. C., & Rosseel, Y. (2018). blavaan: Bayesian structural equation models via parameter expansion. Journal of Statistical Software, 85(4).

Merkle, E. C., & Wang, T. (2018). Bayesian latent variable models for the analysis of experimental psychology data. Psychonomic Bulletin & Review, 25(1), 256–270.

Muthén, B., & Asparouhov, T. (2012). Bayesian structural equation modeling: A more flexible representation of substantive theory. Psychological Methods, 17(3), 313–335.

van de Schoot, R., Kluytmans, A., Tummers, L., Lugtig, P., Hox, J., & Muthén, B. (2013). Facing off with Scylla and Charybdis: A comparison of scalar, partial, and the novel possibility of approximate measurement invariance. Frontiers in Psychology, 4.

Kaplan, D. (2014). Bayesian Statistics for the Social Sciences. New York: The Guilford Press.

Lee, S. Y. (2007). Structural Equation Modeling: A Bayesian Approach. Wiley.

Reckase, M. D. (2009). Multidimensional Item Response Theory (1st ed.). Springer Publishing Company, Incorporated.

Developing a rich model of the latent construct and its measurement properties is only important if you’re interested in that side of things, though. If you just want to predict new observations that lie within your original observation range, as it sounds like you do, then yes, other tools (like ridge regression) are likely far more effective.

@abartonicek if you want to dive into Bayesian SEM by way of brms (which uses Stan), here is a nice, short tutorial. You can dump out the Stan model from brms and fiddle around with it. I usually do this as a sanity check on models I write in Stan.

I can also send you our code and data for a more complicated SEM.